Serafim's Lab
  • Research
  • People
  • Resources
  • Publications
  • Alumni
  • Contact

Read Cloud Technologies

Today, the most cost-effective technologies for sequencing a genome (e.g., Illumina and Complete Genomics) have a significant limitation: the sequence reads are short. For instance, reads produced by HiSeq today are up to 150bp in length. Short reads create the following challenges:
  1. Phasing. Each read typically contains 0 or 1 germline human genome variants, so variants cannot be phased into the maternal and paternal chromosomes.
  2. Dark Regions. Repeats within the genome that are much longer than the read length often cannot be mapped, and as a result around 6% of the human genome is "dark".
  3. Structural Variants. Complex regions, such as the HLA region, which is important for immunity, cannot be assembled.
  4. Metagenome Assembly and Phasing. When analyzing microbial community (metagenome) samples, the individual species and subspecies cannot be assembled.
  5. De Novo Assembly. Assembly of new, previously un-sequenced species is challenging.
An exciting recent array of sequencing preparation technologies is changing this. Starting with Moleculo (a Stanford-developed technology to which Serafim contributed, now part of Illumina) and continuing today with the substantially more powerful 10X Gemcode, it is possible to partition a genome into many long fragments (150,000bp), compartmentalize small groups of such fragments, give a unique barcode to each group, amplify, and finally sequence with a short read technology to obtain barcoded reads. The resulting reads form read clouds: groups of reads that share a barcode and all come from the same source long fragment. This information is very exciting; if leveraged properly with algorithms, it can help solve virtually all of the problems associated with short reads: (1) phasing is possible and even easy; (2) variants can be called in dark regions, notably in recent segmental duplications; (3) SV calling is facilitated; (4) metagenome assembly and phasing are in principle possible; (5) de novo assembly is much easier and leads to more contiguous and accurate scaffolds.
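To make the notion of a read cloud concrete, here is a minimal sketch of how mapped, barcoded reads might be grouped into clouds. It is an illustration only and not the lab's or 10X's actual pipeline. The input format (tuples of barcode, chromosome, position), the function name group_read_clouds, and the max_gap threshold are all assumptions made for this example. Reads that share a barcode and lie close together on the same chromosome are treated as coming from one long fragment; a large gap starts a new cloud, since one barcode can tag several fragments.

```python
from collections import defaultdict


def group_read_clouds(reads, max_gap=50_000):
    """Group mapped reads into read clouds.

    reads   -- iterable of (barcode, chrom, pos) tuples (hypothetical format)
    max_gap -- assumed largest distance in bp between consecutive reads
               of the same cloud; an illustrative value, not a tuned one

    Returns a list of (barcode, chrom, start, end, n_reads) tuples.
    """
    # Collect mapping positions per (barcode, chromosome) pair.
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        positions_by_key[(barcode, chrom)].append(pos)

    clouds = []
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current cloud and open a new one.
                clouds.append((barcode, chrom, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        clouds.append((barcode, chrom, start, prev, count))
    return clouds


if __name__ == "__main__":
    example_reads = [
        ("AAAC", "chr1", 1_000),
        ("AAAC", "chr1", 20_000),
        ("AAAC", "chr1", 500_000),  # far away: likely a second fragment
        ("GGTT", "chr1", 1_500),
    ]
    for cloud in group_read_clouds(example_reads):
        print(cloud)
```

In this toy input, barcode AAAC yields two clouds because its third read lies far from the first two. Real data would also need to handle multi-mapped reads and barcode sequencing errors, which this sketch ignores.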

We develop algorithms for many of the above problems. We collaborate extensively with 10X to apply the latest technology to projects in cancer genomics, metagenomics, and human genome variation identification. One of our latest algorithms, RFA, employs a Markov random field (MRF) to model the process of sequence preparation with read clouds and to map reads to the entire genome, including dark regions. As a result, variant calling on the previously dark 6% of the genome is now greatly facilitated.


Fun reading: my favorite "unscored" NIH proposal. I thank Annelise Barron for her input.


Copyright © 2015 Serafim Batzoglou