Read Cloud Technologies
Today, the most cost-effective technologies for sequencing a genome (e.g., Illumina and Complete Genomics) have a significant limitation: the sequence reads are short. For instance, reads produced by HiSeq today are up to 150 bp in length. As a result, short reads pose the following challenges:
We develop algorithms for many of the above problems. We collaborate extensively with 10X to apply the latest technology to projects in cancer genomics, metagenomics, and human genome variation identification. One of our latest algorithms, RFA, employs a Markov random field (MRF) to model the process of sequence preparation with read clouds and to map reads to the entire genome, including dark regions. As a result, variant calling on the previously dark 6% of the genome is now much more feasible.
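To give a rough sense of why read clouds help with dark regions, here is a toy sketch in Python. It is not RFA and does not use an MRF. It assumes that short reads from the same long fragment share a barcode, and it places a read with several candidate locations at the candidate closest to the confidently mapped reads carrying the same barcode. The function name `resolve_with_cloud`, the 50 kb `max_dist` cutoff, the single-chromosome coordinates, and the one-fragment-per-barcode simplification are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def resolve_with_cloud(unique_hits, ambiguous_hits, max_dist=50_000):
    """Toy placement of ambiguous reads using barcode-mates (hypothetical helper).

    unique_hits:    list of (barcode, position) for confidently mapped reads.
    ambiguous_hits: dict read_id -> (barcode, [candidate positions]).
    Returns a dict read_id -> chosen position, or None if no candidate
    lies within max_dist of the cloud's anchor reads.
    """
    # Group confidently mapped positions by barcode (one "cloud" per barcode,
    # which is a simplifying assumption).
    anchors_by_barcode = defaultdict(list)
    for barcode, pos in unique_hits:
        anchors_by_barcode[barcode].append(pos)

    resolved = {}
    for read_id, (barcode, candidates) in ambiguous_hits.items():
        anchors = anchors_by_barcode.get(barcode)
        if not anchors or not candidates:
            resolved[read_id] = None  # nothing to anchor against
            continue
        center = median(anchors)
        # Pick the candidate closest to the centre of the cloud.
        best = min(candidates, key=lambda p: abs(p - center))
        resolved[read_id] = best if abs(best - center) <= max_dist else None
    return resolved


unique = [("BC1", 1000), ("BC1", 3000), ("BC1", 5000), ("BC2", 900_000)]
ambiguous = {
    "r1": ("BC1", [4000, 700_000]),  # one candidate sits inside the BC1 cloud
    "r2": ("BC2", [100, 200]),       # both candidates are far from the BC2 cloud
    "r3": ("BC3", [10]),             # barcode has no confidently mapped reads
}
placements = resolve_with_cloud(unique, ambiguous)
print(placements)
```

A probabilistic model such as an MRF would weigh all reads and candidate locations jointly rather than making this greedy, per-read choice. The sketch only shows the kind of signal that barcode sharing provides.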
Fun reading: my favorite "unscored" NIH proposal. I thank Annelise Barron for her input.