Research

Rapid Targeting of Nanopore Reads Based on Pan-Genomic Databases

The Read Until API lets users perform targeted sequencing entirely in software on Nanopore sequencers. State-of-the-art methods such as UNCALLED and Readfish allow users to target sequences from pre-specified references; however, they are not optimized to work with large, repetitive references. SPUMONI is a tool that uses matching statistics or pseudo-matching lengths (a related quantity) to rapidly classify whether a read has a good “approximate” match to a large, repetitive database. Our experiments show that SPUMONI is faster and more memory efficient than minimap2 when indexing pan-genomic references.
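To make the idea of matching statistics concrete, here is a minimal sketch in Python. For each position i in a read, the matching statistic is the length of the longest prefix of read[i:] that occurs somewhere in the reference. The function names, the naive substring search, and the mean-based threshold are illustrative assumptions only. SPUMONI itself relies on a compressed index and its own classification procedure, neither of which is reproduced here.

```python
def matching_statistics(read: str, reference: str) -> list[int]:
    """Naive matching statistics: ms[i] is the length of the longest
    prefix of read[i:] that occurs as a substring of reference.
    Illustration only; this scans the reference repeatedly and is far
    too slow for real pan-genomic databases."""
    ms = []
    for i in range(len(read)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while i + length < len(read) and read[i:i + length + 1] in reference:
            length += 1
        ms.append(length)
    return ms


def looks_like_match(read: str, reference: str, min_mean_ms: float) -> bool:
    """Hypothetical toy classifier: call a read a match when its mean
    matching statistic reaches min_mean_ms (an assumed threshold)."""
    if not read:
        return False
    ms = matching_statistics(read, reference)
    return sum(ms) / len(ms) >= min_mean_ms


reference = "ACGTACGT"
print(matching_statistics("CGTTAC", reference))
print(looks_like_match("CGTTAC", reference, min_mean_ms=1.5))
```

Long matching statistics across a read suggest that it shares long exact substrings with the database, which is the intuition behind using them as a signal for an approximate match.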

Efficient Quantification of Coverage in BigWigs, BAMs, and CRAMs

Many analyses of high-throughput sequencing data begin by examining the data in key regions of the genome, such as genes or regulatory elements. Megadepth is a tool developed by Christopher Wilks. I assisted in benchmarking Megadepth against other state-of-the-art tools. Our experiments show that Megadepth was faster and more memory efficient than other tools across various types of data analysis.
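As a rough illustration of what quantifying coverage over key regions involves, the sketch below computes per-base depth from a list of alignment intervals and then the mean coverage over each region of interest. The function names, the half-open (start, end) interval convention, and the in-memory lists are assumptions made for this example. It does not read BigWig, BAM, or CRAM files and is not Megadepth's implementation.

```python
from itertools import accumulate


def per_base_depth(alignments: list[tuple[int, int]], chrom_length: int) -> list[int]:
    """Per-base depth from half-open (start, end) alignment intervals,
    built with a difference array followed by a running sum."""
    diff = [0] * (chrom_length + 1)
    for start, end in alignments:
        diff[start] += 1  # coverage rises where an alignment begins
        diff[end] -= 1    # and falls just past where it ends
    return list(accumulate(diff[:-1]))


def mean_coverage(depth: list[int], regions: list[tuple[int, int]]) -> list[float]:
    """Mean depth over each half-open (start, end) region, using prefix
    sums so that each region is answered in constant time."""
    prefix = [0] + list(accumulate(depth))
    return [(prefix[end] - prefix[start]) / (end - start) for start, end in regions]


# Hypothetical toy data: three alignments on a 10-base chromosome.
alignments = [(0, 5), (3, 8), (4, 6)]
depth = per_base_depth(alignments, chrom_length=10)
print(depth)
print(mean_coverage(depth, regions=[(0, 4), (4, 8)]))
```

The prefix-sum step reflects why region-level summaries can be cheap once per-base depth is available: each additional region costs only two lookups and a division.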