CBCB Doctoral Student Takes Genomics Research to New Heights
Whether it’s scrambling up sheer rock faces, trying to solve a complex problem in the lab, or working long hours to submit an academic paper, Jason Fan is always eager to take on the next challenge.
Fan, a fifth-year computer science doctoral student in the Center for Bioinformatics and Computational Biology (CBCB), focuses on designing small, fast, and efficient algorithms and tools for analyzing DNA and RNA sequencing data.
He’s also an avid rock climber. As an undergraduate at Tufts University, he participated in collegiate sport and speed climbing competitions, and nowadays, when time and weather permit, he visits the New River Gorge National Park and Preserve in West Virginia.
Fan says his academic life and passion for rock climbing go hand in hand.
“I fell in love with the problem solving and meditative aspects of rock climbing,” he says. “When I’m climbing, my mind clears up to solely focus on trying to find the most efficient way up a wall. Rock climbing is also a lovely way to spend time outdoors and it helps me refocus, especially when I’m facing a tough problem in the lab.”
Alongside his adviser Rob Patro, an associate professor of computer science with an appointment in the University of Maryland Institute for Advanced Computer Studies, Fan is working to build fast and modular algorithms for indexing huge collections of reference genomes.
Fan’s current goal is to design small data structures, or indexes, that support fast sequence queries against tens of human-scale genomes or tens of thousands of bacterial genomes.
“Instead of requiring a large server or computer, my vision is to design tools that are so efficient they enable the same analysis on, say, powerful laptops,” Fan says.
He is also assisting on a $350K research grant recently awarded to Patro by the Chan Zuckerberg Initiative (CZI), a philanthropic research organization launched in 2015 by Meta founder Mark Zuckerberg and his wife, Priscilla Chan.
Patro, who leads the Combine Lab, is using the CZI funding to improve upon a “constellation” of interrelated tools his lab has developed to process genomic data. These include alevin-fry, a toolkit for the efficient processing of single-cell and single-nucleus RNA sequencing data, and Salmon, a tool for efficient and accurate transcript quantification from bulk RNA sequencing data.
Fan is contributing by developing Pufferfish2, which is set to improve the core index that powers Salmon and alevin-fry, as well as other sequencing tools that are currently in the works.
Fan, who grew up in Hong Kong, says he most enjoys the interdisciplinary nature of his work. He received a prestigious National Science Foundation fellowship in 2020, which helps support his research.
“In the Combine Lab, we have to be good theorists and engineers to design and program efficient tools that work well,” Fan says. “Simultaneously, we must be able to exploit our understanding of biology to apply clever and effective heuristics (mental shortcuts) to make our tools work even better in practice. Almost always, these insights and exploits deriving from biological phenomena are what make our tools accurate and efficient.”
He says it is exciting to try to understand and explain why some complex computational and mathematical insights matter so much for biological analyses.
“It’s very cool and exhilarating to know that our work and the tools we build help other researchers investigate their own interesting biological questions,” Fan says.
—Story by Melissa Brachfeld, UMIACS communications group