Modeling intra-tumoral heterogeneity and trajectories of cancer evolution.

Cancer sequencing data now provides an opportunity to understand how cancers begin and progress at unprecedented resolution. Computational modeling is important for translating sequencing data into new assays for early detection, prognosis, and personalized response to treatment. We work closely with groups in Hopkins Oncology and Pathology who are using cutting-edge experimental techniques to microdissect cancer and pre-cancer lesions and sequence their DNA, including at single-cell resolution. We have new software tools in development to model heterogeneity and clonal architecture based on sequencing multiple dissected regions from a single lesion, single-cell DNA sequencing, and sequencing of multiple lesions per patient from biopsies and resections. Recently, our SCHISM (SubClonal Hierarchy Inference from Somatic Mutations) software package was used to model the evolution of high-grade serous ovarian cancers from fallopian tube lesions, ovarian cancers, and metastases from nine patients, supporting the controversial hypothesis that these cancers originated in the patients' fallopian tubes.

Selected Publications: Gut 2018; Nat. Communications 2017; PLoS Compbio 2015


The landscape of mutation-associated neoantigens and T-cell receptors in human cancers.

Cancer immunotherapies are yielding remarkable results for many patients, even those with advanced cancers, but not all patients respond to FDA-approved drugs such as the anti-PD-1, anti-PD-L1, and anti-CTLA-4 checkpoint inhibitors. In collaboration with researchers at the Johns Hopkins Medical School Upper Aerodigestive Disease Cancer Biology Center and the Bloomberg-Kimmel Institute for Cancer Immunotherapy, we are developing new computational methods to discover biomarkers of response, with a focus on genomic biomarkers from exome sequencing and T-cell receptor sequencing.

Selected Publications: Cancer Immunology Research 2019; Cancer Discovery 2016


Computational analysis of genomic data to aid medical decision making.

In the new "post-genome" era of personalized medicine, many variants critical to disease susceptibilities and drug sensitivities will be identified, and increasing numbers of people will undergo genetic testing. We are developing algorithms and tools intended to facilitate this process.

Selected Publications: American Journal of Human Genetics 2018; PLoS Compbio 2016; Gastroenterology 2015; Human Mol. Genetics 2014; BMC Genomics 2013; Human Mutation 2012; Methods Mol. Biol. 2011; Cancer Informatics 2008; PLoS Compbio 2007


Identifying functionally important variation in cancer genomes

The genomes of tumors acquire somatic mutations that may provide insights into their mechanisms of action and potential cancer treatments. A key challenge is identifying biologically important sequence variation in these genomes.

OpenCRAVAT. Custom Ranked Analysis of VAriants Toolkit, with a modular architecture and a wide variety of tools developed both by the CRAVAT team and by the broader variant analysis community.

CRAVAT. Web tool and services for high-throughput scoring and annotation of cancer mutations.

20/20+. A machine learning method to identify drivers and to distinguish tumor suppressor genes and oncogenes, based on large cohort studies of primary human cancers.

CHASM. A machine learning method that predicts missense mutations likely to drive tumor growth and progression. Read a JHU magazine article about CHASM.

MOCA. A model-free approach to find patterns of coordinated alterations in cancer genomics data sets. See this TCGA research highlight.

Selected Publications: Cell 2018; Cancer Research 2017; PNAS USA 2016; Annals of Oncology 2015; Bioinformatics 2013; Leukemia 2011; Nature 2011; Cancer Research 2011; Science 2011; Cancer Biology Therapy 2010; Cancer Research 2009


Protein evolution and antibiotic resistance

Understanding how novel functions evolve (genetic adaptation) is a critical goal of evolutionary biology. Among asexual organisms, genetic adaptation involves multiple mutations that frequently interact in a non-linear fashion (epistasis). Non-linear interactions pose a formidable challenge for computational prediction of mutation effects. We are exploring methods to predict epistatic effects and their impact on fitness, using the recent evolution of β-lactamase under antibiotic selection as a model for genetic adaptation.
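
To make the idea of non-linear interaction concrete, the sketch below computes a standard pairwise epistasis coefficient on the log-fitness scale: the amount by which a double mutant departs from the sum of the two single-mutant effects. It is only an illustration of the concept and is not taken from any of our software; the function name and the fitness values are hypothetical.

```python
import math


def pairwise_epistasis(w_wt, w_a, w_b, w_ab):
    """Return pairwise epistasis on the log-fitness scale.

    w_wt, w_a, w_b, w_ab are (hypothetical) fitness values of the wild type,
    the two single mutants, and the double mutant. A result of zero means the
    two mutations combine additively in log fitness; a positive or negative
    result means the double mutant is fitter or less fit than expected.
    """
    return math.log(w_ab) - math.log(w_a) - math.log(w_b) + math.log(w_wt)


# Illustrative values only, not measured data.
no_interaction = pairwise_epistasis(1.0, 2.0, 3.0, 6.0)    # effects multiply exactly
positive_case = pairwise_epistasis(1.0, 2.0, 3.0, 12.0)    # double mutant exceeds expectation
print(no_interaction, positive_case)
```

In the first call the double mutant has exactly the fitness expected from the two single mutants, so the coefficient is zero up to floating-point error; in the second it is twice as fit as expected, so the coefficient is positive.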

We have developed the NAPA software package for network modeling of adaptive protein evolution.

Publications: Mol. Biol. Evol. 2018; PLoS Compbio 2011


Leveraging 3D protein structure for variant interpretation

The ability to visualize where variants and mutations occur within the tertiary structure of a protein can help identify biologically important mutations in an intuitive way that is accessible to biologists.

HotMAPS. A statistically rigorous approach to identifying 3D regions enriched for somatic mutations in cancer.

MUPIT Interactive. On-demand mapping of any non-synonymous mutation onto an experimentally derived structure in the Protein Data Bank.

Selected Publications: Cancer Research 2016; Human Genetics 2013; Nature 2011; Bioinformatics 2009; PLoS Compbio 2007