Issue 25: February 2021

NGT Program Highlights
Next-Generation Technologies for Next-Generation Cancer Models

The Next-generation Technologies for Next-generation Cancer Models Program supports the development of technologies and tools that will facilitate, accelerate, and enhance research using advanced human-derived next-generation cancer models such as organoids, conditionally reprogrammed cells, and others. 

CTD² Program Highlights
CTD² Dashboard: Content and Feature Updates

The article describes recent improvements of the CTD² Dashboard in presentation, annotation of experimental methods, the search function, and addition of new links to external data resources and web applications. The Dashboard aims to make cancer-relevant results from the CTD² Network easily discoverable.

HCMI Program Highlights
HCMI Program Updates and New Features of the Searchable Catalog

The HCMI Searchable Catalog has been updated with new features that help users locate models of interest and their case-associated data at the Genomic Data Commons. Currently, 148 models from 18 primary sites are available, with quality-controlled clinical, biospecimen, and molecular characterization data for 63 models from 62 cases.

CTD² Guest Editorial
CTD² PANcancer Analysis of Chemical Entity Activity (PANACEA) DREAM Challenge

To better understand the off-target programs of cancer therapies, an RNA-sequencing database called PANACEA measures the mRNA expression level of all 20,000 protein-coding genes that are turned on or off by 400 FDA-approved or late-stage clinical cancer drugs.

NGT Program Highlights
Next-Generation Technologies for Next-Generation Cancer Models

Cindy Kyi, Ph.D. and Eva Tonsing-Carter, Ph.D.
Office of Cancer Genomics

The Next-generation Technologies for Next-generation Cancer Models Program (NGT; RFA-CA-19-055) supports the development of technologies and tools that will facilitate, accelerate, and enhance research using advanced human-derived next-generation cancer models (NGCMs) such as organoids, conditionally reprogrammed cells, and others. The tools developed under this program will focus on utilizing NGCMs generated under the auspices of the Human Cancer Models Initiative (HCMI). Both the HCMI and NGT programs are associated with the Beau Biden Cancer Moonshot℠ Initiative to accelerate cancer research.

Patient-derived HCMI NGCMs improve upon traditional cell lines. The models encapsulate the cellular architecture of the original tumors, are associated with genomic and clinical data from the originating tumors, and represent a broad range of cancer types, including rare cancers, pediatric cancers, and cancers from racial and ethnic minorities. Currently, there are 148 HCMI models from 18 different primary anatomic sites; the available models can be browsed on the HCMI Searchable Catalog.

The primary goal of the NGT program is to facilitate broad application of HCMI’s NGCMs by providing researchers robust and reproducible genome editing/manipulating protocols and reagents and to enable advanced data interpretation. The new tools and broader use of HCMI NGCMs will contribute to progress in understanding the important pathways in cancer initiation, progression and metastasis, identifying mechanisms of resistance, discovering novel therapeutic targets, developing diagnostic and/or predictive biomarkers, and other aspects relevant to precision oncology. Protocols, knowledge, and materials developed by this program will be shared broadly with the research community.

There are three Centers working towards developing next-generation technologies and tools under this program: the Broad Institute, the Dana-Farber Cancer Institute, and the Massachusetts Institute of Technology.

The Broad Institute Center, co-led by Todd R. Golub, M.D., and John G. Doench, Ph.D., plans to develop genome editing vector systems and high-throughput screening methods suitable for slowly proliferating HCMI NGCMs as well as characterization methods to predict their metastatic potential in vivo.

Slow-proliferating cells present challenges to standard genome editing approaches due to the experimental need for large quantities of starting cell material. Therefore, instead of creating NGCMs that stably express Cas9 and then introducing guide RNAs, the Broad Institute Center will develop all-in-one genome editing vector systems. Additionally, alternative readouts will be required for efficient screening of the HCMI NGCMs, and the Center will use short-term single-cell RNA sequencing (scRNA-seq) methods that will serve as surrogate readouts for long-term cell viability. The group will also utilize a multiplexed Cas12/gRNA gene editing system for genetic perturbation studies and develop methods to determine organ-specific in vivo metastatic potential for NGCMs. The Broad Institute plans to create a public resource, the metastasis potential map (MetMap), for at least 50 HCMI NGCMs.

The Dana-Farber Cancer Institute (DFCI) Center, led by William C. Hahn, M.D., Ph.D., will develop genome-scale informatics methods as well as high-throughput approaches to profile genetic characteristics of HCMI NGCMs and study their responses to small-molecule or genetic perturbations and their sensitivity to drugs. Using innovative MIX-Seq (Multiplexed Interrogation of gene eXpression through single-cell RNA Sequencing) and computational methods, the team will explore cell state plasticity and heterogeneity in these models. These studies will allow the cancer research community to perform both high- and low-throughput analyses in HCMI NGCMs and provide deeper insight into the stability and phenotypes represented by these models.

The team will build on preliminary studies indicating that pancreatic NGCMs exhibit heterogeneity and cell state plasticity when compared to the originating tumor. Using newly developed sequencing technology, the DFCI Center will interrogate the dynamics of these state changes and assess the degree of heterogeneity in the NGCMs. Additionally, the group will build on Project Achilles and the DepMap to create and implement an optimized genome-scale CRISPR-Cas9 library that permits the systematic interrogation of genetic dependencies in NGCMs.

The Massachusetts Institute of Technology (MIT) Center, led by Timothy K. Lu, M.D., Ph.D., Ömer H. Yilmaz, M.D., Ph.D., and Bonnie Berger, Ph.D., plans to develop innovative experimental and computational tools combining synthetic biology, cancer organoid technology, and bioinformatics. The Synthetic Tools to Annotate Reporter Organoids for Cancer Heterogeneity and Recurrence Development (StarOrchard) toolbox will include Synthetic Promoter Activated Recombination of Kaleidoscopic Organoids (SPARKO), Combinatorial Genetics En Masse (CombiGEM), and single-cell RNA sequencing panorama (Scanorama).

The SPARKO tool allows annotation of heterogeneous cancer populations within living cells via fluorescent protein expression libraries to make multi-colored tumor organoids. CombiGEM can rapidly identify potential therapeutic targets via large-scale, massively parallel, and unbiased combinatorial genetic screens. CombiGEM will be used to assemble barcoded libraries of genetic perturbations, created using a pair-wise guide RNA-mediated CRISPR system, to investigate novel synthetic lethality and identify therapeutic targets. Scanorama is an efficient tool that integrates large single-cell transcriptomics (scRNA-seq) datasets from diverse cell types and tumor types via sophisticated bioinformatics algorithms. Scanorama can then find nearest neighbors among other datasets and group similar cell types together in a panoramic fashion. The Center reports that the StarOrchard tools focus on barcoding strategies to enable accurate longitudinal tracking and analysis of individual tumor cells that harbor distinct genetic aberrations and substantially expand the utility of NGCMs.

All data and resources such as protocols and reagents from the NGT program will be made publicly available to maximize the translational impact of these findings. These studies will generate protocols that will enable systematic functional investigations using NGCMs and facilitate the discovery of novel biomarkers and therapeutic targets. These tools and technologies can be used by the research community to ask specific questions using HCMI NGCMs in the efforts toward precision oncology. Visit the NGT program page for more information and future updates on the program.

CTD² Program Highlights
CTD² Dashboard: Content and Feature Updates

Kenneth Smith, Ph.D., Vlado Dancik, Ph.D., Zhou Ji, Ph.D., Aristidis Floratos, Ph.D., and Paul Clemons, Ph.D.

In a previous e-Newsletter (2016), we described the design and implementation of the Cancer Target Discovery and Development (CTD²) Dashboard¹, which aims to make cancer-relevant results from the CTD² Network easily discoverable by other researchers and the public. Dashboard users can directly browse or search Dashboard “subjects”, including genes, chemical compounds, cell lines, and diseases. Unique combinations of subjects positively connected by distinct types of evidence make up a Dashboard “observation”. For example, an observation may link a drug with a differentially expressed gene tested in the context of a particular cell line.

‘Search’ or ‘browse’ results are displayed in the form of “observation summaries”, which encapsulate a finding and provide links to supporting evidence. Results of interest about a subject can thus be immediately located and cross-referenced across projects and Centers. Strength of evidence is reported using the Three Evidence Tiers representing preliminary data, in vitro confirmation, and in vivo validation in relevant cancer models. All observations resulting from a particular experimental procedure in a publication, typically reporting its key results, are deposited into the Dashboard as a “submission”. A publication can give rise to multiple submissions, one for each set of findings.

Here we describe recent improvements in the capabilities of the Dashboard, focusing on a) presentation, b) annotation of experimental methods, c) the search function, and d) new links to external data resources and web applications.

Dashboard Landing Page
The Dashboard landing page (Figure 1) has been revamped to render all salient features easily accessible via buttons. The new landing page hosts a new “experimental evidence” button, which allows users to locate observations based on underlying experimental methods, and a new “Content Summary” table.

Figure 1. Updated Dashboard landing page, including content summary (purple oval) and new subject button for experimental evidence (red oval).

Dashboard Content Summary
A new Content Summary (Figure 2) supplies a quick overview of the numbers of both submissions and observations for each subject type, plus ‘evidence types’ and ‘stories’, organized by Tier. Most observations are Tier 1, with decreasing numbers for Tiers 2 and 3. As of December 31, 2020, there are 202 submissions and over 56,000 individual observations (experimental approaches such as compound screening can produce a substantial number of observations). As more submissions and stories are added, these numbers will increase.

Figure 2. Dashboard content summary, accessible by a prominent button on the landing page.

To aid CTD² Network members in creating new submissions, we implemented a sophisticated “Submission Builder” web application, which guides the submitter through each step of the process and allows the result to be previewed directly as it will appear in the Dashboard.

Structured Experimental Evidence
The “Experimental Evidence” page of the Dashboard provides a new table (Figure 3) listing Evidence and Conclusion Ontology (ECO) terms and the number of submissions and observations (by Tier) to which they have been assigned. Clicking the observation ‘count’ for a particular experimental method reveals a page displaying those observations. We chose ECO because it is an industry standard and one of the primary ontologies of evidence terms supported by the Monarch Initiative and registered in the OBO Foundry, both open-source initiatives in biomedical ontologies. ECO provides a hierarchy of terms that describe types of experimental evidence, having evidence types suitable to annotate Dashboard data, such as RNA-seq, ECO:0000295; pharmacological assay, ECO:0006053; cell-viability assay, ECO:0005004; and computationally derived evidence, ECO:0007672. Each submission may be assigned as many terms as needed, depending on the complexity of the experiment being annotated. At present, 63 such terms have been used in Dashboard submissions. All recent submissions are annotated with ECO codes, and earlier submissions have been retroactively annotated. ECO terms can be searched directly in the main search box, either by name or by code, and result pages for each term can be referenced with stable Dashboard URLs.
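
As a rough illustration of searching evidence terms "either by name or by code", the Python sketch below looks up the four ECO terms quoted above. The labels are copied from this article, and the function `find_eco` is hypothetical; the article does not describe how the Dashboard search is implemented.

```python
# ECO codes and labels exactly as quoted in the article text.
ECO_TERMS = {
    "ECO:0000295": "RNA-seq",
    "ECO:0006053": "pharmacological assay",
    "ECO:0005004": "cell-viability assay",
    "ECO:0007672": "computationally derived evidence",
}


def find_eco(query, terms=ECO_TERMS):
    """Return (code, label) pairs matching the query.

    A term matches when the query equals its code or is contained in its
    label; both comparisons ignore case. An empty query matches nothing.
    """
    q = query.strip().lower()
    if not q:
        return []
    return [
        (code, label)
        for code, label in terms.items()
        if q == code.lower() or q in label.lower()
    ]


by_code = find_eco("eco:0000295")
by_name = find_eco("assay")
print(by_code)
print(by_name)
```

Searching by the word "assay" returns both the pharmacological and the cell-viability terms, while a code returns at most one entry.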

Figure 3. Summary of evidence terms, including counts of submissions by Tier linking to relevant observations.

Technical Improvements
In addition to the new features, we made major improvements to the existing Dashboard ‘search’ function for faster searching and browsing. Searches involving multiple words were improved to list the most relevant results first and to identify observations matching all the words in a query. These improvements make ‘search’ a powerful tool for quickly finding results involving multiple terms, for example involving both a particular compound and a particular gene. With ongoing code updates, the Dashboard is also kept in line with the latest web-security standards. Currently (early 2021), a hierarchical search capability is being implemented. To do this, diseases and ECO terms will be represented as hierarchical ontologies, and this new function will allow the Dashboard to return more specific terms (children) of the matched term. The Dashboard has also been improved to use stable URLs for all results, so that the results can be shared, saved, or referenced.
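
The idea behind hierarchical search, returning the more specific children of a matched term, can be sketched in a few lines of Python. The toy ontology, its term names, and the function `expand_term` are hypothetical and are not taken from the Dashboard code; the sketch only assumes that each term stores a list of its direct children.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its direct children.
# The terms are illustrative only, not actual Dashboard content.
ONTOLOGY = {
    "cancer": ["carcinoma", "sarcoma"],
    "carcinoma": ["adenocarcinoma"],
    "adenocarcinoma": [],
    "sarcoma": [],
}


def expand_term(term, ontology):
    """Return the matched term followed by all of its descendants.

    Descendants are collected breadth-first, so direct children come
    before grandchildren. Unknown terms yield an empty list.
    """
    if term not in ontology:
        return []
    found = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in ontology.get(current, []):
            if child not in found:
                found.append(child)
                queue.append(child)
    return found


expanded = expand_term("cancer", ONTOLOGY)
print(expanded)
```

A query for a broad term would then be run against every term in the expanded list, so that observations annotated only with a child term are still returned.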

Links to External Resources
The Dashboard content has been enriched through additional external links to authoritative information sources for subjects, such as compounds and cell lines. To simplify enabling and extending this improvement, a new framework to manage such links in the Dashboard was developed. Among the new resources supported in this fashion are DepMap (Cancer Dependency Map, for compounds), Cellosaurus (cell lines), and MalaCards (human diseases).

The Dashboard features a “Gene Cart”, which allows users to query several gene and protein interaction databases and provides a graphical view of the result. Two new external resources have recently been added to support the Gene Cart function: “EnrichR” for gene-set-enrichment analysis and “STRING” for protein-protein interaction networks. Users can now send a custom list of genes to either of these web applications for analysis and visualization of results.

A Dashboard RESTful API was also implemented to allow external applications to directly query and receive results from the Dashboard for their own purposes. We envision that the API can serve as a substrate for external app developers to use Dashboard data in their own applications, and (in principle), the Dashboard can serve as a conduit directing users to these apps. Presently, demonstration apps that leverage work done toward the NCATS Biomedical Data Translator are being developed.

Subject Data Improvements
As with all Dashboard subjects, each gene has its own subject page, which provides information and external links to other resources, as well as a list of observations in which the gene appears. Subject pages have been recently augmented with additional synonyms from NCBI, enhancing the ability to find genes via old or alternate names. The ability to support submissions from multiple Centers, or from institutions external to the CTD² Network, has also been added.

Summary and Outlook
Our recent updates to the CTD² Dashboard aim to make it more useful and accessible to end users. The presentation of the landing page is more concise, current and past submissions are annotated with structured evidence terms, and an expressive summary of Dashboard content has been added. Substantial technical improvements have been made to the ‘search’ function, and new links out to several external resources have been added. As a “living document”, the CTD² Dashboard continues to grow in submission content over time, so users can gain new insights from the Network by performing even the same searches over time. For members of the CTD² Network, the evolving Dashboard project represents not only the collective insights of the Network, but an increasingly rich opportunity to present experimental and computational evidence in a cross-linked, multimedia format.

Reference

  1. Aksoy BA, Dancík V, Smith K, et al. CTD2 Dashboard: a searchable web interface to connect validated results from the Cancer Target Discovery and Development Network. Database (Oxford). 2017 Jan 1;2017:bax054. PMID: 29220450

HCMI Program Highlights
HCMI Program Updates and New Features of the Searchable Catalog

Eva Tonsing-Carter, Ph.D. and Cindy Kyi, Ph.D.
Office of Cancer Genomics

The Human Cancer Models Initiative (HCMI) is an international collaboration among the National Cancer Institute (NCI), Cancer Research UK (CRUK), the Wellcome Sanger Institute (WSI), and the foundation Hubrecht Organoid Technology (HUB). The goal of the HCMI is to create up to 1,000 next-generation cancer models from patient tumors that are annotated with clinical, biospecimen, and molecular characterization data.

The HCMI models and their case-associated data are available to researchers as a community resource. Existing models may be queried at the HCMI Searchable Catalog, a continuously updated resource of available HCMI models. Since our last e-News article in August 2020, new models have been generated, bringing the total number of currently available models to 148 from 18 different primary sites, including brain, skin, colon, pancreas, lung, extrahepatic bile duct, and more. Of the 148 models, 10 come from five cases in which multiple models were derived from unique anatomic sites in the same patient. These include models derived from primary and metastatic or multiple metastatic tumors.

Quality-controlled and harmonized clinical, biospecimen, and molecular characterization data for 63 models from 62 cases are accessible as of February 16, 2021 at NCI’s Genomic Data Commons (GDC) Data Portal. Excitingly, one of these cases has multiple models derived from primary and metastatic ampulla of Vater cancer. The molecular characterization data includes RNA-sequencing, whole genome sequencing, whole exome sequencing, copy number variation, and masked somatic MAF data. Epigenetic data from a subset of models and their originating tumors will be available in the future. Data from new cases are being released as they become available. To access the data, visit the GDC’s Data Portal. dbGaP approval is required for accessing controlled-access data. The clinical, biospecimen, and masked somatic MAF data are open-access and do not require approval to access. Visit the Accessing HCMI data webpage for more detailed information on how to access the HCMI data.

The HCMI Searchable Catalog has been updated with new interactive features for enhanced user experience in sorting and browsing the available HCMI models. New cancer types, new search features, multiple models from some cases, and open-access masked somatic MAF data for a subset of models have been added to the Catalog.

A summary of currently available models by cancer type, model type, and tissue type is shown below (Figure 1).

Figure 1. Pie charts of available HCMI models by cancer type, model type, and tissue type

Landing page

Figure 2. A snapshot of the Searchable Catalog landing page

The Searchable Catalog landing page contains a main viewing table with a list of all available models within the Catalog (Figure 2). The Catalog has been streamlined with user interface enhancements, including an updated color scheme, increased text contrast, and helpful links in the footer. Available open-access masked somatic MAF data generated at the GDC have been integrated into the Catalog as a searchable element. Users can also search by gene, specific somatic variants, and type of research somatic variant (e.g., missense or nonsense). As the model data elements are selected, the results listed within the main viewing table (shown within the red border in Figure 2) will update dynamically. There are a few HCMI cases where multiple models were derived from unique sites from the same patient, and “Has Multiple Models” has been added as a search option to identify these cases.

The top panel on the landing page shows interactive circular graphs which users may use to filter the models by primary site, availability of multiple models, 2D versus 3D growth, and most frequently mutated genes by clicking on various colors on the graphs. Hovering over different colors within a graph will reveal relevant information.

Within the main viewing table, columns indicating the numbers of mutated genes, research somatic variants from the open-access masked somatic MAF data, clinical sequencing variants, and histopathological biomarkers have been added. Users can customize the columns shown by selecting data of interest from the “COLUMNS” dropdown menu. Users can sort the data displayed within the table in ascending or descending order by clicking on the headers of their choice within the main viewing table. The selected data can be exported using the “EXPORT ALL” function, saved as a .tsv file, and opened in Excel or a similar program.
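
An exported .tsv file can also be read programmatically. The Python sketch below parses tab-separated rows and filters them by primary site. The column names (`name`, `primary_site`, `growth`) and the sample rows are invented for illustration and may not match the headers of a real Catalog export.

```python
import csv
import io

# Hypothetical sample standing in for an exported Catalog file; the
# column names and rows are invented for illustration.
SAMPLE_TSV = (
    "name\tprimary_site\tgrowth\n"
    "MODEL-A\tColon\t3D\n"
    "MODEL-B\tPancreas\t2D\n"
    "MODEL-C\tColon\t3D\n"
)


def load_models(handle):
    """Parse tab-separated rows into dictionaries keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_by_site(models, site):
    """Keep only the models whose primary site equals the given value."""
    return [m for m in models if m["primary_site"] == site]


models = load_models(io.StringIO(SAMPLE_TSV))
colon_names = [m["name"] for m in filter_by_site(models, "Colon")]
print(colon_names)
```

To read a real export, the same `load_models` function can be given a file handle opened with `open(path, newline="", encoding="utf-8")`, after adjusting the column names to those in the file.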

Users may conduct multiple searches and save models of interest by selecting the checkbox next to the model name. The selected models are added to “My Models List”, and their associated data can be downloaded as a .tsv file for later use.

Navigating individual model pages

To view details of individual models, users may click on a model name of interest, and they will be routed to the model page (Figure 3).

Figure 3. A snapshot of an individual model page on the Searchable Catalog

The user interface of the individual model pages has been rearranged to allow users to quickly identify all the available information for any model. Data on individual model pages include model details such as the type, split ratio, and tissue status, as well as links to other models generated from the same patient (if available). The molecular characterization data types that are accessible at the GDC, patient details, model images in culture, and links to external resources such as the model distributor and the GDC are also displayed.

At the bottom of the page, the clinical sequencing and histopathological biomarkers provided from the medical records are listed (when available). The available open-access masked somatic MAF data generated at the GDC are integrated into the HCMI Searchable Catalog. Users can search, sort, and download the variant data by clicking the download .tsv icon. The open-access masked somatic MAFs are highly processed to remove lower quality and potential germline variants. If omission of true-positive somatic mutations is a concern, we recommend accessing controlled-access MAFs, which requires user certification through dbGaP; visit Accessing HCMI Data for more information.

Users are encouraged to visit the Catalog frequently as new models and their associated data are being added to the HCMI Searchable Catalog as they become available. Users who have questions about HCMI, the models or the data are encouraged to visit the HCMI FAQ page or visit the HCMI program. A user guide is available to help users navigate the Catalog.

The HCMI models together with their case-associated data provide a robust resource for the research community. The models can be used in a variety of research endeavors to support precision oncology from investigating pathways that influence tumor initiation and progression to studying drug resistance and drug response.

CTD² Guest Editorial
CTD² PANcancer Analysis of Chemical Entity Activity (PANACEA) DREAM Challenge

Eugene F. Douglass Jr., Ph.D., Bence Szalai, Ph.D., Robert J. Allaway, Ph.D., and Andrea Califano, Ph.D.

During tissue repair or wound healing, growth factors are released to signal the cells to turn on “master switches” like the transcription factor MYC for cellular proliferation. These “master switches” (or “master regulators”) turn on hundreds of genes which serve as the raw materials to build copies of the cell. In cancer, DNA mutations often cause these growth circuits to become stuck in the “on” position, causing uncontrolled growth of the cancer cell (Figure 1A).

Targeted therapies are types of drugs that block specific components of growth circuits, forcing them to return to an off state that halts cellular proliferation and tumor growth. Unfortunately, targeted therapies can sometimes have other effects on the cell (off-targets) which can turn on other programs associated with drug side-effects and toxicity (Figure 1B). Identifying the off-targets of drugs is experimentally challenging, as it requires measuring the drug-target interactions on the proteome scale. As off-targets can drive adverse effects as well as therapeutic effects, identification of the whole target spectrum of targeted therapies is very important.

Figure 1. Master switch transcription factor signaling pathways drive cellular proliferation and are important therapeutic targets.
A) Signaling via growth factors and growth factor receptors drives activation of master switches such as MYC, which in turn activate a broad swath of proliferative and other signaling pathways that are detectable with RNA-Seq.
B) Drugs that inhibit growth factor receptors attenuate activation of downstream master switches, preventing cellular proliferation. However, these drugs often have other targets. This results in the perturbation of other signaling pathways, which in turn results in therapeutic side effects.

To better understand the off-target programs of cancer therapies, we built an RNA-sequencing (RNA-Seq) database called PANACEA (PANcancer Analysis of Chemical Entity Activity), which measures the mRNA expression level of all 20,000 protein-coding genes that are turned on or off by 400 FDA-approved or late-stage clinical cancer drugs. These RNA-Seq datasets can provide hints about changes in the activities of key signaling pathways in the cells, including those associated with tumor growth (Figure 1A) and off-target programs (Figure 1B). Because PANACEA includes data from several cancer cell lines, it makes it possible to observe differences in these transcription programs that depend on the tissue of origin. Besides RNA-Seq, the cell-death-inducing abilities of the investigated drugs were also measured, at different concentrations, to identify the sensitive and resistant cell lines for each drug. This large-scale dataset offers an excellent opportunity to understand the key mechanisms of drug sensitivity, resistance, and off-target toxicity. Because of the large number of measurements (400 drugs × 21 cell lines × 20,000 genes), this database is suited to exploration using data science and machine learning tools (manuscript in preparation).
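
To give a sense of scale, the product quoted above can be worked out directly. The short Python sketch below assumes that every drug was profiled in every cell line, which the article does not state, so the result is best read as an upper bound on the number of expression values.

```python
# Dimensions as stated in the article.
n_drugs = 400
n_cell_lines = 21
n_genes = 20_000

# Assumption: a full drug-by-cell-line cross, one expression profile per pair.
n_profiles = n_drugs * n_cell_lines
n_values = n_profiles * n_genes

print(f"{n_profiles:,} profiles, {n_values:,} expression values")
```

Under that assumption the dataset holds 8,400 drug and cell line profiles and 168 million gene-level values.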

With this in mind, we hosted a community challenge with the DREAM Challenges initiative to find the best machine learning methods for identifying off-targets of 30 of the most common clinically used targeted therapies. We provided the RNA-Seq data and drug sensitivity data from 11 cell lines for the 30 test molecules, from which the participants had to predict the targets (both on- and off-targets) for each molecule. The participants used public databases of RNA-Seq, drug sensitivity measurements, and drug targets to build their computational models. The names and chemical structures of the 30 test molecules were withheld from the participants to ensure that the developed methods were unbiased. Over two months, 21 teams contributed 86 models, of which 39 (45%) showed statistically significant enrichment of true off-targets within druggable proteins. The best performing methods leveraged a variety of approaches, including fundamental chemical and transcriptomic similarity approaches and more sophisticated deep-learning methodologies. This study lays the foundation for future integrative analyses of pharmacogenomic data, reconciliation of polypharmacological effects in different tumor contexts, and insights into network-based assessment of context-specific drug mechanism of action. More information about the results of the challenge can be found at the Challenge landing page and in our recent preprint on bioRxiv¹.
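
The article does not say which statistical test was used to score enrichment. As one common way to ask whether a model's predictions contain more true targets than chance would give, the Python sketch below computes a one-sided hypergeometric p-value. The function `hypergeom_sf` and all counts are hypothetical and are not Challenge data or the Challenge's scoring method.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    M: total items in the population (e.g. druggable proteins)
    n: items in the population that are "true" (e.g. known targets)
    N: items drawn (e.g. top-ranked predictions)
    k: observed number of true items among those drawn
    """
    upper = min(n, N)
    total = comb(M, N)
    favorable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical counts, for illustration only.
druggable = 1000   # size of the candidate protein set
true_targets = 20  # known targets of the drug within that set
predicted = 50     # top-ranked predictions from one model
hits = 5           # predictions that are known targets

p_value = hypergeom_sf(hits, druggable, true_targets, predicted)
print(f"p = {p_value:.2e}")
```

With these made-up counts, about one hit would be expected by chance, so five hits gives a small p-value; a model would be called significantly enriched when the p-value falls below a chosen threshold.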

Reference

  1. Douglass Jr. E, Allaway RJ, Szalai B, et al. A Community Challenge for Pancancer Drug Mechanism of Action Inference from Perturbational Profile Data. bioRxiv. 2020 December 23. doi:10.1101/2020.12.21.423514