Several new cancer bioinformatics tools, each described in a recent journal publication, are available to University of Illinois Cancer Center members through the Cancer Bioinformatics Shared Resource (CBSR). CBSR Program Director Xiaowei Wang, PhD, was corresponding author on the publications, whose authors included members of his lab.
Learn more about the new bioinformatics tools in excerpts from the publications below or email [email protected].
The CBSR is one of six institutional shared resources available to Cancer Center members. Its major goal is to facilitate the development of strong interdisciplinary cancer programs by providing the bioinformatics resources that Cancer Center members need. The CBSR also strives to bridge the gap between clinicians and basic researchers so they can collaborate on translational cancer research.
Pan-Cancer Discovery of Somatic Mutations from RNA Sequencing Data (Communications Biology)
“Identification of somatic mutations (SMs) is essential for characterizing cancer genomes. While DNA-seq is the prevalent method for identifying SMs, RNA-seq provides an alternative strategy to discover tumor mutations in the transcribed genome. Here, we have developed a machine learning based pipeline to discover SMs based on RNA-seq data (designated as RNA-SMs). Subsequently, we have conducted a pan-cancer analysis to systematically identify RNA-SMs from over 8,000 tumors in The Cancer Genome Atlas (TCGA). In this way, we have identified over 105,000 novel SMs that had not been reported in previous TCGA studies. These novel SMs have significant clinical implications in designing targeted therapy for improved patient outcomes. Further, we have combined the SMs identified by both RNA-seq and DNA-seq analyses to depict an updated mutational landscape across 32 cancer types.”
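The last step in the excerpt above, combining RNA-seq and DNA-seq calls and flagging the ones not previously reported, can be pictured with plain set operations. The sketch below is only an illustration under stated assumptions: the `Mutation` record, the sample ID, and every coordinate are hypothetical, and it is not the authors' machine learning pipeline.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    """Hypothetical key for one somatic mutation call (not from the publication)."""
    sample: str
    chrom: str
    pos: int
    ref: str
    alt: str


# Made-up example calls for a single made-up sample.
dna_calls = {
    Mutation("S1", "chr1", 100, "A", "T"),
    Mutation("S1", "chr2", 200, "G", "C"),
}
rna_calls = {
    Mutation("S1", "chr1", 100, "A", "T"),
    Mutation("S1", "chr3", 300, "C", "T"),
}

# Calls seen only in RNA-seq: candidates for "novel" mutations.
novel = rna_calls - dna_calls
# Calls supported by both assays.
shared = rna_calls & dna_calls
# Union of both call sets: the combined mutational landscape.
combined = rna_calls | dna_calls

print(len(novel), len(shared), len(combined))
```

Keying each call on sample, chromosome, position, and alleles is one simple way to decide when two calls are the same mutation; a real analysis would also need consistent genome builds and variant normalization.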
SpatialDeX is a Reference-Free Method for Cell Type Deconvolution of Spatial Transcriptomics Data in Solid Tumors (Cancer Research)
“The rapid development of spatial transcriptomics (ST) technologies has enabled transcriptome-wide profiling of gene expression in tissue sections. Despite the emergence of single-cell resolution platforms, most ST sequencing studies still operate at a multi-cell resolution. Consequently, deconvolution of cell identities within the spatial spots has become imperative for characterizing cell type-specific spatial organization. To this end, we developed SpatialDeX, a regression model-based method for estimating cell type proportions in tumor ST spots. SpatialDeX exhibited comparable performance to reference-based methods and outperformed other reference-free methods with simulated ST data. Using experimental ST data, SpatialDeX demonstrated superior performance compared with both reference-based and reference-free approaches. Additionally, a pan-cancer clustering analysis on tumor spots identified by SpatialDeX unveiled distinct tumor progression mechanisms both within and across diverse cancer types. Overall, SpatialDeX is a valuable tool for unraveling the spatial cellular organization of tissues from ST data without requiring scRNA-seq references.”
OncoDB: An Interactive Online Database for Analysis of Gene Expression and Viral Infection in Cancer (Nucleic Acids Research)
“Large-scale multi-omics datasets, most prominently from the TCGA consortium, have been made available to the public for systematic characterization of human cancers. However, to date, there is a lack of corresponding online resources to utilize these valuable data to study gene expression dysregulation and viral infection, two major causes for cancer development and progression. To address these unmet needs, we established OncoDB, an online database resource to explore abnormal patterns in gene expression as well as viral infection that are correlated to clinical features in cancer. Specifically, OncoDB integrated RNA-seq, DNA methylation, and related clinical data from over 10 000 cancer patients in the TCGA study as well as from normal tissues in the GTEx study. Another unique aspect of OncoDB is its focus on oncoviruses. By mining TCGA RNA-seq data, we have identified six major oncoviruses across cancer types and further correlated viral infection to changes in host gene expression and clinical outcomes. All the analysis results are integratively presented in OncoDB with a flexible web interface to search for data related to RNA expression, DNA methylation, viral infection, and clinical features of the cancer patients. OncoDB is freely accessible at http://oncodb.org.”
Evaluation of Efficiency Prediction Algorithms and Development of Ensemble Model for CRISPR/Cas9 gRNA Selection (Bioinformatics)
“The CRISPR/Cas9 system is widely used for genome editing. The editing efficiency of CRISPR/Cas9 is mainly determined by the guide RNA (gRNA). Although many computational algorithms have been developed in recent years, it is still a challenge to select optimal bioinformatics tools for gRNA design in different experimental settings. We performed a comprehensive comparison analysis of 15 public algorithms for gRNA design, using 16 experimental gRNA datasets. Based on this analysis, we identified the top-performing algorithms, with which we further implemented various computational strategies to build ensemble models for performance improvement. Validation analysis indicates that the new ensemble model had improved performance over any individual algorithm alone at predicting gRNA efficacy under various experimental conditions. The new sgRNA design tool is freely accessible as a web application via https://crisprdb.org.”
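As a rough illustration of what an ensemble of gRNA efficiency predictors can look like, the sketch below averages rank-normalized scores from several algorithms. This is one common ensembling strategy chosen for the example, not necessarily the one used in the publication; the algorithm names and scores are hypothetical.

```python
from statistics import mean


def rank_normalize(scores):
    """Convert raw scores to ranks scaled to [0, 1]; tied scores share the average rank."""
    n = len(scores)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of items tied with the item at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / (n - 1)
        i = j + 1
    return ranks


def ensemble_scores(predictions):
    """Average rank-normalized scores across algorithms.

    predictions maps an algorithm name to a list of scores,
    one per gRNA, with every list in the same gRNA order.
    """
    normalized = [rank_normalize(s) for s in predictions.values()]
    return [mean(column) for column in zip(*normalized)]


# Hypothetical scores for three gRNAs from three made-up algorithms
# that report efficiency on different scales.
predictions = {
    "algo_a": [0.9, 0.2, 0.5],
    "algo_b": [70, 10, 40],
    "algo_c": [0.3, 0.1, 0.8],
}
combined = ensemble_scores(predictions)
best = max(range(len(combined)), key=combined.__getitem__)
print(combined, best)
```

Rank normalization is used here because individual predictors often report scores on incompatible scales; averaging ranks lets each algorithm contribute equally without rescaling its raw output.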