Generative AI

Generative AI, and specifically Large Language Models (LLMs), hold immense potential for predictive analysis in both genomic medicine and single-cell analysis.

In Genomic Medicine:

• Identifying disease-causing variants: LLMs trained on vast datasets of genomes and associated phenotypes can scan individual genomes and predict the likelihood of harboring disease-causing variants. This can improve early diagnosis and personalized treatment decisions.

• Drug discovery and target identification: By analyzing large-scale genomic data and relevant scientific literature, LLMs can suggest novel drug targets or predict potential drug-gene interactions, accelerating the drug discovery process.

• Stratifying patients for clinical trials: LLMs can analyze patient genomic data and clinical information to identify subpopulations most likely to benefit from specific clinical trials, leading to more targeted and effective therapeutic development.

• Understanding complex biological processes: LLMs can process and analyze vast amounts of data on gene expression, protein interactions, and cellular pathways, aiding in the discovery of complex biological mechanisms underlying diseases.

In Single-Cell Analysis:

• Cell type identification and classification: LLMs trained on single-cell RNA-seq data can accurately classify different cell types within a complex tissue, revealing new cell populations and their roles in health and disease.

• Identifying cell-cell interactions: LLMs can analyze single-cell data to infer communication networks between different cell types, providing insights into tissue organization and function.

• Predicting cellular responses to stimuli: By learning from single-cell responses to various stimuli, LLMs can predict how individual cells or cell populations might react to drugs, environmental changes, or disease progression.

• Generating synthetic single-cell data: LLMs can be used to generate realistic simulations of single-cell data, facilitating the development and testing of new computational tools and analysis methods.

scGPT (single-cell Generative Pre-trained Transformer):

scGPT is a Python package for single-cell multi-omic data analysis using pretrained foundation models. It adapts the GPT approach to single-cell data, learning representations from the gene expression matrix and enabling tasks such as cell type annotation, differential expression analysis, and generation of synthetic single-cell data.

The example below applies scGPT to a liver scRNA-seq dataset.

scGPT can be fine-tuned to achieve strong performance across diverse downstream applications, including:

• cell-type annotation
• multi-batch integration
• multi-omic integration
• genetic perturbation prediction
• gene network inference

Example 1

Figure 1A. Zero-shot single-cell analysis with the continually pre-trained scGPT model. This analysis requires no further training of scGPT. The scRNA-seq dataset is the Tabula Sapiens liver dataset from cellxgene.

Figure 1B. Embedding visualization.

Steps:

1. Downloaded the Tabula Sapiens liver dataset and divided it into train (reference, 80%) and test (query, 20%) sets.
2. Preprocessed the data.
3. Generated scGPT embeddings for each cell in the reference and query datasets.
4. Transferred annotations from the reference to the query dataset.
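
The split and annotation-transfer steps above can be sketched as a k-nearest-neighbour vote in embedding space. This is a minimal illustration and not the scGPT API. The function names, the toy 2-D vectors standing in for scGPT embeddings, the toy cell-type labels, the cosine similarity measure, and k=3 are all assumptions made for the example. The source does not state which transfer method was actually used.

```python
import math
import random
from collections import Counter


def split_reference_query(n_cells, query_fraction=0.2, seed=0):
    """Shuffle cell indices and split them into reference (train) and query (test)."""
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)
    n_query = round(n_cells * query_fraction)
    return indices[n_query:], indices[:n_query]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def transfer_labels(ref_embeddings, ref_labels, query_embeddings, k=3):
    """Give each query cell the majority label of its k most similar reference cells."""
    predictions = []
    for query in query_embeddings:
        # Rank reference cells from most to least similar to this query cell.
        ranked = sorted(
            range(len(ref_embeddings)),
            key=lambda i: cosine_similarity(query, ref_embeddings[i]),
            reverse=True,
        )
        votes = Counter(ref_labels[i] for i in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy 2-D vectors; real scGPT embeddings are much higher-dimensional.
ref_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0],
                  [0.1, 1.0], [0.2, 0.9], [0.0, 0.8]]
ref_labels = ["hepatocyte"] * 3 + ["T cell"] * 3  # hypothetical labels
query_embeddings = [[0.95, 0.05], [0.05, 0.95]]

predicted = transfer_labels(ref_embeddings, ref_labels, query_embeddings, k=3)
print(predicted)
```

On a dataset of realistic size, the exhaustive similarity search would normally be replaced by a vectorised or approximate nearest-neighbour index. The pure-Python loop is used here only to keep the sketch dependency-free.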

Performance Evaluation

'accuracy': 0.9310861423220974,
'precision': 0.8325796185787914,
'recall': 0.7874799806240997,
'macro_f1': 0.8021677680969674
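
For reference, metrics of this kind can be computed from true and predicted labels as sketched below. The sketch assumes that precision and recall are macro-averaged (the unweighted mean over cell types), which the source does not state explicitly. The function name and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    # Macro averaging weights every class equally, regardless of its size.
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy labels, not taken from the liver dataset.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because macro averaging gives rare cell types the same weight as abundant ones, macro precision, recall, and F1 can sit well below overall accuracy when a few small populations are misclassified, which is consistent with the gap seen in the numbers above.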
