Pratik Dutta

Computational genomics researcher focused on deep learning

Stony Brook, New York
Joined February 2026

Network

3.8K connections
🐻‍❄️ Stony Brook AI Researchers
👨‍🏫 IIT Patna Faculty
🏢 Industry Applied Scientists
🎓 IIT Patna AI Researchers
🩺 Stony Brook Biomedical
🧬 Strand Life Sciences
🔬 Broad Researchers

Summary

Applied deep learning for genomics and large-scale sequence models: Pratik Dutta co-authored DNABERT-2, a genome foundation model, and contributed to the development and benchmarking of genome understanding datasets, which indicates expertise in adapting NLP transformer methods to genomic sequence data. arxiv+2
Multi-omics integration and translational bioinformatics: He has developed deep learning frameworks for multi-omics data (DeepMOIS-MC, DeePROG) and contributed to TransTEx, a transcript-level tissue-specificity scoring method, reflecting a focus on integrative methods that link genomic and isoform-level data to clinically relevant stratification and prognosis. nih+2
Academic research trajectory and collaboration: He was a PhD researcher at IIT Patna and subsequently held postdoctoral and research scientist positions at Stony Brook University. His publications show collaborations with domain experts across multiple institutions, reflecting sustained academic productivity and cross-institutional teamwork. orcid+2
Open-source and reproducible research practices: Many of his publications, including DNABERT-2, DeepMOIS-MC, and DeePROG, list accompanying code or GitHub repositories, indicating an emphasis on releasing implementations and resources for the community. github+2

Writing

TransTEx: novel tissue-specificity scoring method for grouping human transcriptome into different expression groups

August 1, 2024

Bioinformatics paper describing TransTEx, a transcript-level tissue-specificity scoring method and accompanying database for grouping transcripts into expression classes across human tissues.

pubmed.ncbi.nlm.nih.gov

DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome

June 1, 2023

Preprint describing DNABERT-2, an efficient genome foundation model that uses BPE tokenization and presents the Genome Understanding Evaluation (GUE) benchmark across multi-species genome tasks.

arxiv.org

Deep multi-omics integration by learning correlation-maximizing representation identifies prognostically stratified cancer subtypes

June 1, 2023

Journal article presenting DeepMOIS-MC, an outcome-guided multi-omics integrative subtyping framework that maximizes correlation among omics views to identify prognostically relevant cancer subtypes; includes code availability.

pubmed.ncbi.nlm.nih.gov

Rapid, High-Throughput Single-Cell Multiplex In Situ Tagging (MIST) Analysis of Immunological Disease with Machine Learning

May 1, 2023

Analytical Chemistry paper describing scMIST, a single-cell multiplex in situ tagging technology, and the application of machine learning (a random forest) to classification in a sepsis model.

pubmed.ncbi.nlm.nih.gov

DeePROG: Deep Attention-Based Model for Diseased Gene Prognosis by Fusing Multi-Omics Data

September 1, 2022

IEEE/ACM Transactions article describing DeePROG, a self-attention-based deep multi-modal model for diseased gene prognosis that fuses heterogeneous omics data; code is available on GitHub.

pubmed.ncbi.nlm.nih.gov

A Protein Interaction Information-based Generative Model for Enhancing Gene Clustering

January 1, 2020

Scientific Reports article presenting a protein-protein interaction-based generative model to improve gene clustering accuracy using weak supervision sources.

pubmed.ncbi.nlm.nih.gov

Hobbies

Engages with the AI and machine-learning community and shares research updates on Twitter. twitter+1