Publications
† denotes co-first authors.
AI Research
LLM Fine-tuning
The Blessing of Dimensionality in LLM Fine-tuning: A Variance-Curvature Perspective
arXiv preprint (2026)
We analyze why overparameterized LLMs fine-tune so well, showing that high dimensionality reduces variance and smooths the loss landscape, making it a blessing rather than a curse.
Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning
Under Review (2026)
We propose an RL-trained agent that efficiently navigates knowledge graphs for retrieval-augmented generation, achieving strong transfer across diverse KG structures.
Task Vectors
ICML 2025 Spotlight
We study task-vector formation in in-context learning through an encoder-decoder perspective, linking compact internal representations to in-context predictions.
When Do LLMs Improve Bayesian Optimization? A Systematic Comparison Across Molecular and Protein Design
NeurIPS 2025 AI4Science Workshop
We systematically benchmark when LLM-guided Bayesian optimization outperforms classical methods across molecular and protein design tasks.
Protein Embeddings
Probing the Embedding Space of Protein Foundation Models through Intrinsic Dimension Analysis
NeurIPS 2025 AIDrugX Workshop
We probe protein foundation model embeddings via intrinsic dimension analysis, revealing how representation geometry relates to downstream task performance.
ICML 2026
We introduce a framework to estimate the empowerment of LLM agents (their capacity to influence the environment) as a step toward safer AI deployment.
When AI Co-Scientists Fail: SPOT — Benchmark for Automated Verification of Scientific Research
arXiv preprint (2025)
We present SPOT, a benchmark for evaluating whether AI co-scientist systems can reliably verify claims in scientific papers across multiple domains.
MethylGPT: a foundation model for the DNA methylome
Under Review at Nature Methods (2024)
We develop a GPT-style foundation model pretrained on large-scale human DNA methylation data, enabling zero-shot biological age prediction and disease classification.
TMLR 2024
We resolve the apparent contradiction between Kaplan and Chinchilla neural scaling laws by identifying key methodological differences and proposing a unified framework.
A Resource Model for Neural Scaling Law
ICLR 2024 BGPT Workshop
We draw an analogy between neural scaling and ecological resource competition, offering a mechanistic model that predicts power-law scaling behavior in neural networks.
Biophysics Research
Interspecies Interactions Drive Community-level Selection in Microbial Coalescence
Nature Ecology & Evolution (2026)
We show that interspecies interactions, rather than individual species fitness, are the primary driver of community-level outcomes when microbial communities merge.
Transition from Global Stability to Multiple Attractors in Microcosms
Under Review at Nature Portfolio (2025)
We experimentally demonstrate how microbial ecosystems transition from a single global equilibrium to multiple stable states as community complexity increases.
Noncovalent Antibody Catenation on a Target Surface Greatly Increases the Antigen-Binding Avidity
eLife 2023
We discover that antibodies can form noncovalent chain-like structures on antigen surfaces, dramatically enhancing binding avidity through cooperative multivalent interactions.
Rapid species identification of pathogenic bacteria from a minute quantity exploiting three-dimensional quantitative phase imaging and artificial neural network
Light: Science & Applications 2022
We combine 3D quantitative phase imaging with deep learning to achieve rapid, label-free identification of individual bacterial pathogens at the single-cell level.
Deep-Learning Based Three-Dimensional Label-Free Tracking and Analysis of Immunological Synapses of CAR-T Cells
eLife 2020
We develop a deep-learning framework for 3D label-free tracking and quantitative analysis of immunological synapses formed by CAR-T cells in real time.