Tri Dao

Occupation: Assistant Professor of Computer Science

Tri Dao is a computer scientist and academic. He is an assistant professor in the Department of Computer Science at Princeton University, co-creator of the deep learning architecture Mamba,^[1]^[2] and co-founder and chief scientist of Together AI. His research focuses on machine learning systems, particularly hardware-aware algorithms for efficient training and inference of large-scale models.^[3]^[4]

Education

Dao received a Bachelor of Science in Mathematics from Stanford University in 2016. He completed a Master of Science in Computer Science in 2016 and a Master of Science in Statistics in 2019, also at Stanford University. He earned a PhD in Computer Science from Stanford University in 2023. His doctoral dissertation, Hardware-aware Algorithms for Efficient Machine Learning,^[5] was supervised by Christopher Ré and Stefano Ermon.^[6]^[7]

Career

Dao joined the faculty of Princeton University in 2024 as an assistant professor of computer science. His work lies at the intersection of machine learning and systems, with an emphasis on computational efficiency, hardware-aware algorithm design, and sequence models capable of handling long-range dependencies.^[8]^[9]

Dao is a co-founder and chief scientist of Together AI, a company that develops infrastructure and models for large-scale machine learning. In addition to his academic appointment, he has held research positions at Adept AI, Microsoft Research, Citadel Securities, and Google.^[10]^[11]

He has served as an organizer of workshops on efficient systems for foundation models at the International Conference on Machine Learning and as an area chair for conferences including COLM, ICLR, and ICML. He has also acted as a reviewer for major machine learning conferences and journals, including NeurIPS, ICML, ICLR, and AISTATS.^[12]^[13]

Research

Dao’s research addresses the computational and memory challenges of large-scale machine learning models. He is the primary creator of FlashAttention, an IO-aware algorithm for computing exact attention that reduces memory usage and increases computational efficiency in transformer models. FlashAttention has been widely adopted in both academia and industry: it has been integrated into frameworks such as PyTorch, JAX, Hugging Face Transformers, and Microsoft DeepSpeed, and used to accelerate training and inference of large language models by organizations including Meta, Microsoft, OpenAI, and Google.^[14]
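FlashAttention computes exact attention. One idea behind it is tiling: keys and values are processed block by block while running softmax statistics are maintained, so the full matrix of attention scores never has to be stored at once. The following is a minimal pure-Python sketch of that online-softmax recurrence for a single query vector, under simplifying assumptions. The function names `naive_attention_row` and `tiled_attention_row` and the `block_size` parameter are hypothetical illustrations. The sketch does not model the GPU memory hierarchy (on-chip SRAM versus HBM), kernel fusion, or the backward pass, which is where the actual implementation gets its speed.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def naive_attention_row(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: score every key, softmax all scores at once, then average the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d_v)]


def tiled_attention_row(q: Vector, keys: List[Vector], values: List[Vector],
                        block_size: int = 2) -> Vector:
    """Same result, visiting keys/values one block at a time (online softmax).

    Only a running max `m`, a running denominator `denom`, and an unnormalized
    output `acc` are kept; they are rescaled whenever a block raises the max.
    """
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(values[0])
    m = -math.inf
    denom = 0.0
    acc = [0.0] * d_v
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) * scale for k in k_blk]
        m_new = max(m, max(scores))
        correction = math.exp(m - m_new)  # 0.0 on the first block, since m = -inf
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            p = math.exp(s - m_new)
            denom += p
            for j in range(d_v):
                acc[j] += p * v[j]
        m = m_new
    return [a / denom for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [0.3, 0.3]]
values = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0], [-1.0, 0.5, 2.0]]
ref = naive_attention_row(q, keys, values)
out = tiled_attention_row(q, keys, values, block_size=2)
print(max(abs(a - b) for a, b in zip(ref, out)))  # differs only by floating-point rounding
```

The tiled version never holds more than `block_size` scores at a time, yet it returns the same output as the reference because each earlier partial sum is rescaled by `exp(m_old - m_new)` when a larger score appears.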

Together with Albert Gu, Dao also created Mamba, a sequence modeling architecture based on selective state space models and designed to improve the efficiency and scalability of long-context language modeling. Because its computation scales linearly with sequence length, Mamba enables faster training and inference on long sequences; it has been integrated into major ML frameworks and supports deployment on multiple GPU platforms and cloud infrastructures.^[15]
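Mamba is built on a selective state space model: a linear recurrence h_t = A_bar_t * h_(t-1) + B_bar_t * x_t, y_t = C_t * h_t, in which the step size Δ and the projections B and C depend on the current input, so the model can choose what to keep in its state. The sketch below is a simplified single-channel version in pure Python. It assumes a diagonal A, the zero-order-hold discretization A_bar = exp(Δ·A), and approximates B_bar by Δ·B. The scalar weight parameterization (`w_delta`, `w_B`, `w_C`) and the function name `selective_scan` are hypothetical. This is not Mamba's actual layer, which also includes a skip term, gating, a short convolution, and a hardware-aware parallel scan.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); keeps the step size positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def selective_scan(xs: List[float], A: List[float], w_delta: float,
                   w_B: List[float], w_C: List[float]) -> List[float]:
    """Sequential scan of a simplified single-channel selective SSM.

    xs      -- input sequence (one scalar per time step)
    A       -- diagonal entries of the state matrix (negative for stability)
    w_delta -- hypothetical weight making the step size depend on the input
    w_B/w_C -- hypothetical weights making B_t and C_t depend on the input
    """
    h = [0.0] * len(A)  # fixed-size hidden state, independent of sequence length
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)       # input-dependent step size
        B = [wb * x for wb in w_B]          # input-dependent input projection
        C = [wc * x for wc in w_C]          # input-dependent output projection
        for n in range(len(A)):
            a_bar = math.exp(delta * A[n])  # zero-order-hold discretization of A
            h[n] = a_bar * h[n] + delta * B[n] * x  # B_bar approximated by delta * B
        ys.append(sum(c * hn for c, hn in zip(C, h)))
    return ys


ys = selective_scan([1.0, -0.5, 2.0, 0.0], A=[-1.0, -0.5], w_delta=0.3,
                    w_B=[1.0, 0.5], w_C=[0.2, -0.4])
print(ys)
```

Because each step updates only the fixed-size state `h`, the loop's cost grows linearly with sequence length, and the output at position t depends only on inputs up to t.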

His work has appeared in leading peer-reviewed conferences and journals, including NeurIPS, ICML, ICLR, COLM, MLSys, and ICCV. Dao’s contributions are notable for combining algorithmic design with practical systems implementation, resulting in methods that are both theoretically grounded and widely used in production-scale machine learning.^[16]^[17]

Honors and awards

Dao has received several research honors, including the AI2050 Early Career Fellowship from Schmidt Sciences in 2025, which recognizes early-career researchers making high-impact contributions in artificial intelligence. He has also been named a Google Research Scholar and received the Google ML and Systems Junior Faculty Award.^[18]

References

[1] Patro, Badri Narayana; Agneeswaran, Vijay Srinivas (2025-11-15). "Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, Applications, and Challenges". Engineering Applications of Artificial Intelligence. 159: 111279. doi:10.1016/j.engappai.2025.111279. ISSN 0952-1976.
[2] Tiezzi, Matteo; Casoni, Michele; Betti, Alessandro; Gori, Marco; Melacci, Stefano (2026-01-01). "State-space modeling in long sequence processing: a survey on recurrence in the transformer era". Neural Networks. 193: 108039. doi:10.1016/j.neunet.2025.108039. ISSN 0893-6080.
[3] "Tri Dao". scholar.google.com. Retrieved 2026-01-05.
[4] "Mamba, A New Approach That May Outperform Transformers". 2024-04-10. Retrieved 2026-01-06.
[5] Quang, Tri Dao Phuc (2023). Hardware-aware Algorithms for Efficient Machine Learning. Stanford University.
[6] "Tri Dao". Faculty, Princeton University. Retrieved 2026-01-05.
[7] Franzen, Carl (2026). "TII's Falcon H1R 7B can out-reason models up to 7x its size — and it's (mostly) open".
[8] Office of Communications. "Board approves six faculty appointments". Princeton University. Retrieved 2026-01-05.
[9] Ruan, Jiacheng; Li, Jincheng; Xiang, Suncheng (2025-09-16). "VM-UNet: Vision Mamba UNet for Medical Image Segmentation". ACM Trans. Multimedia Comput. Commun. Appl. doi:10.1145/3767748. ISSN 1551-6857.
[10] "Introducing Together AI Chief Scientist Tri Dao, as he releases FlashAttention-2 to speed up model training and inference". www.together.ai. Retrieved 2026-01-05.
[11] Lambert, Nathan (2025-12-18). "Interviewing Tri Dao and Michael Poli on the future of LLM architectures". www.interconnects.ai. Retrieved 2026-01-07.
[12] Gu, Albert; Dao, Tri; Ermon, Stefano; Rudra, Atri; Ré, Christopher (2020). "HiPPO: Recurrent Memory with Optimal Polynomial Projections". Advances in Neural Information Processing Systems. Curran Associates, Inc. 33: 1474–1487.
[13] Poli, Michael; Massaroli, Stefano; Nguyen, Eric; Fu, Daniel Y.; Dao, Tri; Baccus, Stephen; Bengio, Yoshua; Ermon, Stefano; Ré, Christopher (2023-07-03). "Hyena Hierarchy: Towards Larger Convolutional Language Models". Proceedings of the 40th International Conference on Machine Learning. PMLR: 28043–28078.
[14] Dao, Tri; Fu, Daniel Y.; Ermon, Stefano; Rudra, Atri; Ré, Christopher (2022). "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness". Advances in Neural Information Processing Systems. Retrieved 2026-01-05.
[15] Fu, Daniel Y.; Dao, Tri; Saab, Khaled K.; Thomas, Armin W.; Rudra, Atri; Ré, Christopher (2023). "Hungry Hungry Hippos: Towards Language Modeling with State Space Models". International Conference on Learning Representations (ICLR).
[16] Dao, Tri; Gu, Albert (2024-07-08). "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality". Proceedings of the 41st International Conference on Machine Learning. PMLR: 10041–10071.
[17] Gu, Albert; Johnson, Isys; Goel, Karan; Saab, Khaled; Dao, Tri; Rudra, Atri; Ré, Christopher (2021). "Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers". Advances in Neural Information Processing Systems. Curran Associates, Inc. 34: 572–585.
[18] "Tri Dao". AI2050. Retrieved 2026-01-05.

This article "Tri Dao" is from Wikipedia. The list of its authors can be seen in its history and/or the page Edithistory:Tri Dao. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace rather than its main namespace.
