Dipanjan Das

Welcome to my homepage. I am a Senior Director of Research at Google DeepMind, based in New York City. I lead teams of researchers distributed across New York, London, Mountain View, Zurich and San Francisco. We work on language technologies.

My current research ensures that large language models can generate factually accurate content that is attributable to trustworthy sources. I contribute to the Gemini project and collaborate with many researchers and engineers across Google DeepMind and the rest of Google to ensure that Gemini post-trained models have the highest possible factual accuracy in communicative scenarios. My broad interests lie in controllable models of language generation in communicative and collaborative scenarios.

I worked on the following problems in the past:

Model interpretability
Representation learning
Paraphrase identification and natural language inference
Semantic parsing
Cross-lingual methods

Prior to working at Google DeepMind, I was part of Google Brain and Google Research. I completed a Ph.D. at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University in 2012. In 2005, I completed a B.Tech. in Computer Science and Engineering at IIT Kharagpur.

Here is a stale CV.

I can be reached at dipanjand@google.com.

Postdocs, Interns and Student Collaborators

I have (co-)hosted and collaborated with extraordinary postdoctoral scholars and student interns at Google.

Naomi Saphra

Diyi Yang

Dinghan Shen

Hao Peng

Luheng He

Karthik Narasimhan

Nicholas FitzGerald

Siva Reddy

Yoav Artzi

Karl Moritz Hermann

Jason Mann

Lu Yang

David Zhang

Papers

(Google Scholar) (Semantic Scholar)

Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features.
Hannah Rashkin, David Reitter, Gaurav Singh Tomar, Dipanjan Das.
Proceedings of ACL.
2021.
The MultiBERTs: BERT Reproductions for Robustness Analysis.
Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick.
arXiv preprint.
2021.
The GEM benchmark: Natural language generation, its evaluation and metrics.
Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna-Adriana Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Chinenye Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Andre Niyongabo Rubungo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou.
Proceedings of GEM.
2021.
Decontextualization: Making Sentences Stand-Alone.
Eunsol Choi, Jennimaria Palomaki, Matthew Lamm, Tom Kwiatkowski, Dipanjan Das, Michael Collins.
Transactions of the Association for Computational Linguistics.
2021.
Learning to evaluate translation beyond English: BLEURT submissions to the WMT Metrics 2020 Shared Task.
Thibault Sellam, Amy Pu, Hyung Won Chung, Sebastian Gehrmann, Qijun Tan, Markus Freitag, Dipanjan Das, Ankur P. Parikh.
Proceedings of WMT.
2020.
ToTTo: A Controlled Table-To-Text Generation Dataset.
Ankur P. Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das.
Proceedings of EMNLP.
2020.
BLEURT: Learning Robust Metrics for Text Generation.
Thibault Sellam, Dipanjan Das, Ankur Parikh.
Proceedings of ACL.
2020.
Syntactic Data Augmentation Increases Robustness to Inference Heuristics.
Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, Tal Linzen.
Proceedings of ACL.
2020.
Handling Divergent Reference Texts when Evaluating Table-to-Text Generation.
Bhuwan Dhingra, Manaal Faruqui, Ankur Parikh, Ming-Wei Chang, Dipanjan Das, William Cohen.
Proceedings of ACL.
2019.
BERT Rediscovers the Classical NLP Pipeline.
Ian Tenney, Dipanjan Das and Ellie Pavlick.
Proceedings of ACL.
2019.
Text Generation with Exemplar-based Adaptive Decoding.
Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das.
Proceedings of NAACL.
2019.
What do you learn from context? Probing for sentence structure in contextualized word representations.
Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Sam Bowman, Dipanjan Das, Ellie Pavlick.
Proceedings of ICLR.
2019.
Identifying Well-formed Natural Language Questions.
Manaal Faruqui, Dipanjan Das.
Proceedings of EMNLP.
2018.
WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse.
Manaal Faruqui, Ellie Pavlick, Ian Tenney, Dipanjan Das.
Proceedings of EMNLP.
2018.
Learning To Split and Rephrase From Wikipedia Edit History.
Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das.
Proceedings of EMNLP.
2018.
Neural paraphrase identification of questions with noisy pretraining.
Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, Dipanjan Das.
Proceedings of the First Workshop on Subword and Character Level Models in NLP.
2017.
Learning recurrent span representations for extractive question answering.
Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, Jonathan Berant.
arXiv preprint arXiv:1611.01436.
2017.
A decomposable attention model for natural language inference.
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit.
Proceedings of EMNLP.
2016.
Transforming dependency structures to logical forms for semantic parsing.
Siva Reddy, Oscar Täckström, Michael Collins, Tom Kwiatkowski, Dipanjan Das, Mark Steedman, Mirella Lapata.
Transactions of the Association for Computational Linguistics.
2016.
Semantic Role Labeling with Neural Network Factors.
Nicholas FitzGerald, Oscar Täckström, Kuzman Ganchev, Dipanjan Das.
Proceedings of EMNLP.
2015.
Efficient inference and structured learning for semantic role labeling.
Oscar Täckström, Kuzman Ganchev, Dipanjan Das.
Transactions of the Association for Computational Linguistics.
2015.
Frame-semantic parsing.
Dipanjan Das, Desai Chen, André F.T. Martins, Nathan Schneider, Noah A. Smith.
Computational Linguistics.
2014.
Learning Compact Lexicons for CCG Semantic Parsing.
Yoav Artzi, Dipanjan Das, Slav Petrov.
Proceedings of EMNLP.
2014.
Enhanced search with wildcards and morphological inflections in the Google Books Ngram Viewer.
Jason Mann, David Zhang, Lu Yang, Dipanjan Das, Slav Petrov.
Proceedings of ACL (demo track).
2014.
Statistical models for frame-semantic parsing.
Dipanjan Das.
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014).
2014.
Semantic frame identification with distributed word representations.
Karl Moritz Hermann, Dipanjan Das, Jason Weston, Kuzman Ganchev.
Proceedings of ACL.
2014.
Token and type constraints for cross-lingual part-of-speech tagging.
Oscar Täckström, Dipanjan Das, Slav Petrov, Ryan McDonald, Joakim Nivre.
Transactions of the Association for Computational Linguistics.
2013.
Cross-lingual discriminative learning of sequence models with posterior regularization.
Kuzman Ganchev, Dipanjan Das.
Proceedings of EMNLP.
2013.
(Best paper award honorable mention)
Universal dependency annotation for multilingual parsing.
Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, Jungmee Lee.
Proceedings of ACL.
2013.
An exact dual decomposition algorithm for shallow semantic parsing with constraints.
Dipanjan Das, André F.T. Martins, Noah A. Smith.
Proceedings of *SEM.
2012.
Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties.
Dipanjan Das and Noah A. Smith.
Proceedings of NAACL.
2012.
Semi-Supervised and Latent-Variable Models of Natural Language Semantics.
Dipanjan Das.
Ph.D. Thesis. Carnegie Mellon University.
2012.
Semi-supervised frame-semantic parsing for unknown predicates.
Dipanjan Das and Noah A. Smith.
Proceedings of ACL.
2011.
Unsupervised part-of-speech tagging with bilingual graph-based projections.
Dipanjan Das and Slav Petrov.
Proceedings of ACL.
2011.
(Best Paper Award)
Part-of-speech tagging for Twitter: Annotation, features, and experiments.
Kevin Gimpel, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, Noah A. Smith.
Proceedings of ACL.
2011.
A universal part-of-speech tagset.
Slav Petrov, Dipanjan Das, Ryan McDonald.
Proceedings of LREC.
2011.
Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance.
Shay B. Cohen, Dipanjan Das, Noah A. Smith.
Proceedings of EMNLP.
2011.
Semafor: Frame argument resolution with log-linear models.
Desai Chen, Nathan Schneider, Dipanjan Das, Noah A. Smith.
Proceedings of SemEval.
2010.
Distributed asynchronous online learning for natural language processing.
Kevin Gimpel, Dipanjan Das, Noah A. Smith.
Proceedings of CoNLL.
2010.
Movie reviews and revenues: An experiment in text regression.
Mahesh Joshi, Dipanjan Das, Kevin Gimpel, Noah A. Smith.
Proceedings of NAACL.
2010.
Probabilistic frame-semantic parsing.
Dipanjan Das, Nathan Schneider, Desai Chen, Noah A. Smith.
Proceedings of NAACL.
2010.
SEMAFOR 1.0: A probabilistic frame-semantic parser.
Dipanjan Das, Nathan Schneider, Desai Chen, Noah A. Smith.
LTI CMU Technical Report.
2010.
Visualizing topical quotations over time to understand news discourse.
Nathan Schneider, Rebecca Hwa, Philip Gianfortoni, Dipanjan Das, Michael Heilman, Alan Black, Frederick L. Crabbe, Noah A. Smith.
LTI CMU Technical Report.
2010.
Paraphrase identification as probabilistic quasi-synchronous recognition.
Dipanjan Das, Noah A. Smith.
Proceedings of ACL.
2009.
Stacking dependency parsers.
André F.T. Martins, Dipanjan Das, Noah A. Smith, Eric P. Xing.
Proceedings of EMNLP.
2008.
Improving multimedia retrieval with a video OCR.
Dipanjan Das, Datong Chen, Alexander G. Hauptmann.
Multimedia Content Access: Algorithms and Systems II.
2008.
Automatic extraction of briefing templates.
Dipanjan Das, Mohit Kumar, Alexander I. Rudnicky.
Proceedings of IJCNLP.
2008.
Summarizing non-textual events with a 'briefing' focus.
Mohit Kumar, Dipanjan Das, Alexander I. Rudnicky.
Large Scale Semantic Access to Content (Text, Image, Video, and Sound).
2007.
Combating information overload in non-visual web access using context.
Jalal Mahmud, Yevgen Borodin, Dipanjan Das, I. V. Ramakrishnan.
Proceedings of IUI.
2007.
A Survey on Automatic Text Summarization.
Dipanjan Das, André F.T. Martins.
Literature Survey for the Language and Statistics II course at CMU.
2007.
Multi-lingual broadcast news retrieval.
A.G. Hauptmann, M.-Y. Chen, M. Christel, D. Das, W.-H. Lin, R. Yan, J. Yang, G. Backfried, X. Wu.
NIST TRECVID Workshop.
2006.
Improving non-visual web access using context.
Jalal Mahmud, Yevgen Borodin, Dipanjan Das, I. V. Ramakrishnan.
Proceedings of the 8th International ACM SIGACCESS Conference on Computers and Accessibility.
2006.
An affinity based Greedy approach towards Chunking for Indian languages.
Dipanjan Das, Monojit Choudhury, Sudeshna Sarkar, Anupam Basu.
Proceedings of ICON.
2005.
Finite state models for generation of Hindustani classical music.
Dipanjan Das, Monojit Choudhury.
Proceedings of International Symposium on Frontiers of Research in Speech and Music.
2005.
Chunker and shallow parser for free word order languages: an approach based on valency theory and feature structures.
Dipanjan Das, Monojit Choudhury.
Proceedings of ICON.
2004.
