Dipanjan Das


Welcome to my homepage.  I am a research scientist at Google AI, on the Language team.  I am based in New York City.  I lead teams of researchers and engineers distributed across New York, the San Francisco Bay Area and Stockholm.  We work on learning semantic representations of language and are focused on problems such as question answering, natural language inference and generation.  Most often, our methods involve deep learning.

Personally, I am interested in the following research problems:

Prior to joining Google, I completed a Ph.D. at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University, in 2012.  In 2005, I completed a B.Tech. in Computer Science and Engineering at IIT Kharagpur.

Here is a recent CV.  My official Google AI page is here.


I can be reached at dipanjand@google.com.


Recent News


Interns and Student Collaborators

I have (co-)hosted and collaborated with extraordinary student interns at Google.


Papers

(Google Scholar) (Semantic Scholar)

  1. Identifying Well-formed Natural Language Questions.
    Manaal Faruqui, Dipanjan Das.
    Proceedings of EMNLP.
    2018.

     
  2. WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse.
    Ellie Pavlick, Manaal Faruqui, Ian Tenney, Dipanjan Das.
    Proceedings of EMNLP.
    2018.

     
  3. Learning To Split and Rephrase From Wikipedia Edit History.
    Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das.
    Proceedings of EMNLP.
    2018.

     
  4. Neural paraphrase identification of questions with noisy pretraining.
    Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, Dipanjan Das.
    Proceedings of the First Workshop on Subword and Character Level Models in NLP.
    2017.

  1. Learning recurrent span representations for extractive question answering.
    Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, Jonathan Berant.
    arXiv preprint arXiv:1611.01436.
    2017.

     
  2. A decomposable attention model for natural language inference.
    Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit.
    Proceedings of EMNLP.
    2016.

     
  3. Transforming dependency structures to logical forms for semantic parsing.
    Siva Reddy, Oscar Täckström, Michael Collins, Tom Kwiatkowski, Dipanjan Das, Mark Steedman, Mirella Lapata.
    Transactions of the Association for Computational Linguistics.
    2016.

     
  4. Semantic Role Labeling with Neural Network Factors.
    Nicholas FitzGerald, Oscar Täckström, Kuzman Ganchev, Dipanjan Das.
    Proceedings of EMNLP.
    2015.

     
  5. Efficient inference and structured learning for semantic role labeling.
    Oscar Täckström, Kuzman Ganchev, Dipanjan Das.
    Transactions of the Association for Computational Linguistics.
    2015.

     
  6. Frame-semantic parsing.
    Dipanjan Das, Desai Chen, André F.T. Martins, Nathan Schneider, Noah A. Smith.
    Computational Linguistics.
    2014.

     
  7. Learning Compact Lexicons for CCG Semantic Parsing.
    Yoav Artzi, Dipanjan Das, Slav Petrov.
    Proceedings of EMNLP.
    2014.

     
  8. Enhanced search with wildcards and morphological inflections in the Google Books Ngram Viewer.
    Jason Mann, David Zhang, Lu Yang, Dipanjan Das, Slav Petrov.
    Proceedings of ACL (demo track).
    2014.

     
  9. Statistical models for frame-semantic parsing.
    Dipanjan Das.
    Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014).
    2014.

     
  10. Semantic frame identification with distributed word representations.
    Karl Moritz Hermann, Dipanjan Das, Jason Weston, Kuzman Ganchev.
    Proceedings of ACL.
    2014.

     
  11. Token and type constraints for cross-lingual part-of-speech tagging.
    Oscar Täckström, Dipanjan Das, Slav Petrov, Ryan McDonald, Joakim Nivre.
    Transactions of the Association for Computational Linguistics.
    2013.

     
  12. Cross-lingual discriminative learning of sequence models with posterior regularization.
    Kuzman Ganchev, Dipanjan Das.
    Proceedings of EMNLP.
    2013.

    (Best paper award honorable mention)
     
  13. Universal dependency annotation for multilingual parsing.
    Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, Jungmee Lee.
    Proceedings of ACL.
    2013.


     
  14. An exact dual decomposition algorithm for shallow semantic parsing with constraints.
    Dipanjan Das, André F.T. Martins, Noah A. Smith.
    Proceedings of *SEM.
    2012.

     
  15. Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties.
    Dipanjan Das, Noah A. Smith.
    Proceedings of NAACL.
    2012.

     
  16. Semi-Supervised and Latent-Variable Models of Natural Language Semantics.
    Dipanjan Das.
    Ph.D. Thesis.  Carnegie Mellon University.
    2012.

     
  17. Semi-supervised frame-semantic parsing for unknown predicates.
    Dipanjan Das, Noah A. Smith.
    Proceedings of ACL.
    2011.

     
  18. Unsupervised part-of-speech tagging with bilingual graph-based projections.
    Dipanjan Das, Slav Petrov.
    Proceedings of ACL.
    2011.

    (Best Paper Award)
     
  19. Part-of-speech tagging for Twitter: Annotation, features, and experiments.
    Kevin Gimpel, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, Noah A. Smith.
    Proceedings of ACL.
    2011.

     
  20. A universal part-of-speech tagset.
    Slav Petrov, Dipanjan Das, Ryan McDonald.
    Proceedings of LREC.
    2011.


     
  21. Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance.
    Shay B. Cohen, Dipanjan Das, Noah A. Smith.
    Proceedings of EMNLP.
    2011.

 

  1. Semafor: Frame argument resolution with log-linear models.
    Desai Chen, Nathan Schneider, Dipanjan Das, Noah A. Smith.
    Proceedings of SemEval.
    2010.

     
  2. Distributed asynchronous online learning for natural language processing.
    Kevin Gimpel, Dipanjan Das, Noah A. Smith.
    Proceedings of CoNLL.
    2010.

     
  3. Movie reviews and revenues: An experiment in text regression.
    Mahesh Joshi, Dipanjan Das, Kevin Gimpel, Noah A. Smith.
    Proceedings of NAACL.
    2010.

     
  4. Probabilistic frame-semantic parsing.
    Dipanjan Das, Nathan Schneider, Desai Chen, Noah A. Smith.
    Proceedings of NAACL.
    2010.

     
  5. SEMAFOR 1.0: A probabilistic frame-semantic parser.
    Dipanjan Das, Nathan Schneider, Desai Chen, Noah A. Smith.
    LTI CMU Technical Report.
    2010.

     
  6. Visualizing topical quotations over time to understand news discourse.
    Nathan Schneider, Rebecca Hwa, Philip Gianfortoni, Dipanjan Das, Michael Heilman, Alan Black, Frederick L. Crabbe, Noah A. Smith.
    LTI CMU Technical Report.
    2010.

     
  7. Paraphrase identification as probabilistic quasi-synchronous recognition.
    Dipanjan Das, Noah A. Smith.
    Proceedings of ACL.
    2009.

     
  8. Stacking dependency parsers.
    André F.T. Martins, Dipanjan Das, Noah A. Smith, Eric P. Xing.
    Proceedings of EMNLP.
    2008.

     
  9. Improving multimedia retrieval with a video OCR.
    Dipanjan Das, Datong Chen, Alexander G. Hauptmann.
    Multimedia Content Access: Algorithms and Systems II.
    2008.

     
  10. Automatic extraction of briefing templates.
    Dipanjan Das, Mohit Kumar, Alexander I. Rudnicky.
    Proceedings of IJCNLP.
    2008.

     
  11. Summarizing non-textual events with a 'briefing' focus.
    Mohit Kumar, Dipanjan Das, Alexander I. Rudnicky.
    Large Scale Semantic Access to Content (Text, Image, Video, and Sound).
    2007.

     
  12. Combating information overload in non-visual web access using context.
    Jalal Mahmud, Yevgen Borodin, Dipanjan Das, I. V. Ramakrishnan.
    Proceedings of IUI.
    2007.

     
  13. A Survey on Automatic Text Summarization.
    Dipanjan Das, André F.T. Martins.
    Language and Statistics II Class Report.
    2007.

     
  14. Multi-lingual broadcast news retrieval.
    A.G. Hauptmann, M.-Y. Chen, M. Christel, D. Das, W.-H. Lin, R. Yan, J. Yang, G. Backfried, X. Wu.
    NIST TRECVID Workshop.
    2006.

     
  15. Improving non-visual web access using context.
    Jalal Mahmud, Yevgen Borodin, Dipanjan Das, I. V. Ramakrishnan.
    Proceedings of the 8th International ACM SIGACCESS Conference on Computers and Accessibility.
    2006.

     
  16. An affinity based Greedy approach towards Chunking for Indian languages.
    Dipanjan Das, Monojit Choudhury, Sudeshna Sarkar, Anupam Basu.
    Proceedings of ICON.
    2005.

     
  17. Finite state models for generation of Hindustani classical music.
    Dipanjan Das, Monojit Choudhury.
    Proceedings of International Symposium on Frontiers of Research in Speech and Music.
    2005.

     
  18. Chunker and shallow parser for free word order languages: an approach based on valency theory and feature structures.
    Dipanjan Das, Monojit Choudhury.
    Proceedings of ICON.
    2004.