Sajib Dasgupta
Department of Computer Science
University of Texas at Dallas
Advisor: Vincent Ng
Contact Information
- Postal Address
2238 Flat Creek Drive
Richardson, TX 75080, USA
- Email: s d g n e w @ y a h o o . c o m (remove the spaces)
News
- Paper accepted in ICML 2010! I am elated.
- I have a paper accepted in a NIPS workshop on Clustering.
I learned spectral learning by watching an online talk by Dr. Ulrike von Luxburg, who happened to send me the acceptance email!
- Finally, I am releasing the part-of-speech lexicon that we induced from a raw corpus without any labeled data! (Link: POS)
I am also releasing our unsupervised morphological segmentation output (Link: Morphology).
Both of these can be used directly in other unsupervised NLP tasks such as parsing.
- I have been selected for the Louis Beecherl Graduate Fellowship for 2009-2010.
- I am back in Dallas after a month-long vacation in Bangladesh. I presented two papers in Singapore in the meantime.
- I have papers accepted in ACL 09 and EMNLP 09.
- I am back at school! After spending a year at the IBM Almaden Research Center, California, I am back at UTD to finish my PhD.
- I reviewed for ACL 09 and EMNLP 08.
- I have a patent submitted, thanks to IBM.
- My thesis is finally up for download. Link: Thesis
CV
Research
- Areas of Interest
Natural Language Processing and Machine Learning. My special interests in natural language processing are unsupervised learning, morphology, sentiment and text classification.
- Research Experience
Human Language Technology Research Institute (2005 to present):
Research on text clustering, with the aim of producing multiple clusterings of the data simultaneously according to user interests. Also researched automatic review classification and unsupervised word segmentation for four different languages without using any language-specific grammatical knowledge, with an application to language-independent part-of-speech induction.
IBM Research (2007 to 2008):
Worked at the IBM Almaden Research Center, California, where the goal was to learn cross-corpus associations from unstructured data sources in an unsupervised manner, with the aim of bridging the gap between two disparate subject areas.
Center for Research on Bangla Language Processing (2004 to 2005):
Worked at the Center for Research on Bangla Language Processing (CRBLP), BRAC University, Bangladesh, as a research programmer from February 2004 to July 2005. Researched knowledge-driven two-level morphological parsing for Bangla.
- Peer Reviewed Publications
Towards Subjectifying Text Clustering.
Sajib Dasgupta and Vincent Ng.
Accepted for presentation in SIGIR, 2010 (Acceptance rate: crazy 16.5%).
Topic-wise, Sentiment-wise, or Otherwise? Sentiment Clustering Using Human Feedback.
Sajib Dasgupta and Vincent Ng.
Accepted for publication in the Journal of Artificial Intelligence Research (JAIR), 2010.
Mining Clustering Dimensions.
Sajib Dasgupta and Vincent Ng.
Accepted for presentation in the International Conference on Machine Learning (ICML), 2010.
Single Data, Multiple Clusterings. [My talk on Videolectures.net]
Sajib Dasgupta and Vincent Ng.
Accepted for presentation in the NIPS workshop on "Clustering", 2009.
Topic-wise, Sentiment-wise, or Otherwise? Identifying the Hidden Dimension for Unsupervised Text Classification.
Sajib Dasgupta and Vincent Ng.
In the Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, 2009.
Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automatic Sentiment Classification.
Sajib Dasgupta and Vincent Ng.
In the conference of the ACL, Singapore, 2009.
Discriminative Models for Semi-Supervised Natural Language Learning.
Sajib Dasgupta and Vincent Ng.
Position paper in the NAACL-HLT 2009 workshop on Semisupervised Learning for Natural Language Processing, Boulder, 2009.
Unsupervised Part-of-Speech Acquisition for Resource-Scarce Languages.
Sajib Dasgupta and Vincent Ng.
In the conference on Empirical Methods in Natural Language Processing (EMNLP), Prague, 2007.
High-Performance, Language-Independent Morphological Segmentation.
Sajib Dasgupta and Vincent Ng.
In the conference of the NAACL-HLT, New York, 2007.
Unsupervised Morphological Parsing of Bengali.
Sajib Dasgupta and Vincent Ng.
In the journal Language Resources and Evaluation (LRE), 2007, published by Springer.
Unsupervised Word Segmentation for Bangla.
Sajib Dasgupta and Vincent Ng.
In the International Conference on Natural Language Processing (ICON), India, 2007.
Examining the Role of Linguistic Knowledge Sources in the Automatic Identification and Classification of Reviews.
Vincent Ng, Sajib Dasgupta and S. M. Niaz Arifin.
In the conference of the ACL, Sydney, 2006.
- Thesis
Toward Language Independent Morphological Segmentation and Part-of-speech Induction
Advisor: Vincent Ng, University of Texas at Dallas.
- Patent Submitted (IBM)
Information Extraction from Multiple Expertise-Specific Subject Areas. Docket no. ARC920080067US1. With co-inventors Dipayan Gangopadhyay and Norm Pass.
Datasets and Others:
Our Unsupervised Morphological Segmentation Output: English, Bengali
Our Unsupervised Part-of-Speech Lexicon Induction Output: English, Bengali
Gold Standard Used for Unsupervised Morphological Segmentation: Bengali, Finnish and Turkish
Gold Standard Created for Unsupervised Part-of-Speech Lexicon Induction: Bengali
Some old papers: Here