Syed Mostofa Monsur

I am a Lead AI Research Engineer at Celloscope in Dhaka, Bangladesh, where I lead the AI Team (Speech, Vision, NLP). I have 5+ years of industry experience, during which I developed multiple user-facing systems powered by NLP and speech-based UX.

At Celloscope, I have been developing real-world applications with natural language interfaces, making services accessible to hundreds of thousands of users who have limited access to information. I received my Bachelor of Science in Computer Science and Engineering, with a major in Artificial Intelligence, from CSE BUET. There I worked with Professor Muhammad Abdullah Adnan on Active Learning for Distributed Systems.

Email  /  CV  /  Google Scholar  /  LinkedIn  /  Github

Research Interests

I'm interested in NLP, conversational AI, interactive systems, and human-centered design.

Publications

SynthNID: Synthetic Data to Improve End-to-end Bangla Document Key Information Extraction
Syed Mostofa Monsur, Shariar Kabir, Sakib Chowdhury
BLP Workshop at EMNLP, 2023
paper

In this paper, we introduce SynthNID, a system to generate domain-specific document image data for training OCR-less end-to-end Key Information Extraction systems. We show that the generated data improves the performance of the extraction model on real datasets, and that the system is easily extendable to generate other types of scanned documents for a wide range of document understanding tasks.

Grid-Coding: An Accessible, Efficient, and Structured Coding Paradigm for Blind and Low-Vision Programmers
Md Ehtesham-Ul-Haque, Syed Mostofa Monsur, Syed Masum Billah
UIST, 2022 (Best Paper Award)
project page / video / paper

Grid-Coding is a new paradigm for non-visual programming that uses a grid structure instead of a text editor, enabling blind and low-vision (BLV) programmers to interact with source code more effectively. In participatory sessions and an evaluation with 12 BLV programmers, it improved code navigation, context understanding, syntax error detection, and overall programming productivity compared to regular text editors, and it revealed unique usage patterns.

SHONGLAP: A Large Bengali Open-Domain Dialogue Corpus
Syed Mostofa Monsur, Sakib Chowdhury, Md Shahrar Fatemi, Shafayat Ahmed
LREC, 2022
poster / paper

SHONGLAP is Bengali's first large annotated open-domain dialogue corpus, a step forward for low-resource languages. Leveraging weak-supervision techniques, this dataset of 7.7k+ fully annotated dialogues improves language models such as BanglaBERT on downstream classification tasks.

Distributing Active Learning Algorithms
Syed Mostofa Monsur, Muhammad Abdullah Adnan
NSysS, 2020
video / slides / paper

This work presents distributed implementations of advanced active learning algorithms, addressing the challenge of optimizing the labeling sequence for large unlabeled datasets. On real datasets, using multi-node Spark clusters and distributed file systems, the implementations show improved performance compared to random labeling.

Industry Projects

Agrani Voice Banking
Led Speech and NLU Team at Celloscope

Led development of Agrani Voice Banking, Bangladesh's pioneering voice-based AI chatbot for seamless banking, serving hundreds of thousands of real users. Agrani Bank is one of Bangladesh's largest state-owned banks, with a huge number of customers who have very little access to information. Agrani Voice Banking makes banking services accessible to everyone. It is powered by Bengali ASR and a fine-tuned NLU engine for natural-language-driven fund transfers and inquiries.

National ID Information Extraction using Document Transformers
Led NLP Team at Celloscope
slides

Pretrained document transformers, once fine-tuned, achieve strong performance on extracting NID information. We treated NID extraction as a document question-answering problem, querying the key fields of the NID image document. The model is fine-tuned on both real user data and synthetic data.

License-Plate Extraction from Very Noisy Real-World Deployment
Led NLP Team at Celloscope and ML Team at Spectrum
slides

License-plate extraction in a very noisy real-world setting. We fine-tuned end-to-end sequence extraction models on real and synthetic data for better performance. The system is deployed at several toll booths in Bangladesh for reporting analytics.

Industry-Grade ASR, TTS and Speaker Verification for Bengali Speech-Driven Systems
Led NLP Team at Celloscope

Collected and pre-processed 400+ hours of Bengali audio and transcriptions. Trained high-quality end-to-end ASR models. Trained an industry-grade TTS system for Bengali with 40+ hours of curated data and improved the generated audio quality with vocoders (naturalizing audio). Integrated these with natural-language-driven user interfaces, including speech-driven chatbots. Developed an industry-grade speaker verification system using an ensemble of pre-trained UniSpeech-SAT, WavLM and ECAPA-TDNN models.


Template stolen from Jon Barron's Site.