Syed Mostofa Monsur


I am a PhD Student in the Computer Science Department at the State University of New York at Stony Brook.

Previously, I led the AI/ML Team at Celloscope. I received my Bachelor of Science in Computer Science and Engineering from the Department of CSE, BUET (Bangladesh University of Engineering and Technology).

Email  /  CV  /  Google Scholar  /  LinkedIn


Research Interests

My research interests include natural language processing (NLP), reasoning with large language models, and AI for science.

Publications

[5] Scaling down, Powering up: A Survey on the Advancements of Small Vision-Language Models
Sheikh Iftekhar Ahmed, Muhammad Zubair Hasan, Abrar Jahin Niloy, Syed Mostofa Monsur, Mark V. Albert
Information Fusion, 2025
[paper]
[4] SynthNID: Synthetic Data to Improve End-to-end Bangla Document Key Information Extraction
Syed Mostofa Monsur, Shariar Kabir, Sakib Chowdhury
BLP Workshop at EMNLP, 2023
[paper]
[3] Grid-Coding: An Accessible, Efficient, and Structured Coding Paradigm for Blind and Low-Vision Programmers
Md Ehtesham-Ul-Haque, Syed Mostofa Monsur, Syed Masum Billah
UIST, 2022 (Best Paper Award)
[paper] / [video] / [featured]
[2] SHONGLAP: A Large Bengali Open-Domain Dialogue Corpus
Syed Mostofa Monsur, Sakib Chowdhury, Md Shahrar Fatemi, Shafayat Ahmed
LREC, 2022
[poster] / [paper]
[1] Distributing Active Learning Algorithms
Syed Mostofa Monsur, Muhammad Abdullah Adnan
NSysS, 2020
[video] / [slides] / [paper]

Industry Projects

Agrani Voice Banking
Led AI/ML Team at Celloscope

Agrani Bank is one of Bangladesh's largest state-owned banks, serving a large customer base with limited access to banking information. Agrani Voice Banking makes these banking services accessible to everyone: it is powered by Bengali ASR and a fine-tuned NLU engine that handles natural language-driven fund transfers and inquiries.

Industry-Grade ASR, TTS and Speaker Verification for Bengali Speech-Driven Systems
Led AI/ML Team at Celloscope

Collected and pre-processed 400+ hours of Bengali audio with transcriptions and trained high-quality end-to-end ASR models. Trained industry-grade Bengali TTS on 40+ hours of curated data, improving the naturalness of the generated audio with neural vocoders, and integrated these models into natural language-driven user interfaces, including speech-driven chatbots. Developed an industry-grade speaker verification system using an ensemble of pre-trained UniSpeech-SAT, WavLM, and ECAPA-TDNN models, along the lines of the sketch below.
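For context, here is a minimal sketch of how score-level fusion over such pre-trained speaker embedding models can be set up. The checkpoint names are the public speaker verification variants of UniSpeech-SAT and WavLM on HuggingFace; the mean-fusion rule, the threshold, and the verify helper are illustrative assumptions, not the production configuration, and an ECAPA-TDNN branch (e.g. via SpeechBrain) would be scored the same way and added to the ensemble.

    # Sketch: score-level fusion of pre-trained speaker verification models.
    # Checkpoints are public HuggingFace SV models; the threshold and the
    # mean-fusion rule are illustrative, not the production setup.
    import torch
    import torch.nn.functional as F
    from transformers import AutoFeatureExtractor, AutoModelForAudioXVector

    CHECKPOINTS = [
        "microsoft/unispeech-sat-base-plus-sv",
        "microsoft/wavlm-base-plus-sv",
        # an ECAPA-TDNN branch (e.g. via SpeechBrain) would be scored identically
    ]

    models = []
    for ckpt in CHECKPOINTS:
        extractor = AutoFeatureExtractor.from_pretrained(ckpt)
        model = AutoModelForAudioXVector.from_pretrained(ckpt).eval()
        models.append((extractor, model))

    def verify(wave_a, wave_b, sampling_rate=16000, threshold=0.85):
        """Return (accept, mean_score) for two 1-D float waveforms."""
        scores = []
        with torch.no_grad():
            for extractor, model in models:
                inputs = extractor(
                    [wave_a, wave_b], sampling_rate=sampling_rate,
                    return_tensors="pt", padding=True,
                )
                emb = model(**inputs).embeddings        # (2, dim) x-vectors
                emb = F.normalize(emb, dim=-1)
                scores.append(F.cosine_similarity(emb[0], emb[1], dim=-1))
        mean_score = torch.stack(scores).mean().item()  # fuse by averaging
        return mean_score >= threshold, mean_score

Averaging normalized cosine scores is one simple fusion choice; in practice the acceptance threshold is tuned per model mix on a held-out verification set.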


Template stolen from Jon Barron's site.