Research Focus
He started at I²R, A*STAR, focusing on dialogue summarization research. When the MERaLiON team was later formed to spearhead Singapore's national LLM project, he joined it and led evaluation and data preparation for the AudioLLM workstream, with an emphasis on making large models work across the diverse languages of Southeast Asia. He concluded this work in 2025.
He served as Tech Lead (Evaluation & Data) of the MERaLiON team under the National Multimodal LLM Programme (NMLP), which is funded by a S$70 million grant from Singapore's National Research Foundation (NRF).
Research Topics
- Dialogue Summarization
  - How can multi-turn dialogues be summarized effectively while preserving key information?
  - What techniques can improve coherence and factual consistency in dialogue summaries?
- Making LLMs hear — AudioLLM
  - What techniques can effectively integrate audio-processing capabilities into existing LLM architectures?
  - What is the most efficient approach for achieving seamless cross-modality integration?
  - What benchmarks can be designed to accurately evaluate the real-world performance of AudioLLMs?
Publications
- MERaLiON-AudioLLM: Bridging Audio and Language with Large Language Models — ACL 2025
- AudioBench: A Universal Benchmark for Audio Large Language Models — NAACL 2025
- Instructive Dialogue Summarization with Query Aggregations — EMNLP 2023
- CRAFT: Extracting and Tuning Cultural Instructions from the Wild — C3NLP 2024
- In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models — EMNLP Findings 2024
- Resilience of Large Language Models for Noisy Instructions — EMNLP Findings 2024
- CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment — SUMEval 2025
- SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning — NAACL 2024
- SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages — EMNLP 2024
- MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders — ICASSP 2025
- CoinMath: Harnessing the Power of Coding Instruction for Math LLM — ACL Findings 2025
- Optimizing Cross-Modality Alignment Module for Audio Large Language Models — Data Intelligence 2025
- MNSC: Advancing Singlish Speech Understanding with Carefully Curated Corpora — ASRU 2025
- Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia — ACL 2025
- NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025 — MLC-SLM 2025
- Diversity and Complementarity of Speech Encoders across Diverse Tasks in a Multi-modal Large Language Model — ASRU 2025
- Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems — arXiv 2025
- IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models — AACL 2025
- Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs — AACL 2025
- Train Multi-Modal LLMs to Understand Diverse Speech Paralinguistics by Distilling from Teachers with Meta-Information — AAAI 2026 Workshop on Audio-Centric AI
Students Supervised
- Pham The Binh Minh — Undergraduate Research Intern, NTU, Singapore (2025-01 – 2025-05). Multimodal AudioLLMs.
- Yiming Gao — Undergraduate Research Intern, NTU, Singapore (2025-01 – 2025-05). Instruction-following capability for multimodal large language models. (AACL 2025)
- Tey Xue Cong — A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore (2025-02 – 2025-04). Supervisor: Xunlong Zou. Multilingual speech data collection and processing.
- Jayden Lum — A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore (2025-02 – 2025-04). Supervisor: Xunlong Zou. Multilingual speech data collection and processing.
- Yanchao Li — ACIS PhD Scholar, NTU, Singapore (2024-01 – 2025-04). Supervisor: Nancy F. Chen. Long video understanding.
- Ziyi Xu — Research Intern, NUS, Singapore (2024-07 – 2024-12). Supervisor: Sun Shuo. Multimodal alignment data collection and filtering.
- Ayrton San Joaquin — Research Associate, DesCarte@CREATE, Singapore (2023-09 – 2024-08). Efficient training of large language models through gradient estimation. (EMNLP Findings 2024)
- Anh Thuc Nguyen — Research Intern, UNC Chapel Hill, USA (2024-01 – 2024-05). Question generation and evaluation dataset creation for the MERaLiON project.
Academic Services
- Publication Chair: EMNLP 2023
- Local Organizing Team: EMNLP 2023
- Area Chair: ACL ARR (2024–2025)
- Editorial Board Member: APSIPA Transactions on Signal and Information Processing (2023–2025)
- Reviewer: ACL, EMNLP, NAACL, ICASSP, IEEE TASLP
Awards
- Best Paper Award — SUMEval Workshop, COLING 2025
- Best Paper Award — C3NLP Workshop, ACL 2024
Videos
- MERaLiON Introduction — Introduction to the MERaLiON project. youtube.com/embed/nBA3MqwjN3I
- MERaLiON Demo — Demo of MERaLiON-AudioLLM capabilities. youtube.com/embed/HZSa7vT73Lg
Talks
- 2025.03 — Lorong AI, Singapore. Evaluation on Audio-LLMs and Beyond.