Research Focus
I began my time at I²R, A*STAR working on dialogue summarization research. When the MERaLiON team was later formed to spearhead Singapore's national LLM project, I joined this group of top-tier PhDs and engineers, one of the sharpest teams at the institute. I took the lead on evaluation and data preparation for the AudioLLM workstream, focusing on making large models work across the diverse languages of Southeast Asia. It was a fast-paced journey that I wrapped up in 2025.
I served as Tech Lead of the MERaLiON team under the National Multimodal LLM Programme (NMLP), a S$70 million initiative funded by the National Research Foundation (NRF).
Active Topics
- Dialogue Summarization
  - How can multi-turn dialogues be effectively summarized while preserving key information?
  - What techniques can improve coherence and factual consistency in dialogue summaries?
- Making LLMs hear — AudioLLM
  - What techniques can be used to effectively integrate audio processing capabilities into existing LLM architectures?
  - What is the most efficient approach to achieving seamless cross-modality integration?
  - What benchmarks can be designed to accurately evaluate the real-world performance of AudioLLMs?
Publications
- MERaLiON-AudioLLM: Bridging Audio and Language with Large Language Models — ACL 2025
- AudioBench: A Universal Benchmark for Audio Large Language Models — NAACL 2025
- Instructive Dialogue Summarization with Query Aggregations — EMNLP 2023
- CRAFT: Extracting and Tuning Cultural Instructions from the Wild — C3NLP 2024
- In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models — EMNLP Findings 2024
- Resilience of Large Language Models for Noisy Instructions — EMNLP Findings 2024
- CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment — SUMEval 2025
- SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning — NAACL 2024
- SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages — EMNLP 2024
- MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders — ICASSP 2025
- CoinMath: Harnessing the Power of Coding Instruction for Math LLM — ACL Findings 2025
- Optimizing Cross-Modality Alignment Module for Audio Large Language Models — Data Intelligence 2025
- MNSC: Advancing Singlish Speech Understanding with Carefully Curated Corpora — ASRU 2025
- Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia — ACL 2025
- NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025 — MLC-SLM 2025
- Diversity and Complementarity of Speech Encoders across Diverse Tasks in a Multi-modal Large Language Model — ASRU 2025
- Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems — arXiv 2025
- IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models — AACL 2025
- Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs — AACL 2025
- Train Multi-Modal LLMs to Understand Diverse Speech Paralinguistics by Distilling from Teachers with Meta-Information — AAAI 2026 Workshop on Audio-Centric AI
Students Supervised
- Pham The Binh Minh — Undergraduate Research Intern, NTU, Singapore (2025-01 – 2025-05). Multimodal AudioLLMs.
- Yiming Gao — Undergraduate Research Intern, NTU, Singapore (2025-01 – 2025-05). Instruction following capability for multimodal large language models. (AACL 2025)
- Tey Xue Cong — A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore (2025-02 – 2025-04). Supervisor: Xunlong Zou. Multilingual speech data collection and processing.
- Jayden Lum — A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore (2025-02 – 2025-04). Supervisor: Xunlong Zou. Multilingual speech data collection and processing.
- Yanchao Li — ACIS PhD Scholar, NTU, Singapore (2024-01 – 2025-04). Supervisor: Nancy F. Chen. Long video understanding.
- Ziyi Xu — Research Intern, NUS, Singapore (2024-07 – 2024-12). Supervisor: Sun Shuo. Multimodal alignment data collection and filtering.
- Ayrton San Joaquin — Research Associate, DesCarte@CREATE, Singapore (2023-09 – 2024-08). Efficient training of large language models through gradient estimation. (EMNLP Findings 2024)
- Anh Thuc Nguyen — Research Intern, UNC Chapel Hill, USA (2024-01 – 2024-05). Question generation for MERaLiON project and evaluation dataset creation.
Academic Services
- Publication Chair: EMNLP 2023
- Local Organizing Team: EMNLP 2023
- Area Chair: ACL ARR (2024-2025)
- Editor: APSIPA Transactions on Signal and Information Processing
- Reviewer: ACL, EMNLP, NAACL, ICASSP, IEEE TASLP
Awards
- Best Paper Award ($300) — SUMEval Workshop, COLING 2025
- Best Paper Award ($200) — C3NLP Workshop, ACL 2024
Videos
- MERaLiON Introduction — Introduction to MERaLiON project. youtube.com/embed/nBA3MqwjN3I
- MERaLiON Demo — Demo of MERaLiON AudioLLM capabilities. youtube.com/embed/HZSa7vT73Lg
Talks
- 2025.03 — Lorong AI, Singapore. Evaluation on Audio-LLMs and Beyond. Slides