Research Focus
I began my time at I²R, A*STAR working on dialogue summarization research. When the MERaLiON team was later formed to spearhead Singapore's national LLM project, I joined this group of top-tier PhDs and engineers, one of the sharpest teams at the institute. I took the lead on evaluation and data preparation for the AudioLLM workstream, focusing on making large models work across the diverse languages of Southeast Asia. It was a fast-paced journey that I wrapped up in 2025.
I served as Tech Lead of the MERaLiON team under the National Multimodal LLM Programme (NMLP), a S$70 million initiative funded by the National Research Foundation (NRF).
Active Topics
- Dialogue Summarization
  - How can multi-turn dialogues be effectively summarized while preserving key information?
  - What techniques can improve coherence and factual consistency in dialogue summaries?
- Making LLMs hear — AudioLLM
  - What techniques can be used to effectively integrate audio processing capabilities into existing LLM architectures?
  - What is the most efficient approach to achieving seamless cross-modality integration?
  - What benchmarks can be designed to accurately evaluate the real-world performance of AudioLLMs?
Publications
- MERaLiON-AudioLLM: Bridging Audio and Language with Large Language Models — ACL 2025
- AudioBench: A Universal Benchmark for Audio Large Language Models — NAACL 2025
- Instructive Dialogue Summarization with Query Aggregations — EMNLP 2023
- CRAFT: Extracting and Tuning Cultural Instructions from the Wild — C3NLP 2024
- In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models — EMNLP Findings 2024
- Resilience of Large Language Models for Noisy Instructions — EMNLP Findings 2024
- CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment — SUMEval 2025
- SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning — NAACL 2024
- SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages — EMNLP 2024
- MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders — ICASSP 2025
- CoinMath: Harnessing the Power of Coding Instruction for Math LLM — ACL Findings 2025
- Optimizing Cross-Modality Alignment Module for Audio Large Language Models — Data Intelligence 2025
- MNSC: Advancing Singlish Speech Understanding with Carefully Curated Corpora — ASRU 2025
- Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia — ACL 2025
- NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025 — MLC-SLM 2025
- Diversity and Complementarity of Speech Encoders across Diverse Tasks in a Multi-modal Large Language Model — ASRU 2025
- Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems — arXiv 2025
- IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models — AACL 2025
- Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs — AACL 2025
- Train Multi-Modal LLMs to Understand Diverse Speech Paralinguistics by Distilling from Teachers with Meta-Information — AAAI 2026 Workshop on Audio-Centric AI
Students Supervised
- Pham The Binh Minh — Undergraduate Research Intern, NTU, Singapore (2025-01 – 2025-05). Multimodal AudioLLMs.
- Yiming Gao — Undergraduate Research Intern, NTU, Singapore (2025-01 – 2025-05). Instruction following capability for multimodal large language models. (AACL 2025)
- Tey Xue Cong — A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore (2025-02 – 2025-04). Supervisor: Xunlong Zou. Multilingual speech data collection and processing.
- Jayden Lum — A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore (2025-02 – 2025-04). Supervisor: Xunlong Zou. Multilingual speech data collection and processing.
- Yanchao Li — ACIS PhD Scholar, NTU, Singapore (2024-01 – 2025-04). Supervisor: Nancy F. Chen. Long video understanding.
- Ziyi Xu — Research Intern, NUS, Singapore (2024-07 – 2024-12). Supervisor: Sun Shuo. Multimodal alignment data collection and filtering.
- Ayrton San Joaquin — Research Associate, DesCarte@CREATE, Singapore (2023-09 – 2024-08). Efficient training of large language models through gradient estimation. (EMNLP Findings 2024)
- Anh Thuc Nguyen — Research Intern, UNC Chapel Hill, USA (2024-01 – 2024-05). Question generation for MERaLiON project and evaluation dataset creation.
Academic Services
- Publication Chair: EMNLP 2023
- Local Organizing Team: EMNLP 2023
- Area Chair: ACL ARR (2024-2025)
- Editor: APSIPA Transactions on Signal and Information Processing
- Reviewer: ACL, EMNLP, NAACL, ICASSP, IEEE TASLP
Awards
- Best Paper Award ($300) — SUMEval Workshop, COLING 2025
- Best Paper Award ($200) — C3NLP Workshop, ACL 2024
Videos
- MERaLiON Introduction — Introduction to MERaLiON project. youtube.com/embed/nBA3MqwjN3I
- MERaLiON Demo — Demo of MERaLiON AudioLLM capabilities. youtube.com/embed/HZSa7vT73Lg
Talks
- 2025.03 — Lorong AI, Singapore. Evaluation on Audio-LLMs and Beyond. Slides