I²R, A*STAR (2023 - 2025)
Position
Scientist (04/2023 - 04/2025), Institute for Infocomm Research (I²R), A*STAR, Singapore.
Tech Lead, MERaLiON Team, National Multimodal LLM Programme (NMLP), backed by S$70 million in NRF funding
Research & Projects
Summary
I started my time at I²R, A*STAR working on dialogue summarization research. When the MERaLiON team was later formed to spearhead Singapore's national LLM project, I joined this elite group, honestly one of the sharpest teams at the institute and packed with top-tier PhDs and engineers. I led evaluation and data preparation for the AudioLLM workstream, with a focus on making large models work across the diverse languages of Southeast Asia. It was a fast-paced journey, which I wrapped up in 2025.
Research Topics
Dialogue Summarization
- How to effectively summarize multi-turn dialogues while preserving key information?
- What techniques can improve coherence and factual consistency in dialogue summaries?
Making LLMs hear - AudioLLM
- What techniques can be used to effectively integrate audio processing capabilities into existing LLM architectures?
- What is the most efficient approach for achieving seamless cross-modality integration?
- What benchmarks can be designed to accurately evaluate the real-world performance of AudioLLMs?
Videos
MERaLiON Introduction
Introduction to the MERaLiON project.
MERaLiON Demo
Demo of MERaLiON-AudioLLM capabilities.
Publications
- MERaLiON-AudioLLM: Bridging Audio and Language with Large Language Models (ACL 2025)
- AudioBench: A Universal Benchmark for Audio Large Language Models (NAACL 2025)
- Instructive Dialogue Summarization with Query Aggregations (EMNLP 2023)
- CRAFT: Extracting and Tuning Cultural Instructions from the Wild (C3NLP 2024)
- In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models (EMNLP Findings 2024)
- Resilience of Large Language Models for Noisy Instructions (EMNLP Findings 2024)
- CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment (SUMEval 2025)
- SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning (NAACL 2024)
- SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages (EMNLP 2024)
- MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders (ICASSP 2025)
- CoinMath: Harnessing the Power of Coding Instruction for Math LLM (ACL Findings 2025)
- Optimizing Cross-Modality Alignment Module for Audio Large Language Models (Data Intelligence 2025)
- MNSC: Advancing Singlish Speech Understanding with Carefully Curated Corpora (ASRU 2025)
- Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia (ACL 2025)
- NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025 (MLC-SLM 2025)
- Diversity and Complementarity of Speech Encoders across Diverse Tasks in a Multi-modal Large Language Model (ASRU 2025)
- Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems (arXiv 2025)
- IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models (AACL 2025)
- Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs (AACL 2025)
- Train Multi-Modal LLMs to Understand Diverse Speech Paralinguistics by Distilling from Teachers with Meta-Information (AAAI 2026 Workshop on Audio-Centric AI)
Awards
- Best Paper Award ($300), SUMEval Workshop, COLING 2025
- Best Paper Award ($200), C3NLP Workshop, ACL 2024
Talks
- 2025.03 Gave a talk at Lorong AI, Singapore. Topic and slides: Evaluation on Audio-LLMs and Beyond
Academic Services
- Publication Chair: EMNLP 2023
- Local Organizing Team: EMNLP 2023
- Area Chair: ACL ARR (2024-2025)
- Editor: APSIPA Transactions on Signal and Information Processing
- Reviewer: ACL, EMNLP, NAACL, ICASSP, IEEE TASLP
Students
- Pham The Binh Minh, Undergraduate Research Intern, NTU, Singapore
2025-01 - 2025-05
Topic: Multimodal AudioLLMs.
- Yiming Gao, Undergraduate Research Intern, NTU, Singapore
2025-01 - 2025-05
Topic: Instruction-following capability of multimodal large language models.
Publication: AACL 2025
- Tey Xue Cong, A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore
Main Supervisor: Xunlong Zou
2025-02 - 2025-04
Topic: Multilingual speech data collection and processing.
- Jayden Lum, A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore
Main Supervisor: Xunlong Zou
2025-02 - 2025-04
Topic: Multilingual speech data collection and processing.
- Yanchao Li, ACIS PhD Scholar, NTU, Singapore
Main Supervisor: Nancy F. Chen
2024-01 - 2025-04
Topic: Long video understanding.
- Ziyi Xu, Research Intern, NUS, Singapore
Main Supervisor: Sun Shuo
2024-07 - 2024-12
Topic: Multimodal alignment data collection and filtering.
- Ayrton San Joaquin, Research Associate, DesCarte@CREATE, Singapore
2023-09 - 2024-08
Topic: Efficient training of large language models through gradient estimation.
Publication: EMNLP Findings 2024
- Anh Thuc Nguyen, Research Intern, UNC Chapel Hill, USA
2024-01 - 2024-05
Topic: Question generation and evaluation dataset creation for the MERaLiON project.