I²R, A*STAR

Position

Scientist (04/2023 - 04/2025), Institute for Infocomm Research (I²R), A*STAR, Singapore.

Tech Lead, MERaLiON Team - National Multimodal LLM Programme (NMLP), funded by an S$70 million grant from NRF

Research & Projects

Summary

I started at I²R, A*STAR working on dialogue summarization research. When the MERaLiON team was later formed to spearhead Singapore's national LLM project, I joined it. It was one of the sharpest teams at the institute, made up of strong PhDs and engineers. I led evaluation and data preparation for the AudioLLM workstream, focusing on making large models work across the diverse languages of Southeast Asia. It was a fast-paced journey that I wrapped up in April 2025.

Research Topics

Dialogue Summarization

How can multi-turn dialogues be summarized effectively while preserving key information?
What techniques can improve coherence and factual consistency in dialogue summaries?

Making LLMs hear - AudioLLM

What techniques can be used to effectively integrate audio processing capabilities into existing LLM architectures?
What is the most efficient approach for achieving seamless cross-modality integration?
What benchmarks can be designed to accurately evaluate the real-world performance of AudioLLMs?

Videos

MERaLiON Introduction

Introduction to the MERaLiON project.

MERaLiON Demo

Demo of MERaLiON AudioLLM capabilities.

Publications

MERaLiON-AudioLLM: Bridging Audio and Language with Large Language Models (ACL 2025)
AudioBench: A Universal Benchmark for Audio Large Language Models (NAACL 2025)
Instructive Dialogue Summarization with Query Aggregations (EMNLP 2023)
CRAFT: Extracting and Tuning Cultural Instructions from the Wild (C3NLP 2024)
In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models (EMNLP Findings 2024)
Resilience of Large Language Models for Noisy Instructions (EMNLP Findings 2024)
CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment (SUMEval 2025)
SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning (NAACL 2024)
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages (EMNLP 2024)
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders (ICASSP 2025)
CoinMath: Harnessing the Power of Coding Instruction for Math LLM (ACL Findings 2025)
Optimizing Cross-Modality Alignment Module for Audio Large Language Models (Data Intelligence 2025)
MNSC: Advancing Singlish Speech Understanding with Carefully Curated Corpora (ASRU 2025)
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia (ACL 2025)
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025 (MLC-SLM 2025)
Diversity and Complementarity of Speech Encoders across Diverse Tasks in a Multi-modal Large Language Model (ASRU 2025)
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems (arXiv 2025)
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models (AACL 2025)
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs (AACL 2025)
Train Multi-Modal LLMs to Understand Diverse Speech Paralinguistics by Distilling from Teachers with Meta-Information (AAAI 2026 Workshop on Audio-Centric AI)

Awards

Best Paper Award ($300)
SUMEval Workshop, COLING 2025
Best Paper Award ($200)
C3NLP Workshop, ACL 2024

Talks

2025.03 Gave a talk at Lorong AI, Singapore. Topic and slides: Evaluation on Audio-LLMs and Beyond

Academic Services

Publication Chair: EMNLP 2023
Local Organizing Team: EMNLP 2023
Area Chair: ACL ARR (2024-2025)
Editor: APSIPA Transactions on Signal and Information Processing
Reviewer: ACL, EMNLP, NAACL, ICASSP, IEEE TASLP

Students

Pham The Binh Minh, Undergraduate Research Intern, NTU, Singapore
2025-01 - 2025-05
Topic: Multimodal AudioLLMs.
Yiming Gao, Undergraduate Research Intern, NTU, Singapore
2025-01 - 2025-05
Topic: Instruction following capability for multimodal large language models.
Publication: AACL 2025
Tey Xue Cong, A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore
Main Supervisor: Xunlong Zou
2025-02 - 2025-04
Topic: Multilingual speech data collection and processing.
Jayden Lum, A*STAR Scholar Intern, Ngee Ann Polytechnic, Singapore
Main Supervisor: Xunlong Zou
2025-02 - 2025-04
Topic: Multilingual speech data collection and processing.
Yanchao Li, ACIS PhD Scholar, NTU, Singapore
Main Supervisor: Nancy F. Chen
2024-01 - 2025-04
Topic: Long video understanding.
Ziyi Xu, Research Intern, NUS, Singapore
Main Supervisor: Sun Shuo
2024-07 - 2024-12
Topic: Multimodal alignment data collection and filtering.
Ayrton San Joaquin, Research Associate, DesCarte@CREATE, Singapore
2023-09 - 2024-08
Topic: Efficient training of large language models through gradient estimation.
Publication: EMNLP Findings 2024
Anh Thuc Nguyen, Research Intern, UNC Chapel Hill, USA
2024-01 - 2024-05
Topic: Question generation for MERaLiON project and evaluation dataset creation.