Hi, I am a PhD student in the HKU-NLP group, co-supervised by Prof. Lingpeng Kong and Prof. Qi Liu. Previously, I completed my master’s degree at Peking University as a member of LANCO, advised by Prof. Xu Sun, and my bachelor’s degree at Xidian University.

My current research interests lie in developing multi-modal LLMs and understanding the emerging capabilities of LLMs.

I am happy to discuss potential collaboration opportunities. Feel free to reach out!

News

  • [2025/01] Four papers got accepted by ICLR 2025, see you in 🇸🇬!
  • [2025/01] ImgTrojan got accepted by NAACL 2025, congrats to Xijia and Chris!
  • [2024/10] Our FairEval was selected as the Most Influential Paper of ACL 2024!👑👑
  • [2024/07] VITATECS, for benchmarking temporal understanding of VideoLLMs, got accepted by ECCV 2024!
  • [2024/06] Check out our Video-MME, the first-ever comprehensive benchmark for Video-LLMs.
  • [2024/02] Check out our Reka Flash and Reka Core, first-tier multimodal LLMs!
  • [2023/12] 💥 Our paper Label Words are Anchors won the Best Long Paper Award of EMNLP 2023!

Education

  • PhD Student, The University of Hong Kong, Sept. 2023 - Now.
  • MSc in Computer Science, Peking University, Sept. 2020 - July 2023.
  • BE in Software Engineering, Xidian University, Sept. 2016 - July 2020.

Internship

  • Reka AI, Multi-modal LLM R&D Intern, July 2023 - Now.
  • Shanghai AI Lab, Research Intern, July 2022 - June 2023.

    Mentor: Dr. Jingjing Xu

  • Toutiao Search, Search Algorithm Intern, Dec. 2021 - June 2022.
  • WeChat AI, Research Intern, April 2020 - Nov. 2021.

    Mentors: Dr. Yankai Lin and Dr. Peng Li

Preprints

(#: Equal Contribution)

  • VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
    Lei Li #, Yuancheng Wei #, Zhihui Xie #, Xuqing Yang #, Yifan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujian Li, Bill Yuchen Lin, Lingpeng Kong, Qi Liu
    [arxiv, project page]

  • M3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
    Lei Li, Yuwei Yin, Shicheng Li, Liang Chen, Peiyi Wang, Shuhuai Ren, Mukai Li, Yazheng Yang, Jingjing Xu, Xu Sun, Lingpeng Kong, Qi Liu
    [arxiv, dataset]

Selected Publications

Multimodal LLMs

  • Temporal Reasoning Transfer from Text to Video
    Lei Li #, Yuanxin Liu #, Linli Yao, Peiyuan Zhang, Chenxin An, Lean Wang, Xu Sun, Lingpeng Kong, Qi Liu
    ICLR 2025 [arxiv, project page]

  • VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
    Lei Li #, Zhihui Xie #, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong, Qi Liu
    EMNLP 2024 [project page]

  • VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
    Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou
    ECCV 2024 [arxiv, dataset]

  • Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
    Lei Li #, Yuqi Wang #, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu
    ACL 2024 [project page]

  • Can Language Models Understand Physical Concepts?
    Lei Li, Jingjing Xu, Qingxiu Dong, Ce Zheng, Qi Liu, Lingpeng Kong, Xu Sun
    EMNLP 2023 [arxiv, dataset]

Emerging Capabilities of LLMs

  • A Survey for In-context Learning
    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, Zhifang Sui
    EMNLP 2024 [arxiv, paper list]

  • Large Language Models are not Fair Evaluators
    Peiyi Wang, Lei Li, Liang Chen, Zefan Cai, Dawei Zhu, Binghuai Lin, Yunbo Cao, Lingpeng Kong, Qi Liu, Tianyu Liu, Zhifang Sui
    ACL 2024 (Most Influential Paper of ACL 2024) [arxiv, code]

  • Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
    Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun
    EMNLP 2023 (Best Long Paper Award) [arxiv, code]

Efficient Pre-trained Language Models

  • Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction
    Lei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun
    ECML-PKDD 2022 (Oral) [arxiv, code]

  • From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
    Lei Li, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun
    Findings of EMNLP 2022 [url, code]

  • Dynamic Knowledge Distillation for Pre-trained Language Models
    Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun
    EMNLP 2021 (Oral) [url, code]

  • CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade
    Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun
    Findings of EMNLP 2021 [url, code]

Knowledge-Enhanced NLP

  • Alleviating the Knowledge-Language Inconsistency: A Study for Deep Commonsense Knowledge
    Yi Zhang #, Lei Li #, Yunfang Wu, Qi Su, Xu Sun
    IEEE Transactions on Audio, Speech and Language Processing (TASLP) [arxiv]

  • Enhancing Topic-to-essay Generation with External Commonsense Knowledge
    Pengcheng Yang #, Lei Li #, Fuli Luo, Tianyu Liu, Xu Sun
    ACL 2019 [url]

Academic Service

  • Area Chair/Action Editor: ACL ARR 2024
  • Reviewer/Program Committee: IJCV, ACM CSUR, NeurIPS 2024, COLM 2024, ICML (2024 - 2025), CVPR (2024 - 2025), ICLR (2024 - 2025), NeurIPS 2023 (Top Reviewer Award), ACL (2020 - 2023), EMNLP (2019 - 2023)
  • Teaching Assistant:
    • Machine Learning in Trading and Finance (2023 Fall, HKU)
    • Computational Linguistics (2021 Fall, PKU)

Invited Talks

Awards

  • Best Long Paper Award, EMNLP, 2023
  • National Scholarship, Peking University, 2021
  • National Scholarship, Xidian University, 2019
  • Best Demo Award, DeeCamp, 2018