About Me

I am a fourth-year CS Ph.D. candidate at the LAUNCH Lab, University of Michigan – Ann Arbor, advised by Prof. Lu Wang. I obtained my Bachelor’s degree from Peking University, where I was advised by Prof. Xiaojun Wan.

I work on LLM reasoning and agents.

News

  • 2025-11: Thrilled to receive a NeurIPS Travel Award!
  • 2025-10: Check out our recent work, ThinkLogit, on efficient long chain-of-thought reasoning.
  • 2025-09: MLRC-Bench, which focuses on the evaluation of AI research agents, has been accepted to the NeurIPS 2025 Datasets and Benchmarks Track. See you in San Diego!
  • 2025-05: Excited to start an internship at AWS Agentic AI, working on machine learning engineering (MLE) agents.

Selected Publications

  • Logit Arithmetic Elicits Long Reasoning Capabilities Without Training
    Yunxiang Zhang, Muhammad Khalifa, Lechen Zhang, Xin Liu, Ayoung Lee, Xinliang Frederick Zhang, Farima Fatahi Bayat, Lu Wang
    The 1st Workshop on Test-time Scaling and Reasoning Models (SCALR @ COLM 2025) [paper] [code]
  • MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?
    Yunxiang Zhang, Muhammad Khalifa, Shitanshu Bhushan, Grant D Murphy, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang
    NeurIPS 2025 Datasets and Benchmarks Track [paper] [project page] [code]
  • Small Language Models Need Strong Verifiers to Self-Correct Reasoning
    Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang
    Findings of ACL 2024 [paper] [code] [project page]
  • Merging Generated and Retrieved Knowledge for Open-Domain QA
    Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang
    EMNLP 2023 [paper] [code]
  • SituatedGen: Incorporating Geographical and Temporal Contexts into Generative Commonsense Reasoning
    Yunxiang Zhang, Xiaojun Wan
    NeurIPS 2023 Datasets and Benchmarks Track [paper] [data]
  • MOVER: Mask, Over-generate and Rank for Hyperbole Generation
    Yunxiang Zhang, Xiaojun Wan
    NAACL 2022 [paper] [code]
  • Interpreting the Robustness of Neural NLP Models to Textual Perturbations
    Yunxiang Zhang, Liangming Pan, Samson Tan, Min-Yen Kan
    Findings of ACL 2022 [paper]
  • BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles
    Yunxiang Zhang, Xiaojun Wan
    AAAI 2022 [paper] [data]

Services

Co-Organizer, 1st Workshop on Test-Time Scaling and Reasoning Models, COLM 2025

Student Volunteer, ACL 2024

Reviewer: ICML (2025), AISTATS (2025), ICLR (2025), NeurIPS (2024-25), ACL Rolling Review (2024-25), COLM (2024), EMNLP (2022-24), ACL (2023), CoNLL (2023-24), ACM Computing Surveys