Homepage - Hejie Cui

Selected Publications

T²PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

Haixin Wang, Hejie Cui^#, Chenwei Zhang, Xin Liu, Shuowei Jin, Shijie Geng, Xinyang Zhang, Nasser Zalmout, Zhenyu Shi, Yizhou Sun (^# corresponding author)

The International Conference on Machine Learning (ICML) 2026 Spotlight

Recent progress in multi-turn reinforcement learning (RL) has significantly improved the performance of reasoning LLMs on complex interactive tasks. Despite advances in stabilization techniques such as fine-grained credit assignment and trajectory filtering, instability remains pervasive and often leads to training collapse.

[Paper] [Code]

HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning

Weiqi Wang, Xin Liu, Binxuan Huang, Hejie Cui, Rongzhi Zhang, Changlong Yu, Shuowei Jin, Jingfeng Yang, Qingyu Yin, Zhengyang Wang, Zheng Li, Yifan Gao, Priyanka Nigam, Bing Yin, Lihong Li, Yangqiu Song

The Conference on Language Modeling (COLM) 2026

RLVR is now a standard way to train LLMs on reasoning tasks with verifiable outcomes, but when rollout generation dominates the cost, efficiency depends heavily on which prompts you sample and when. We introduce HeaPA, a query-side RLVR framework that combines difficulty-aware heap-based frontier sampling with on-policy query augmentation to improve math reasoning efficiency and accuracy.

CoMem: Context Management with A Decoupled Long-Context Model

EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records

A Review on Knowledge Graphs for Healthcare: Resources, Applications, and Promises

CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

Biomedical Visual Instruction Tuning with Clinician Preference Alignment

Microstructures and Accuracy of Graph Recall by Large Language Models

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

PV2TEA: Patching Visual Modality to Textual-Established Information Extraction

BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks

Brain Network Transformer

On Positional and Structural Node Features for Graph Neural Networks on Non-attributed Graphs

All publications