Biography

I’m currently a researcher on the Tencent Hunyuan X team, focusing on RL and LLMs. I received my Master’s degree in Artificial Intelligence from Tsinghua University in 2025. My supervisor was Prof. Xiu Li, and I received a lot of research guidance from my senior fellow student Jiafei Lyu. My research focuses on Reinforcement Learning, especially LLM post-training, RL exploration, and Multi-Agent Reinforcement Learning. I am proficient in, and interested in, using mathematical theory to optimize LLM and RL methods.

If you are interested in connecting with me or believe there is potential for collaboration, please feel free to reach out via email at yangkaisigsrl@gmail.com.

News

  • 2025.11 The paper “EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control” is on arXiv.

  • 2025.10 The paper “Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners” is on arXiv.

  • 2025.5 The paper “CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning” is accepted by AAMAS 2026.

  • 2024.11 The paper “Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning” is accepted by AAAI 2025.

  • 2024.6 The paper “A two-stage reinforcement learning-based approach for multi-entity task allocation” is accepted by Engineering Applications of Artificial Intelligence.

  • 2024.5 The paper “Exploration and Anti-Exploration with Distributional Random Network Distillation” is accepted by ICML 2024.

  • 2024.4 The work “BATON: Aligning Text-to-Audio Model with Human Preference Feedback” is accepted by IJCAI 2024.

  • 2024.2 The paper “Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model” is accepted by CVPR 2024.

  • 2024.1 My work “Exploration and Anti-Exploration with Distributional Random Network Distillation” is on arXiv.

  • 2023.11 The paper “Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model” was selected as a Hugging Face Daily Paper.

  • 2023.11 My work “Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model” is on arXiv.

Publications

Education

2022-2025: Master’s Degree in Artificial Intelligence at Tsinghua Shenzhen International Graduate School, Tsinghua University.

2018-2022: Bachelor’s Degree in Automation from the Department of Electronic Information Engineering, Xi’an Jiaotong University.

Honors & Awards

2025.9: Outstanding Sharer Award of Tencent TEG.

2025.6: Outstanding Graduate Award of Tsinghua University. Top 1%, ranked 1st in the Artificial Intelligence program.

2025.5: Outstanding Graduate Thesis of Tsinghua University. Top 3%.

2025.4: Second Prize in the Tsinghua University Professional Practice Competition. Top 5%.

2024.12: Second Prize, Academic Star Award of Tsinghua SIGS. Top 5%.

2024.10: National Scholarship of Tsinghua University. Top 1%.

2023.10: First-Class Scholarship of Tsinghua Shenzhen International Graduate School. Top 10%.

2022.10: Outstanding Graduate of Xi’an Jiaotong University. Top 5%.

2021.10: School-level scholarship of Xi’an Jiaotong University. Top 5%.

2021.10: School-level outstanding student of Xi’an Jiaotong University. Top 10%.

2020.10: School-level scholarship of Xi’an Jiaotong University. Top 5%.

2020.10: School-level outstanding student of Xi’an Jiaotong University. Top 10%.

2019.10: School-level scholarship of Xi’an Jiaotong University. Top 5%.

2019.10: School-level outstanding student of Xi’an Jiaotong University. Top 10%.

Competitions

Math:

  • First Prize in the National Finals of the 13th Chinese Mathematics Competitions (CMC), ranking 7th nationwide.

  • Third Prize in the National Finals of the 3rd Hua Jiao Cup Mathematics Competition.

  • Second Prize at the provincial level in the 12th Chinese Mathematics Competitions (CMC).

Computer Science:

  • 1st place in the “Gorgewalk_v2” environment of the Tencent Honor of Kings AI Challenge.

  • Top 20 in the “hok_1v1” environment of the Tencent Honor of Kings AI Challenge.

  • Top 20 in the RLChina AI Challenge - RenYin Winter Season.

  • Honorable Mention Award in the 2021 Mathematical Contest in Modeling (MCM).

Work & Internships

Tencent TEG, LLM Department

  • Investigated novel training strategies to enhance the stability and efficiency of large model post-training, and implemented life-long learning techniques for the reinforcement learning (RL) stage.
  • Researched knowledge distillation from multiple specialized sub-models to elevate the performance ceiling across various model capabilities.

Internship: Tencent TEG, Machine Learning Platform Department

Researched efficient RLHF-based fine-tuning methods for improving LLM performance.

Internship: Parametrix Technology Company

Researched how to enhance AIGC performance using RLHF and developed baseline algorithm code for the Kaggle Lux competition.

Internship: Non-convex Technology Company

Developed a simulated stock trading environment with Gym and implemented an RL trading agent for high-frequency bond strategies.

Teaching

Teaching Assistant for the course Introduction to Reinforcement Learning, taught by Prof. Xiu Li, Spring 2024.

Other Achievements

  • Champion of the XJTU “Starcup” Teamfight Tactics Tournament.
  • Third Place in the Shaanxi Province Teamfight Tactics Challenge.
  • Second Place in the University Town “Challenge Cup” League of Legends Competition.