Biography
I’m currently a researcher on the Tencent Hunyuan X team, focusing on reinforcement learning (RL) and large language models (LLMs). I received my degree in Artificial Intelligence from Tsinghua University in 2025. My supervisor was Prof. Xiu Li, and I received extensive research guidance from my senior fellow student Jiafei Lyu. My research focuses on reinforcement learning, especially LLM post-training, RL exploration, and multi-agent reinforcement learning. I am proficient in, and enjoy, using mathematical theory to optimize LLM and RL methods.
If you are interested in connecting or see potential for collaboration, please feel free to reach out via email at yangkaisigsrl@gmail.com.
News
2025.11 The paper “EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control” is available on arXiv.
2025.10 The paper “Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners” is available on arXiv.
2025.5 The paper “CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning” was accepted by AAMAS 2026.
2024.11 The paper “Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning” was accepted by AAAI 2025.
2024.6 The paper “A two-stage reinforcement learning-based approach for multi-entity task allocation” was accepted by Engineering Applications of Artificial Intelligence.
2024.5 The paper “Exploration and Anti-Exploration with Distributional Random Network Distillation” was accepted by ICML 2024.
2024.4 The paper “BATON: Aligning Text-to-Audio Model with Human Preference Feedback” was accepted by IJCAI 2024.
2024.2 The paper “Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model” was accepted by CVPR 2024.
2024.1 My paper “Exploration and Anti-Exploration with Distributional Random Network Distillation” is available on arXiv.
2023.11 The paper “Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model” was selected as a HuggingFace daily paper.
2023.11 My paper “Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model” is available on arXiv.
Publications
- Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model. Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Weihan Shen, Xiaolong Zhu, Xiu Li. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. (Ranked #2 on HuggingFace Daily Papers.)
- Exploration and Anti-Exploration with Distributional Random Network Distillation. Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li. International Conference on Machine Learning (ICML), 2024.
- EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control. Kai Yang, Xin Xu, Yangkun Chen, Weijie Liu, Jiafei Lyu, Zichuan Lin, Deheng Ye, Saiyong Yang. arXiv preprint.
- Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners. Xin Xu, Cliveb AI, Kai Yang, Tianhao Chen, Yang Wang, Saiyong Yang, Can Yang. arXiv preprint.
- BATON: Aligning Text-to-Audio Model with Human Preference Feedback. Huan Liao, Haonan Han, Kai Yang, Tianjiao Du, Rui Yang, Zunnan Xu, Qinmei Xu, Jingquan Liu, Jiasheng Lu, Xiu Li. International Joint Conference on Artificial Intelligence (IJCAI), 2024.
- A two-stage reinforcement learning-based approach for multi-entity task allocation. Aicheng Gong, Kai Yang, Jiafei Lyu, Xiu Li. Engineering Applications of Artificial Intelligence (EAAI) 136, 108906.
- CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning. Zeyuan Liu, Kai Yang, Xiu Li. International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2026.
- Exploration by Random Distribution Distillation. Zhirui Fang, Kai Yang, Jian Tao, Jiafei Lyu, Lusong Li, Li Shen, Xiu Li. arXiv preprint.
- GTLMA: Generalizable Hierarchical Learning for Tasks with Variable Entities. Kai Yang, Aicheng Gong, Jian Tao, Yang Zhang, Xiu Li. 2023 International Conference on Frontiers of Robotics and Software Engineering (FRSE).
- CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings (Oral). Kai Yang, Zhirui Fang, Xiu Li, Jian Tao. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8.
- Multi-agent Exploration with Sub-state Entropy Estimation. Jian Tao, Yangkun Chen, Yang Zhang, Kai Yang, Xiu Li. 2024 International Joint Conference on Neural Networks (IJCNN), 1-9.
Education
2022-2025: Master’s Degree in Artificial Intelligence at Tsinghua Shenzhen International Graduate School, Tsinghua University.
2018-2022: Bachelor’s Degree in Automation from the Department of Electronic Information Engineering, Xi’an Jiaotong University.
Honors & Awards
2025.9 Outstanding Sharer Award of Tencent TEG.
2025.6 Outstanding Graduate Award of Tsinghua University. Top 1%; ranked 1st in the Artificial Intelligence program.
2025.5 Outstanding Graduate Thesis of Tsinghua University. Top 3%.
2025.4 Second Prize in the Tsinghua University Professional Practice Competition. Top 5%.
2024.12 Academic Star Award 2nd prize of Tsinghua SIGS. Top 5%.
2024.10 National Scholarship of Tsinghua University. Top 1%.
2023.10 First-class Scholarship of Tsinghua Shenzhen International Graduate School. Top 10%.
2022.10 Outstanding Graduate of Xi’an Jiaotong University. Top 5%.
2021.10 School-level Scholarship of Xi’an Jiaotong University. Top 5%.
2021.10 School-level Outstanding Student of Xi’an Jiaotong University. Top 10%.
2020.10 School-level Scholarship of Xi’an Jiaotong University. Top 5%.
2020.10 School-level Outstanding Student of Xi’an Jiaotong University. Top 10%.
2019.10 School-level Scholarship of Xi’an Jiaotong University. Top 5%.
2019.10 School-level Outstanding Student of Xi’an Jiaotong University. Top 10%.
Competitions
Math:
First Prize in the National Finals of the 13th Chinese Mathematics Competitions (CMC), ranking 7th nationwide.
Third Prize in the National Finals of the 3rd Hua Jiao Cup Mathematics Competition.
Second Prize at the provincial level in the 12th Chinese Mathematics Competitions (CMC).
Computer Science:
1st place in the “Gorgewalk_v2” environment of the Tencent Honor of Kings AI Challenge.
Top 20 in the “hok_1v1” environment of the Tencent Honor of Kings AI Challenge.
Top 20 in the RLChina AI Challenge - RenYin Winter Season.
Honorable Mention Award in the 2021 Mathematical Contest in Modeling (MCM).
Work & Internships
Tencent TEG, LLM Department
- Investigated novel training strategies to enhance the stability and efficiency of large-model post-training, and implemented lifelong learning techniques for the reinforcement learning (RL) stage.
- Researched knowledge distillation from multiple specialized sub-models to elevate the performance ceiling across various model capabilities.
Internship: Tencent TEG, Machine Learning Platform Department
Researched efficient RLHF-based fine-tuning methods for improving LLM performance.
Internship: Parametrix Technology Company
Researched enhancing AIGC performance with RLHF and developed baseline algorithm code for the Kaggle Lux competition.
Internship: Non-convex Technology Company
Developed a simulated stock-trading environment with Gym and implemented an RL trading agent for high-frequency bond strategies.
Teaching
Teaching Assistant for the course Introduction to Reinforcement Learning, instructed by Prof. Xiu Li, Spring 2024.
Other Achievements
- Champion of the XJTU “Starcup” Teamfight Tactics Tournament.
- Third Place in the Shaanxi Province Teamfight Tactics Challenge.
- Second Place in the University Town “Challenge Cup” League of Legends Competition.
