

Biography: Han Zhong is a Ph.D. student at Peking University. His research focuses on reinforcement learning and its connections to operations research, statistics, and optimization. He has published papers in leading journals and conferences, including Mathematics of Operations Research, Journal of the American Statistical Association, Journal of Machine Learning Research, ICML, NeurIPS, and ICLR.
Abstract: Designing efficient RL algorithms requires addressing two key dimensions. The first is statistical complexity: how many samples do we need to learn a good policy? We propose a unified framework called the Generalized Eluder Coefficient that captures the sample efficiency of both model-based and model-free RL under general function approximation. This framework also extends naturally to preference-based learning for aligning large language models, leading to practical algorithms like Iterative DPO and Self-Exploring LM. The second, less explored dimension is representation complexity: what should we learn? We show that approximating the model, policy, and value functions in RL has fundamentally different difficulty levels, forming a strict hierarchy rooted in circuit complexity theory. In particular, value functions are the hardest to represent, which explains why discriminative critics in PPO-style methods struggle in long-horizon LLM reasoning tasks. Motivated by this finding, we propose Generative Actor-Critic, which replaces the scalar critic with a generative critic that reasons step-by-step before assigning credit. Experiments show it is more scalable and more robust, and achieves better performance than both value-free methods like GRPO and traditional PPO.
| Date | Speaker | Title | Materials |
|---|---|---|---|
| Apr 16, 2026 | Yudong Zhang | AI-Integrated Colorectal Cancer Research: Challenges, Progress and Innovation | [Poster] |
| Apr 9, 2026 | Bin Gao | Low-rank Optimization Through the Lens of Geometry | [Poster] |
| Mar 31, 2026 | Zhuo Sun | Multilevel Control Functional | [Poster] |
| Mar 25, 2026 | Jiancheng Yang | Scaling Medical AI Without Scaling Cost in the Era of Generative AI | [Poster] |
| Mar 24, 2026 | Yang Cao | Differential Privacy in LLM Fine-Tuning: What It Protects, What It Costs, and What It Doesn’t | [Poster] |
| Mar 18, 2026 | Zijun Cui | AI + Knowledge: Unleashing the Power of Domain Knowledge for Advanced Artificial Intelligence | [Poster] |
| Mar 12, 2026 | Chenxi Yuan | Enhance Prediction of Alzheimer’s Disease with Generative AI | [Poster] |
| Mar 3, 2026 | Ren Wang | Robustness Through Collective Intelligence | [Poster] |
| Feb 28, 2026 | Huiyu Zhou | Constructing Masterpieces From Missing Pieces | [Poster] |
| Feb 10, 2026 | Guibo Luo | Benchmarking Multi-Party Privacy Computing and Exploring New Collaboration Paradigms | [Poster] |
| Feb 5, 2026 | Zeyu Zhang | Bridging Scene Understanding and Motion Generation in Robot Manipulation | [Poster] |
| Jan 28, 2026 | Md Sajid | Interpretable and Robust Randomized Neural Networks for Real-World Learning | [Poster] |
| Jan 21, 2026 | Bing Yang | Shape-Aware Deep Learning for AS-OCT Analysis: Segmentation and Structural Uncertainty | [Poster] |
| Jan 14, 2026 | Mengdi Zhao | Simulating Biological Intelligence: Bridging High-Fidelity Neuronal Modeling with Embodied Agents | [Poster] |
| Jan 7, 2026 | Xiangyu Chang | Research on Efficient and Fair Data Element Pricing Mechanisms | [Poster] |
| Dec 19, 2025 | Shujian Huang | Cross-lingual Knowledge Learning and Reasoning in Large Language Models | [Poster] |
| Dec 17, 2025 | Haotong Qin | Extreme Discretization: Towards Efficient Intelligence and Systems in the Scaling Era | [Poster] |
| Dec 3, 2025 | Ningning Ding | From Fair Unlearning Algorithms to Incentive-Compatible Mechanisms in Federated Unlearning | [Poster] |
| Oct 23, 2025 | Raian Ali | AI Design vs. Human Attitude, Learning, and Dependency | [Poster] |
| Oct 22, 2025 | Yuwen Li | Higher Order Approximation Error Bounds for ReLU Neural Networks in Korobov Space | [Poster] |
| Oct 13, 2025 | Weijie Su | The ICML 2023 Ranking Experiment: Empirical Performance and Analysis of the Isotonic Mechanism | [Poster] |
| Sep 17, 2025 | Fanghui Liu | Bridging Theory and Practice: One-step Full Gradient Can Suffice for Low-rank Fine-tuning in LLMs | [Poster] |
| Aug 20, 2025 | Fan Yang | RNA Recognition and Targeted Degradation: Mechanisms and Engineering Strategies based on RNA-Binding Domains (RBDs) | [Poster] |
| Aug 4, 2025 | Tao Luo | The Theory of Parameter Condensation in Neural Networks | [Poster] |
| Jul 29, 2025 | Hangxin Liu | Embodied Mobile Manipulation: Trajectory Optimization vs. Diffusion | [Poster] |
| Jul 18, 2025 | Yide Liu | Constructing High-performance Robotic Insects With Origami Transmission Mechanism | [Poster] |
| Jul 15, 2025 | Hai Dong | Mobile Edge Intelligence: When AI Meets Mobile Edge Computing | [Poster] |
| Jun 30, 2025 | Guangyi Chen | Causal Representation Learning for Visual Understanding | [Poster] |