Policy Gradient Method under Hadamard Parameterization and Its Extension


Speaker: Prof. Ke Wei (魏轲), Fudan University

Time: Thursday, July 17, 15:00-16:00

Venue: Room West 303, School of Mathematics

Abstract:

Reinforcement learning (RL) addresses sequential decision problems and has achieved great success in a variety of areas, such as robotics and large language models. In this talk, we will first discuss the policy gradient method based on the Hadamard parameterization, which is in fact a Riemannian gradient method on the unit sphere. Linear convergence of this method has been established. We will then present a general family of policy optimization methods with policy convergence guarantees.


Speaker bio:

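Ke Wei is a professor and doctoral supervisor at the School of Data Science, Fudan University. Wei received a PhD from the University of Oxford in 2014, followed by postdoctoral positions at the Hong Kong University of Science and Technology (2014-2015) and the University of California, Davis (2015-2017). Current research interests include high-dimensional signal and data processing and the algorithms and theory of reinforcement learning. This work has been published in leading journals in these fields, including ACHA, the SIAM journals, IEEE TIT, MP, and JMLR.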



Host: Qinglin Tang (唐庆粦)



Video:

  • Policy Gradient Method under Hadamard Parameterization and Its Extension
  • 15:00 - 16:00, 2025-07-17