Multi-actor Deterministic Policy Gradient Algorithm Based on Progressive k-Means Clustering

CLC number: TP18  Document code: A  Article ID: 1671-5489(2025)03-0885-10
Multi-actor Deterministic Policy Gradient Algorithm Based on Progressive k-Means Clustering
LIU Quan¹,², LIU Xiaosong², WU Guangjun², LIU Yuhan³ (1. School of Computer Science and Technology, Kashi University, Kashi 844000, Xinjiang Uygur Autonomous Region, China; 2. School of Computer Science and Technology, Soochow University, Suzhou 215008, Jiangsu Province, China; 3. Academy of Future Education, Xi'an Jiaotong-Liverpool University, Suzhou 215000, Jiangsu Province, China)
Abstract: Aiming at the problems of poor learning performance and high fluctuation of the deep deterministic policy gradient (DDPG) algorithm on tasks with large state spaces, we proposed a multi-actor deep deterministic policy gradient algorithm based on progressive k-means clustering (MDDPG-PK-Means). During training, when an action was selected for the state at each time step, the decision-making of the actor network was assisted by the discrimination results of the k-means clustering algorithm; at the same time, the number of k-means cluster centers gradually increased as the number of training steps grew. The MDDPG-PK-Means algorithm was applied to the MuJoCo simulation platform, and the experimental results show that, compared with DDPG and other algorithms, the MDDPG-PK-Means algorithm achieves better performance on most continuous tasks.
Keywords: deep reinforcement learning; deterministic policy gradient algorithm; k-means clustering; multi-actor
Reinforcement learning (RL) is a method in which an agent continually learns on its own within an environment, seeking regularities that maximize the cumulative future reward and thereby finding an optimal policy that achieves its goal [1]. Because it selects executable actions according to the agent's current state, reinforcement learning is well suited to sequential decision-making problems [2-3].
In traditional reinforcement learning, the value-function-based SARSA (state-action-reward-state-action) and Q-Learning [4-5] algorithms perform well in classic reinforcement learning tasks with low-dimensional state spaces, such as Cart-Pole and Mountain-Car, but their performance degrades in high-dimensional action spaces. With the development of deep learning, deep neural networks have demonstrated an efficient ability to recognize high-dimensional data; deep reinforcement learning (DRL) [6], which combines deep learning (DL) with reinforcement learning, can therefore handle high-dimensional action-space problems, and DRL has become one of the most active research directions in artificial intelligence [7-8].
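To make the value-function baseline concrete, the tabular Q-Learning update mentioned above can be sketched as follows. This is a minimal illustration on a hypothetical two-state, two-action MDP; the state/action sizes, learning rate, and discount factor are assumed values, not taken from the paper.

```python
import numpy as np

# Hypothetical tiny MDP used only to illustrate the update rule.
n_states, n_actions = 2, 2
alpha, gamma = 0.5, 0.9  # assumed learning rate and discount factor

Q = np.zeros((n_states, n_actions))  # tabular action-value function

def q_learning_update(Q, s, a, r, s_next):
    """One off-policy Q-Learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()  # bootstrap with the greedy action
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1])  # -> 0.5 (target is 1.0, table starts at 0, alpha = 0.5)
```

Such a table is tractable only when states can be enumerated, which is exactly the limitation DRL methods such as DDPG address by approximating the value function with a neural network.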