Improved Dynamic Exploration Noise Algorithm Based on Constrained TD3


CLC number: TP181; TP301.6; TP242  Document code: A  Article ID: 2096-4706(2025)07-0103-06

Abstract: Unconstrained exploration can damage a mobile car during training. To address this, this study proposes a Reinforcement Learning method that combines adaptive noise exploration with Lagrange multiplier constraints to optimize the trajectory the car follows to reach a target point. The method improves exploration efficiency by dynamically adjusting the exploration noise, uses the TD3 algorithm to handle the continuous action space, and uses the Lagrange multiplier method to handle the constraints. This differs from the common approach of adding a penalty for unwanted behavior directly to the Markov Decision Process (MDP). Simulation experiments show that the method effectively guides the car around obstacles, reduces constraint violations, and keeps the task safe and reliable, while exhibiting good training convergence.
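The abstract combines two mechanisms on top of TD3: exploration noise whose scale changes during training, and a Lagrange multiplier that enforces the safety constraint. The excerpt does not give the paper's exact update rules, so the following is only a minimal Python sketch under stated assumptions: the exponential noise-decay schedule, the class and function names (`DecayingGaussianNoise`, `LagrangeMultiplier`, `actor_objective`), the cost limit and the learning rates are all hypothetical. It illustrates the standard primal-dual pattern for a constrained MDP: the multiplier λ rises by dual ascent while the average episode cost exceeds its limit, and the actor maximizes Q_r − λ·Q_c rather than optimizing a reward with a fixed penalty baked in.

```python
import random


class DecayingGaussianNoise:
    """Gaussian exploration noise whose std shrinks over training.

    Hypothetical schedule (an assumption, not the paper's rule): sigma decays
    exponentially from sigma_max to sigma_min over decay_steps.
    """

    def __init__(self, sigma_max=0.5, sigma_min=0.05, decay_steps=10_000):
        self.sigma_max = sigma_max
        self.sigma_min = sigma_min
        self.decay_steps = decay_steps

    def sigma(self, step):
        # sigma_max at step 0, sigma_min at decay_steps, held at sigma_min afterwards.
        frac = min(step / self.decay_steps, 1.0)
        return self.sigma_max * (self.sigma_min / self.sigma_max) ** frac

    def perturb(self, action, step, low=-1.0, high=1.0, rng=random):
        # Add N(0, sigma^2) noise to each action dimension, then clip to the bounds,
        # as in TD3-style exploration on a continuous action space.
        s = self.sigma(step)
        return [min(max(a + rng.gauss(0.0, s), low), high) for a in action]


class LagrangeMultiplier:
    """Dual-ascent update for a single constraint E[episode cost] <= cost_limit."""

    def __init__(self, cost_limit, lr=0.01, init=0.0):
        self.cost_limit = cost_limit
        self.lr = lr
        self.value = init

    def update(self, avg_episode_cost):
        # lambda grows while the constraint is violated, shrinks when it is
        # satisfied, and is projected back onto lambda >= 0.
        violation = avg_episode_cost - self.cost_limit
        self.value = max(0.0, self.value + self.lr * violation)
        return self.value


def actor_objective(q_reward, q_cost, lam):
    """Lagrangian the actor maximizes: mean(Q_r) - lambda * mean(Q_c)."""
    n = len(q_reward)
    return sum(q_reward) / n - lam * sum(q_cost) / n


# Usage: noise shrinks over training; lambda reacts to measured episode cost.
noise = DecayingGaussianNoise()
print(round(noise.sigma(0), 3), round(noise.sigma(10_000), 3))  # 0.5 0.05
lam = LagrangeMultiplier(cost_limit=1.0, lr=0.1)
lam.update(3.0)  # constraint violated: lambda rises to 0.2
lam.update(0.0)  # constraint satisfied: lambda falls to 0.1
```

Treating λ as a variable that is updated from observed costs, instead of a hand-tuned penalty weight, is what separates this kind of approach from adding the penalty directly to the MDP reward: the weight keeps adjusting until the constraint is met. In a full TD3 agent, `q_cost` would come from a separate cost critic, which is again an assumption of this sketch.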

Keywords: Safe Reinforcement Learning; Constrained Markov Decision Process; trajectory planning; TD3 algorithm

0 Introduction

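With the rapid development of automation technology, robotics has been widely applied in many fields, including industrial manufacturing and the service sector [1], and has become a key factor in improving operational efficiency and precision.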
