Research on the Control Method of a Large Model-Driven Multimodal Intelligent Sensing Vehicle

CLC number: TP18  Document code: A

Article ID: 2096-4706(2025)22-0179-06

Research on the Control Method of Large Model-driven Multimodal Intelligent Sensing Vehicle

LIU Ke, CHEN Siwei, KANG Yilin, LAN Jiayi, LIUei (South-Central Minzu University, Wuhan 430074, China)

Abstract: The Raspberry Pi intelligent car control system, which supports visual perception and is driven by a Large Language Model, provides an intelligent interactive control scheme for open environments. The system adopts a three-layer architecture: the bottom execution layer is built around a Raspberry Pi 5, the top perception layer consists of cameras and microphones, and the processing layer is deployed in the cloud, integrating a fine-tuned MiniCPM model, the SenseVoice speech recognition model, the Grounding DINO zero-shot Object Detection model and the Depth Anything monocular Depth Estimation model. Through an edge-cloud collaboration mechanism, the system decomposes natural language instructions into three subtasks (speech recognition, semantic parsing and environmental perception) and finally generates specific motion control instructions. Test results show that the system achieves high accuracy in speech recognition and instruction parsing, can effectively recognize complex and variable natural language commands, and successfully overcomes the limitation of traditional embedded intelligent systems that rely on fixed instruction sets.

Keywords: Large Language Model; speech recognition; Raspberry Pi; visual perception; Object Detection
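The edge-cloud flow described in the abstract (speech recognition, then semantic parsing, then environmental perception, then a motion command) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each cloud model (SenseVoice, the fine-tuned MiniCPM, Grounding DINO, Depth Anything) is replaced by a plain Python callable, and the names `ParsedInstruction`, `MotionCommand`, `plan_motion` and `run_pipeline`, the 60° horizontal field of view, the 0.3 m stopping margin and the turn-then-drive command format are all hypothetical. The sketch also assumes the depth callable returns metric depth in metres; Depth Anything's base models predict relative rather than metric depth, so a metric variant or a calibration step would be needed in practice.

```python
# Hypothetical sketch: all names, the FOV, the stop margin and the command
# format are illustrative assumptions, not the paper's actual interfaces.
import math
from dataclasses import dataclass
from statistics import median
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]        # (x1, y1, x2, y2) in pixels, x2/y2 exclusive
Grid = Sequence[Sequence[float]]       # H x W image or depth map, row-major


@dataclass
class ParsedInstruction:
    action: str                        # hypothetical action label, e.g. "approach" or "stop"
    target: Optional[str]              # text phrase handed to the open-vocabulary detector


@dataclass
class MotionCommand:
    turn_deg: float                    # heading change; positive = turn right (assumed convention)
    forward_m: float                   # distance to drive forward, in metres (assumed metric depth)


def plan_motion(box: Box, depth: Grid, image_width: int,
                fov_deg: float = 60.0, stop_margin_m: float = 0.3) -> MotionCommand:
    """Turn toward the box centre, then drive up to stop_margin_m short of the target."""
    x1, y1, x2, y2 = box
    # Pinhole model: focal length in pixels derived from the assumed horizontal FOV.
    focal_px = (image_width / 2) / math.tan(math.radians(fov_deg / 2))
    offset_px = (x1 + x2) / 2 - image_width / 2
    turn = math.degrees(math.atan(offset_px / focal_px))
    # Median depth inside the box tolerates a few background pixels in the box.
    dist = median(depth[y][x] for y in range(y1, y2) for x in range(x1, x2))
    return MotionCommand(round(turn, 1), round(max(dist - stop_margin_m, 0.0), 2))


def run_pipeline(audio: bytes, image: Grid,
                 asr: Callable[[bytes], str],
                 parser: Callable[[str], ParsedInstruction],
                 detector: Callable[[Grid, str], Optional[Box]],
                 depth_model: Callable[[Grid], Grid]) -> MotionCommand:
    """Speech recognition -> semantic parsing -> perception -> motion command."""
    idle = MotionCommand(0.0, 0.0)
    text = asr(audio)                      # stand-in for SenseVoice
    parsed = parser(text)                  # stand-in for the fine-tuned MiniCPM
    if parsed.action == "stop" or not parsed.target:
        return idle
    box = detector(image, parsed.target)   # stand-in for Grounding DINO
    if box is None or box[2] <= box[0] or box[3] <= box[1]:
        return idle                        # target not found or empty box: stay put
    depth = depth_model(image)             # stand-in for Depth Anything (metric output assumed)
    return plan_motion(box, depth, image_width=len(image[0]))
```

Passing each model in as a callable keeps the control logic testable on the Raspberry Pi without the cloud services; in a system like the one described, those callables would presumably wrap requests to the cloud processing layer.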

0 Introduction

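With the rapid development of large language models, multimodal models and embedded systems, embodied intelligence built on large models has achieved good results in a wide range of tasks, demonstrating strong generalization ability and broad application prospects across many fields [].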