Paper-Cut Image Classification Based on CLIP Text Feature Enhancement

CLC number: TP391    Document code: A    Article ID: 1001-3695(2025)07-010-1994-09    doi: 10.19734/j.issn.1001-3695.2024.11.0485

Abstract: Paper-cut image classification suffers from a large modality gap between text and image features and from insufficient class-prototype representation. To address these problems, this paper proposed a CLIP-based text feature enhancement method, the CLIP visual-text enhancer (C-VTE). The method extracted text features through manual prompt templates, designed a visual-text enhancement module, and employed cross-attention and proportional residual connections to fuse image and text features, thereby reducing the modality discrepancy and enhancing the expressive ability of category features. Experiments on a paper-cut dataset and on four public datasets including Caltech101 validated its effectiveness. For base-class classification on the paper-cut dataset, C-VTE achieved 72.51% average accuracy, outperforming existing methods by 3.14 percentage points. In few-shot classification tasks on the public datasets, it attained 84.78% average accuracy, an improvement of 2.45 percentage points. Ablation experiments demonstrate that both the modality fusion module and the proportional residual component contribute significantly to the performance improvement. The method offers new insights into the efficient adaptation of vision-language models to downstream classification tasks and is particularly suited to few-shot learning and base-class-dominated scenarios.
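The abstract names the components of C-VTE but not their formulas. The block below is a minimal sketch under stated assumptions, not the authors' implementation: class names are turned into prompts with a manual template, each class text feature acts as a query that attends over few-shot image features via scaled dot-product cross-attention, a proportional residual blends the attended result with the original text feature at a hypothetical ratio `alpha`, and classification uses CLIP-style scaled cosine similarity. Learned query/key/value projections and the real CLIP encoders are omitted; the template wording, class names, toy vectors, `alpha`, and `temperature` are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]

# Assumed manual prompt template: "a photo of a {}." is CLIP's common default;
# the paper's actual paper-cut templates are not given in the abstract.
PROMPT_TEMPLATE = "a photo of a {}."


def build_prompts(class_names: List[str], template: str = PROMPT_TEMPLATE) -> List[str]:
    """Fill the manual template with each class name."""
    return [template.format(name) for name in class_names]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query attends over all keys (no learned projections)."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def enhance_text_features(text_feats: List[Vector], image_feats: List[Vector], alpha: float = 0.2) -> List[Vector]:
    """Text queries attend to image features; a proportional residual keeps (1 - alpha) of the original."""
    attended = cross_attention(text_feats, image_feats, image_feats)
    return [
        l2_normalize([alpha * a + (1.0 - alpha) * t for a, t in zip(att, txt)])
        for att, txt in zip(attended, text_feats)
    ]


def classify(image_feat: Vector, class_feats: List[Vector], temperature: float = 100.0) -> List[float]:
    """CLIP-style head: softmax over temperature-scaled cosine similarities."""
    img = l2_normalize(image_feat)
    return softmax([temperature * dot(img, l2_normalize(c)) for c in class_feats])


if __name__ == "__main__":
    prompts = build_prompts(["rooster", "fish"])  # hypothetical class names
    # Toy 3-d vectors standing in for CLIP embeddings of the prompts and of one support image.
    text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    support_image_feats = [[0.9, 0.1, 0.0]]
    class_feats = enhance_text_features(text_feats, support_image_feats, alpha=0.2)
    print(prompts)
    print(classify([1.0, 0.05, 0.0], class_feats))
```

Setting `alpha` to 0 returns the plain prompt features, so the ratio bounds how far image evidence can move each class prototype. A small ratio of this kind is one common way to keep a residual adapter from overwriting CLIP's pretrained text knowledge when only a few labelled images are available.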

Key words: vision-language large model; paper-cut classification; few-shot classification; multimodal fusion; prompt learning

0 Introduction

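In the field of intangible cultural heritage, paper-cuts exist mainly in the form of images, and they are diverse in type and large in number.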

Application Research of Computers

2025, Issue 07
