A cross-modal feature fusion model of knowledge and data for few-shot learning

Integrating knowledge and data: a cross-modal feature fusion model for few-shot problems

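  • Abstract: Few-shot settings are common in machine learning and are especially prominent in fields such as experimental science and medical research. Purely data-driven learning is prone to overfitting and reduced generalization when data are scarce, whereas a hybrid approach driven by both knowledge and data can effectively improve model performance. Cross-modal fusion of knowledge and data in the few-shot setting nevertheless remains challenging. To address few-shot learning, this paper proposes a deep cross-modal feature fusion model (KDFM) that fuses domain knowledge with structured data features to improve performance on downstream tasks. The model represents semantic-modality domain knowledge with a knowledge graph and numerical-modality features with graph network structures. It then uses TransE to extract feature vectors for the knowledge graph nodes, and applies a multi-channel graph convolutional neural network together with an attention mechanism to fuse knowledge and data features across modalities. The fused features are finally used for downstream classification or prediction tasks. The model is validated on few-shot datasets for materials regression and medical classification, and ablation experiments confirm the effectiveness of its knowledge modeling and cross-modal fusion components. Compared with other purely data-driven models, the proposed model achieves better results across the regression and classification tasks, and to some extent alleviates the weak generalization of models and the difficulty of fusing knowledge and data modalities under few-shot conditions.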
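The TransE step mentioned above can be illustrated with a minimal sketch. It assumes the standard TransE formulation, in which a triple (head, relation, tail) is scored by the distance between head + relation and tail, and training uses a margin ranking loss. The vectors and the names `transe_score` and `margin_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np

def transe_score(head, relation, tail, order=1):
    """TransE distance ||h + r - t||; lower means a more plausible triple."""
    return np.linalg.norm(head + relation - tail, ord=order, axis=-1)

def margin_loss(pos_scores, neg_scores, margin=1.0):
    """Margin ranking loss: push true triples at least `margin` below corrupted ones."""
    return np.maximum(0.0, margin + pos_scores - neg_scores).mean()

# Toy embeddings (hypothetical values, not learned from any real knowledge graph).
head = np.array([0.2, -0.1, 0.5])
relation = np.array([0.3, 0.4, -0.2])
true_tail = head + relation                # a perfectly consistent tail
corrupt_tail = np.array([1.0, 1.0, 1.0])   # a corrupted (negative) tail

pos = transe_score(head, relation, true_tail)
neg = transe_score(head, relation, corrupt_tail)
loss = margin_loss(np.array([pos]), np.array([neg]))
print(pos, neg, loss)
```

After training, the entity vectors serve as the node feature vectors that a downstream fusion network consumes; how KDFM trains and consumes them in detail is not specified in the abstract.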

     

    Abstract: The few-shot problem is common in machine learning, particularly in experimental science and medical research. Purely data-driven learning relies heavily on the quality and quantity of data; when data are scarce, models are prone to overfitting and their generalization ability declines. Most fields, however, have accumulated rich experience and knowledge, and a hybrid approach that combines domain knowledge with data can effectively improve model performance. Nevertheless, in few-shot settings, effective cross-modal feature fusion of knowledge and data remains a significant challenge. This paper therefore proposes a Knowledge and Data Cross-Modal Fusion Model (KDFM) for the few-shot problem. First, numerical-modality features are divided into feature types and modeled as graphs; for each feature type, edges are constructed based on K-means clustering. The different types of numerical features are then processed by multi-channel graph convolution. These graphs convert numerical-modality features into graph-level features, enhancing the expressiveness of the numerical modality. Next, the domain knowledge features of the semantic modality are represented by a knowledge graph. Key entities and relationships are extracted from professional books and expert experience, and the knowledge graph is composed of triples that combine these entities and relationships. The knowledge graph transforms unstructured text features into graph-level features, so that textual domain knowledge and experience are organized and incorporated into the neural network model. The paper then employs a graph convolutional neural network and an attention mechanism for cross-modal feature fusion of knowledge and data. The input to the graph convolutional network comprises the graphs constructed from the numerical data, the feature vectors obtained from the knowledge graph, and the numerical vectors from the data. Multi-channel graph convolution, with the number of channels determined by the number of feature types, performs deep feature fusion of knowledge and data. The output is a fusion vector of the multi-channel features, computed with the attention mechanism, which can serve as the input feature vector for downstream tasks. The proposed model is validated on two few-shot datasets: a regression task in the materials field and a classification task in the medical field. The results show that, compared with other data-driven models, KDFM performs well across the regression and classification metrics. In the regression task, the model achieves the best MSE, MAE, and R², and its R² exceeds that of the second-best model, MLP, by over 7%. In the classification task, the model is the best on five of seven metrics and second-best on the remaining two. In addition, ablation experiments that remove the knowledge graph module and the graph convolutional network module from the full model verify the effectiveness of its knowledge modeling and cross-modal fusion components. The proposed model addresses, to some extent, the weak generalization ability of models and the difficulty of integrating knowledge and data modalities in the few-shot problem.
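The pipeline described in the abstract (K-means-based edge construction, one graph-convolution channel per feature type, attention-weighted fusion) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the edge rule (link samples that fall in the same cluster), the concatenation of data and knowledge vectors as node features, the random untrained weights, and all function names are hypothetical.

```python
import numpy as np

def kmeans_labels(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def cluster_adjacency(labels):
    """Assumed edge rule: connect samples in the same cluster (self-loops included)."""
    return (labels[:, None] == labels[None, :]).astype(float)

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 X W)."""
    deg = adj.sum(axis=1)  # at least 1 because of the self-loops
    norm_adj = adj / np.sqrt(np.outer(deg, deg))
    return np.maximum(norm_adj @ feats @ weight, 0.0)

def attention_fuse(channels, query):
    """Softmax-weighted sum over channels; channels has shape (C, N, H)."""
    scores = np.tanh(channels) @ query                   # shape (C, N)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=0, keepdims=True)            # weights sum to 1 per sample
    return (alpha[..., None] * channels).sum(axis=0), alpha

rng = np.random.default_rng(0)
n_samples, hidden = 12, 4
data = rng.normal(size=(n_samples, 5))       # toy numerical features
knowledge = rng.normal(size=(n_samples, 3))  # stand-in for knowledge-graph vectors
feature_types = [[0, 1], [2, 3, 4]]          # hypothetical split into two feature types

node_feats = np.concatenate([data, knowledge], axis=1)
channels = []
for cols in feature_types:
    adj = cluster_adjacency(kmeans_labels(data[:, cols], k=3))
    weight = rng.normal(size=(node_feats.shape[1], hidden))  # untrained weights
    channels.append(gcn_layer(adj, node_feats, weight))

fused, alpha = attention_fuse(np.stack(channels), rng.normal(size=hidden))
print(fused.shape, alpha.shape)
```

In a trained model the weights and the attention query would be learned jointly with the downstream regressor or classifier; here they are random so that the example stays self-contained.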

     
