Integrating knowledge and data: a cross-modal feature fusion model for few-shot problems
Graphical Abstract
Abstract
The few-shot problem is a common phenomenon in machine learning, particularly in experimental science and medical research. Purely data-driven learning relies heavily on the quality and quantity of data: when data are scarce, models are prone to overfitting and their generalization ability declines. However, most fields have accumulated rich experience and knowledge, and a hybrid approach that combines this domain knowledge with data can effectively improve model performance. Nevertheless, in the few-shot setting, achieving effective cross-modal feature fusion of knowledge and data remains a significant challenge.
Therefore, this paper proposes a Knowledge and Data Cross-Modal Fusion Model (KDFM) for few-shot problems. First, numerical-modality features are divided into different feature types, and each type is modeled as a graph whose edges are constructed by K-means clustering. The different types of numerical features are then processed by multi-channel graph convolution. These graphs convert numerical features into graph-level features, enhancing the expressiveness of the numerical modality. Next, the domain-knowledge features of the semantic modality are represented by a knowledge graph: key entities and relationships are extracted from professional books and expert experience and organized into (entity, relationship, entity) triples. The knowledge graph transforms unstructured textual features into graph-level features, so that textual domain knowledge and experience can be fed into the neural network model. The paper then employs a graph convolutional network and an attention mechanism for cross-modal fusion of knowledge and data. The inputs of the graph convolutional network are the graphs constructed from the numerical data, the feature vectors obtained from the knowledge graph, and the numerical feature vectors themselves. According to the number of feature types, multi-channel graph convolution performs deep fusion of knowledge and data, and the attention mechanism combines the multi-channel outputs into a single fusion vector. This final output serves as the input feature vector for downstream tasks.
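The sketch below illustrates one plausible reading of this pipeline in PyTorch and scikit-learn: per-feature-type graphs built by K-means clustering, multi-channel graph convolution over numerical features concatenated with knowledge-graph embeddings, and attention-based fusion of the channel outputs. All names (build_cluster_graph, GCNChannel, KDFMSketch), hyperparameters, and the assumption that knowledge-graph entity embeddings are precomputed and supplied as a tensor are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


def build_cluster_graph(features: np.ndarray, n_clusters: int = 4) -> torch.Tensor:
    """Cluster the samples of one feature type with K-means and connect samples
    in the same cluster; returns a symmetrically normalized adjacency matrix."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    adj = (labels[:, None] == labels[None, :]).astype(np.float32)   # diagonal gives self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    adj = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]       # D^-1/2 A D^-1/2
    return torch.tensor(adj)


class GCNChannel(nn.Module):
    """One graph-convolution channel: H = ReLU(A_hat X W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(adj @ self.lin(x))


class KDFMSketch(nn.Module):
    """Multi-channel GCN over per-feature-type graphs; knowledge-graph entity
    embeddings are concatenated to the numerical node features, and an attention
    layer fuses the channel outputs into one feature vector per sample."""
    def __init__(self, num_dim: int, kg_dim: int, hidden: int, n_channels: int):
        super().__init__()
        self.channels = nn.ModuleList(
            [GCNChannel(num_dim + kg_dim, hidden) for _ in range(n_channels)]
        )
        self.attn = nn.Linear(hidden, 1)                 # scores each channel output

    def forward(self, adjs, x_num, x_kg):
        x = torch.cat([x_num, x_kg], dim=-1)             # cross-modal node features
        outs = torch.stack([ch(a, x) for ch, a in zip(self.channels, adjs)], dim=1)
        weights = torch.softmax(self.attn(outs), dim=1)  # attention over channels
        return (weights * outs).sum(dim=1)               # fused vector for downstream tasks


# Toy usage: 30 samples, two feature types of 5 dimensions each, 8-dim KG embeddings.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f1, f2 = rng.normal(size=(30, 5)), rng.normal(size=(30, 5))
    adjs = [build_cluster_graph(f1), build_cluster_graph(f2)]
    x_num = torch.tensor(np.hstack([f1, f2]), dtype=torch.float32)
    x_kg = torch.randn(30, 8)                            # stand-in for precomputed KG embeddings
    model = KDFMSketch(num_dim=10, kg_dim=8, hidden=16, n_channels=2)
    fused = model(adjs, x_num, x_kg)
    print(fused.shape)                                   # torch.Size([30, 16])
```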
The proposed model is validated on two small-sample datasets: a regression task in the materials field and a classification task in the medical field. The simulation results show that, compared with other data-driven models, KDFM performs strongly across regression and classification metrics. In the regression task, it achieves the best MSE, MAE, and R2; notably, its R2 exceeds that of the second-best model, MLP, by more than 7%. In the classification task, it is best on five of seven metrics and second best on the remaining two. In addition, multiple ablation experiments verify the effectiveness of the model: removing the knowledge-graph and graph-convolution modules from the full model confirms the contribution of the knowledge-modeling and cross-modal-fusion components, respectively. The proposed model thus mitigates, to some extent, the weak generalization ability and the difficulty of integrating knowledge and data modalities in few-shot problems.