A Self-Supervised Semantic Representation Learning Framework with Adaptive Feature Fusion for Cross-Project Software Defect Prediction

Authors

  • Muhammad Zayaan Waqar, Department of Computer Science, The University of Alabama at Birmingham, United States.
  • Armaghan Mubeen Butt, Department of Computer Science, The University of Alabama at Birmingham, United States.
  • Muhammad Faheem Khan, Department of Computer Science, TIMES University, Multan, 60000, Pakistan.

DOI:

https://doi.org/10.65492/01/401/2026/37

Keywords:

Software Vulnerability, Software Defects, Self-Supervised Learning, Defect Prediction, Semantic Representation Learning

Abstract

Cross-Project Software Defect Prediction (CPDP) is a promising research direction for improving software reliability when labeled defect samples are scarce or absent in practice. However, current CPDP models frequently suffer from domain shift, inconsistent feature distributions, and poor generalization across heterogeneous software projects. This paper presents a Self-Supervised Semantic Representation Learning approach that combines Adaptive Feature Fusion and Cross-Project Domain Alignment to address these issues. The proposed technique has two parts: (i) self-supervised semantic learning, in which a Transformer-based contrastive learning model generates reusable semantic embeddings from unlabeled software repositories; and (ii) an adaptive feature fusion layer that combines these semantic embeddings with handcrafted software metrics. Whereas conventional CPDP frameworks rely primarily on handcrafted metrics to learn representations, our model uses contrastive optimization to capture invariant semantic relations among software entities. In addition, cross-project feature alignment and transfer fine-tuning are applied to reduce distribution mismatch between projects in the latent space. We compared the proposed approach experimentally with current baseline models on the widely used AEEEM and PROMISE datasets. The proposed framework achieved state-of-the-art prediction performance, with Recall, F1-score, MCC, and AUC of 0.93, 0.91, 0.81, and 0.96, respectively. Extensive experiments and statistical analysis show that combining self-supervised semantic learning, adaptive feature fusion, and cross-project domain alignment significantly improves prediction robustness and reduces false negatives, that is, defective modules predicted as clean.
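
The two components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the contrastive objective or the form of the fusion layer, so the sketch assumes an InfoNCE-style loss for the contrastive step and a per-dimension sigmoid gate for the fusion step. The function names `info_nce_loss` and `gated_fusion`, the temperature value, the gate parameters `w_gate` and `b_gate`, and the assumption that the handcrafted metrics have already been projected to the same dimension as the semantic embeddings are all hypothetical.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed objective, not confirmed by the paper).

    Row i of z_a and row i of z_b are treated as two views of the same
    software entity (a positive pair); all other rows act as negatives.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                     # (n, n) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))              # positives lie on the diagonal


def gated_fusion(semantic, metrics, w_gate, b_gate):
    """Blend semantic embeddings with metric features using a learned gate.

    Assumes both inputs have shape (n, d) and w_gate has shape (2d, d).
    The gate lies in (0, 1), so each output value is a convex combination
    of the corresponding semantic and metric values.
    """
    gate_input = np.concatenate([semantic, metrics], axis=1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-gate_input))
    return gate * semantic + (1.0 - gate) * metrics


# Toy data: 4 software entities, 8-dimensional features (values are random).
rng = np.random.default_rng(0)
semantic = rng.normal(size=(4, 8))
metrics = rng.normal(size=(4, 8))
w_gate = rng.normal(scale=0.1, size=(16, 8))
b_gate = np.zeros(8)

fused = gated_fusion(semantic, metrics, w_gate, b_gate)
loss_matched = info_nce_loss(semantic, semantic)           # correct positive pairs
loss_mismatched = info_nce_loss(semantic, semantic[::-1])  # wrong positive pairs
```

In this toy setup the loss is lower when each row is paired with its own view than when the pairs are deliberately mismatched, which is the signal a contrastive encoder is trained on. In a real pipeline the gate parameters would be learned jointly with the defect classifier rather than sampled at random.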

Published

2026-03-01

How to Cite

Muhammad Zayaan Waqar, Armaghan Mubeen Butt, & Muhammad Faheem Khan. (2026). A Self-Supervised Semantic Representation Learning Framework with Adaptive Feature Fusion for Cross-Project Software Defect Prediction. Machine Learning for Human Intelligence, 4(01), 50–78. https://doi.org/10.65492/01/401/2026/37