A Self-Supervised Semantic Representation Learning Framework with Adaptive Feature Fusion for Cross-Project Software Defect Prediction

Authors

  • Muhammad Zayaan Waqar, Department of Computer Science, The University of Alabama at Birmingham, United States.
  • Armaghan Mubeen Butt, Department of Computer Science, The University of Alabama at Birmingham, United States.
  • Muhammad Faheem Khan, Department of Computer Science, TIMES University, Multan, 60000, Pakistan.

DOI:

https://doi.org/10.65492/01/401/2026/37

Keywords:

Software Vulnerability, Software Defects, Self-Supervised Learning, Defect Prediction, Semantic Representation Learning

Abstract

Cross-Project Software Defect Prediction (CPDP) has become a promising research direction for improving software reliability when labeled defect data are scarce or unavailable in real-world settings. However, current CPDP models often suffer from domain shift, inconsistent feature distributions, and poor generalization across heterogeneous software projects. To address these issues, this paper presents a new Self-Supervised Semantic Representation Learning approach combined with Adaptive Feature Fusion and Cross-Project Domain Alignment. The proposed technique has two components: (i) self-supervised semantic learning, in which a Transformer-based contrastive learning model generates reusable semantic embeddings from unlabeled software repositories; and (ii) an adaptive feature fusion layer that combines these semantic embeddings with handcrafted software metrics. Whereas conventional CPDP frameworks rely primarily on handcrafted metrics rather than learned semantic representations, our model captures invariant semantic relations among software entities through contrastive optimization. In addition, cross-project feature alignment and transfer fine-tuning are applied to reduce distribution heterogeneity in the latent space. We experimentally compared the proposed approach with existing baseline models on the well-known AEEEM and PROMISE datasets. The proposed framework achieved state-of-the-art prediction performance, with Recall, F1-score, MCC, and AUC of 0.93, 0.91, 0.81, and 0.96, respectively. Extensive experiments and statistical analysis show that incorporating self-supervised semantic learning, adaptive feature fusion, and cross-project domain alignment can significantly improve prediction robustness and reduce false negatives (missed defects).
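The abstract does not specify how the adaptive fusion layer weighs semantic embeddings against handcrafted metrics. One common design is a learned element-wise gate; the minimal sketch below illustrates that idea only and is not the authors' implementation. The function name `gated_fusion`, the gate parameters `gate_weights` and `gate_bias`, and the assumption that the metric vector has already been projected to the embedding dimension are all hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    semantic: Sequence[float],
    metrics: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> List[float]:
    """Hypothetical gated fusion of a semantic embedding s and a metric vector m.

    Computes g = sigmoid(W [s; m] + b) and z = g * s + (1 - g) * m element-wise.
    Assumes s and m already share dimension d and W has shape (d, 2d).
    """
    d = len(semantic)
    if len(metrics) != d:
        raise ValueError("semantic and metric vectors must have the same length")
    if len(gate_weights) != d or len(gate_bias) != d:
        raise ValueError("gate parameters must have d rows")
    concat = list(semantic) + list(metrics)  # [s; m], length 2d
    fused: List[float] = []
    for i in range(d):
        row = gate_weights[i]
        if len(row) != 2 * d:
            raise ValueError("each gate row must have 2d weights")
        logit = sum(w * x for w, x in zip(row, concat)) + gate_bias[i]
        g = sigmoid(logit)  # per-feature weight given to the semantic view
        fused.append(g * semantic[i] + (1.0 - g) * metrics[i])
    return fused


if __name__ == "__main__":
    d = 3
    zero_w = [[0.0] * (2 * d) for _ in range(d)]
    s = [1.0, 2.0, 3.0]  # toy semantic embedding (hypothetical values)
    m = [3.0, 2.0, 1.0]  # toy projected metric vector (hypothetical values)
    # With zero weights and bias the gate is 0.5 everywhere: a plain average.
    print(gated_fusion(s, m, zero_w, [0.0] * d))  # [2.0, 2.0, 2.0]
```

In this sketch, a large positive bias pushes the gate toward 1 so the output follows the semantic embedding, and a large negative bias lets the handcrafted metrics dominate. Training the gate parameters would let such a layer decide, per feature, which view is more informative for a given project.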

Published

2026-03-01

How to Cite

Muhammad Zayaan Waqar, Armaghan Mubeen Butt, & Muhammad Faheem Khan. (2026). A Self-Supervised Semantic Representation Learning Framework with Adaptive Feature Fusion for Cross-Project Software Defect Prediction. Machine Learning for Human Intelligence, 4(01), 50–83. https://doi.org/10.65492/01/401/2026/37