Self-Supervised Representation Learning in Sparse Data Regimes
Keywords:
Self-Supervised Learning, Fraud Detection, Imbalanced Data, Anomaly Detection, Representation Learning.
Abstract
Self-supervised representation learning enables scalable modeling by reducing reliance on labeled data. Fraud detection in financial systems nevertheless remains difficult: observations are sparse and imbalanced, transaction patterns evolve over time, and existing methods generalize poorly and fail to capture minority fraudulent behaviors. This research designs a robust fraud detection framework for sparse and highly skewed financial datasets by leveraging self-supervised representations. Transactional data are aggregated from publicly available financial repositories and from simulated streams that reflect realistic fraud scenarios, ensuring diversity in temporal and categorical attributes. Preprocessing consists of normalization and missing-value imputation. Feature extraction employs Time2Vec temporal encoding and rolling-window statistical descriptors to capture evolving behavioral patterns. The proposed model integrates representation learning with Genetic Algorithm-tuned Dynamic Variational Autoencoders (GA-DVA): data are first encoded into latent representations and then refined for anomaly-aware discrimination. The Dynamic Variational Autoencoder models evolving transaction distributions, while the Genetic Algorithm optimizes latent-space parameters and reconstruction constraints to increase detection sensitivity on imbalanced data. This combination enables adaptive learning of rare fraud signatures. The framework, implemented in Python, prioritizes minority-pattern amplification and distribution-aware learning. Performance evaluation shows improved precision (0.920), recall (0.912), F1-score (0.916), AUC-ROC (0.950), and Early Detection Rate (0.890), a reduced false alarm rate (0.050), and consistent adaptability to shifting data distributions. The approach delivers interpretable, scalable, and resilient fraud detection suitable for real-world financial environments.
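The feature extraction step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window size, the toy transactions, and the randomly drawn frequencies and phases are all assumptions made for illustration (in Time2Vec these parameters are normally learned during training). The sketch encodes each timestamp as one linear component plus several sinusoidal components, then appends the rolling mean and standard deviation of recent transaction amounts.

```python
import math
import random
from collections import deque
from statistics import mean, pstdev


def time2vec(tau, omegas, phis):
    """Time2Vec-style encoding: one linear term followed by sinusoidal terms."""
    linear = omegas[0] * tau + phis[0]
    periodic = [math.sin(w * tau + p) for w, p in zip(omegas[1:], phis[1:])]
    return [linear] + periodic


def rolling_stats(amounts, window):
    """Mean and population std of the most recent `window` amounts at each step."""
    buf = deque(maxlen=window)
    out = []
    for amount in amounts:
        buf.append(amount)
        out.append((mean(buf), pstdev(buf)))
    return out


def build_features(timestamps, amounts, omegas, phis, window=3):
    """Concatenate the temporal encoding with rolling-window descriptors."""
    stats = rolling_stats(amounts, window)
    return [
        time2vec(t, omegas, phis) + [m, s]
        for t, (m, s) in zip(timestamps, stats)
    ]


# Hypothetical toy data: four transactions, the last one unusually large.
rng = random.Random(0)
K = 4  # assumed encoding size: 1 linear + 3 periodic components
omegas = [rng.uniform(0.1, 1.0) for _ in range(K)]  # fixed here, learned in practice
phis = [rng.uniform(0.0, math.pi) for _ in range(K)]
timestamps = [0.0, 1.5, 2.0, 7.25]
amounts = [20.0, 22.0, 19.0, 950.0]

features = build_features(timestamps, amounts, omegas, phis)
for row in features:
    print([round(v, 3) for v in row])
```

In this toy run the rolling mean and standard deviation jump sharply at the last transaction, which is the kind of behavioral shift such descriptors are meant to expose to the downstream encoder.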




