Transformer models have brought a paradigm shift to many domains, including biological sequence mining. However, transformers suffer from quadratic complexity, i.e., O(l²), where l is the sequence length, which inflates training and prediction time. This work therefore introduces a simple, generalized, and fast transformer architecture for improved protein function prediction (PFP). The proposed architecture uses a combination of a CNN and global-average pooling to effectively shorten the protein sequences. The shortening reduces the transformer's quadratic cost to O((l/2)²). This architecture is used to develop a PFP solution at the sub-sequence level. Furthermore, focal loss is employed to ensure balanced training on hard-to-classify examples. The proposed multi-sub-sequence solution with an average-pooling layer (stride = 2) yields improvements of +2.50% for biological process (BP) and +3.00% for molecular function (MF) over Global-ProtEnc Plus, and +4.50% (BP) and +2.30% (MF) over Lite-SeqCNN.
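The following is a minimal PyTorch sketch, not the authors' implementation, of the general idea described above: a convolutional front-end followed by stride-2 average pooling that halves the sequence length before a transformer encoder, together with a binary focal loss for multi-label function prediction. All layer sizes, the class and function names, and the aggregation of sub-sequence scores are illustrative assumptions.

```python
# Sketch (assumed, not the paper's code): CNN + stride-2 average pooling
# halves the token sequence, so self-attention cost drops from O(l^2)
# to O((l/2)^2); focal loss down-weights easy examples during training.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShortenedTransformerPFP(nn.Module):
    def __init__(self, vocab_size=26, embed_dim=128, n_heads=4,
                 n_layers=2, n_go_terms=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Convolution over residues, then average pooling with stride 2
        # to shorten the sequence before the transformer encoder.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=5, padding=2)
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(embed_dim, n_go_terms)

    def forward(self, tokens):                 # tokens: (batch, l)
        x = self.embed(tokens)                 # (batch, l, d)
        x = x.transpose(1, 2)                  # (batch, d, l) for Conv1d
        x = F.relu(self.conv(x))
        x = self.pool(x)                       # (batch, d, l/2)
        x = x.transpose(1, 2)                  # (batch, l/2, d)
        x = self.encoder(x)
        x = x.mean(dim=1)                      # pool over positions
        return self.head(x)                    # per-term logits


def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss for multi-label prediction: focuses training
    on hard-to-classify examples by down-weighting easy ones."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                      # probability of the true label
    return (alpha * (1 - p_t) ** gamma * bce).mean()


# Usage sketch: one sub-sequence window per sample; predictions from several
# windows of a long protein would be aggregated (e.g. averaged) afterwards.
model = ShortenedTransformerPFP()
tokens = torch.randint(1, 26, (8, 512))        # 8 sub-sequences of length 512
labels = torch.randint(0, 2, (8, 1000)).float()
loss = focal_loss(model(tokens), labels)
loss.backward()
```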