


Swin-HyConMamba: An Explainable Dual-Stream Fusion Framework with Cross-Attention for Kidney Pathology Classification


Authors : Sajid Ali; Yihong Zhang; Sajad Ul Haq; Ameer Hamza; Ran Yao Yao

Volume/Issue : Volume 11 - 2026, Issue 3 - March


Google Scholar : https://tinyurl.com/z3mthtkr

Scribd : https://tinyurl.com/55dt5wys

DOI : https://doi.org/10.38124/ijisrt/26mar1555

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.


Abstract : Kidney disease is among the leading causes of morbidity worldwide, and early, accurate diagnosis is critical for effective treatment. Medical imaging analysis has increasingly relied on convolutional neural networks (CNNs) and transformer-based models for disease identification. Transformers excel at global context representation but tend to lose fine-grained local detail, while CNNs are strong at local feature extraction but struggle with long-range dependencies. We propose Swin-HyConMamba, a dual-branch framework that combines the strengths of both. The Swin Transformer branch extracts hierarchical global contextual representations, while the HyConMamba branch handles local feature modeling and sequential dependency learning through convolutional and state-space operations. A cross-attention fusion module connects the two branches, enabling the model to attend to clinically relevant features while down-weighting background noise. We evaluate the model on the publicly available Kaggle kidney dataset, covering four classes: normal, cyst, stone, and tumour. The model achieves 99.84% classification accuracy, 99.9% micro-AUC-ROC, and 99.81% macro-average precision, recall, and F1-score, outperforming existing methods. Saliency maps and LIME are used to identify the image regions driving predictions, confirming that the model attends to pathologically relevant areas.
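The cross-attention fusion described in the abstract, where one branch's tokens query the other branch's features, can be sketched as a minimal single-head example in NumPy. This is an illustrative assumption of how such a module typically works, not the paper's implementation: the function name `cross_attention_fuse`, the random projection matrices, and the token/feature dimensions are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(global_feats, local_feats, d_k=64, seed=0):
    """Single-head cross-attention sketch: queries come from the
    global (Swin-style) stream, keys/values from the local
    (HyConMamba-style) stream, so each global token attends to the
    local tokens most relevant to it."""
    rng = np.random.default_rng(seed)
    d = global_feats.shape[-1]
    # Stand-ins for learned projection weights (random here).
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q = global_feats @ Wq                 # (N_g, d_k)
    K = local_feats @ Wk                  # (N_l, d_k)
    V = local_feats @ Wv                  # (N_l, d_k)
    # Scaled dot-product attention over the local tokens.
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (N_g, N_l)
    return attn @ V                       # (N_g, d_k)

# Toy example: 49 "global" tokens attend over 196 "local" tokens.
g = np.random.default_rng(1).standard_normal((49, 128))
l = np.random.default_rng(2).standard_normal((196, 128))
fused = cross_attention_fuse(g, l)
print(fused.shape)
```

In a real model the projections would be trained, the attention would be multi-head, and the fused tokens would feed a classification head; the sketch only shows the query/key/value flow between the two streams.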

Keywords : Kidney Disease Classification, Swin Transformer, HyConMamba architecture, Cross-Attention Fusion, Explainable AI, Medical Imaging.


