Authors :
Shalini Priya; Ashish Dubey
Volume/Issue :
Volume 10 - 2025, Issue 7 - July
Google Scholar :
https://tinyurl.com/3zdexb3b
Scribd :
https://tinyurl.com/rwvhmthu
DOI :
https://doi.org/10.38124/ijisrt/25jul651
Abstract :
This paper presents a parameterised FIFO architecture optimized for eliminating idle bubble cycles in GPU data
pipelines. The design addresses classical inefficiencies in FIFO-based data transfers, especially under boundary
conditions such as full and empty queue states. The proposed architecture ensures concurrent read and write transactions
without causing data corruption or latency penalties.
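As a point of reference, a minimal SystemVerilog sketch of a parameterised synchronous FIFO that tolerates simultaneous read and write requests is shown below; the module, port, and parameter names (fifo_sketch, WIDTH, DEPTH, and so on) are illustrative assumptions and do not represent the paper's RTL.

// Minimal sketch of a parameterised synchronous FIFO that tolerates
// simultaneous read and write, including at the full/empty boundaries.
// Names and structure are illustrative, not the authors' RTL.
module fifo_sketch #(
  parameter int WIDTH = 32,
  parameter int DEPTH = 16
) (
  input  logic             clk,
  input  logic             rst_n,
  input  logic             wr_en,
  input  logic [WIDTH-1:0] wr_data,
  input  logic             rd_en,
  output logic [WIDTH-1:0] rd_data,
  output logic             full,
  output logic             empty
);
  localparam int PTR_W = $clog2(DEPTH);

  logic [WIDTH-1:0] mem [DEPTH];
  logic [PTR_W-1:0] wr_ptr, rd_ptr;
  logic [PTR_W:0]   count;   // one extra bit distinguishes full from empty

  // Writes and reads are only honoured when legal, so a simultaneous
  // read+write at full or empty degrades gracefully instead of
  // corrupting data or dropping entries.
  wire do_wr = wr_en && !full;
  wire do_rd = rd_en && !empty;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      wr_ptr <= '0;
      rd_ptr <= '0;
      count  <= '0;
    end else begin
      if (do_wr) begin
        mem[wr_ptr] <= wr_data;
        wr_ptr      <= (wr_ptr == DEPTH-1) ? '0 : wr_ptr + 1'b1;
      end
      if (do_rd) begin
        rd_ptr <= (rd_ptr == DEPTH-1) ? '0 : rd_ptr + 1'b1;
      end
      // Occupancy changes only when exactly one of read/write occurs,
      // so a concurrent read+write keeps count stable (no bubble inserted).
      case ({do_wr, do_rd})
        2'b10:   count <= count + 1'b1;
        2'b01:   count <= count - 1'b1;
        default: count <= count;
      endcase
    end
  end

  assign rd_data = mem[rd_ptr];
  assign full    = (count == DEPTH);
  assign empty   = (count == 0);
endmodule

Holding the occupancy constant on a concurrent read and write lets a producer and consumer stream through the buffer back to back; the paper's architecture additionally covers the full/empty corner cases without the latency penalties described above.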
The FIFO logic was implemented in synthesis-tool-portable SystemVerilog and verified using a Universal Verification
Methodology (UVM) testbench. The design achieves 100% functional coverage across more than 10 million simulated
transactions. ASIC synthesis was carried out in a 28 nm low-power CMOS process, and the results show a 12% reduction
in silicon area and an 18% saving in dynamic power compared with traditional synchronous FIFO IPs.
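To illustrate the kind of boundary-condition coverage a UVM environment could collect for such a FIFO, a hedged SystemVerilog coverage sketch follows; the interface name, coverpoints, and bins are assumptions and not the paper's actual coverage model.

// Illustrative coverage sketch of the corner cases a UVM monitor could
// sample on the FIFO interface; names and bins are assumptions.
interface fifo_cov_if (
  input logic clk,
  input logic wr_en,
  input logic rd_en,
  input logic full,
  input logic empty
);
  covergroup fifo_boundary_cg @(posedge clk);
    cp_full  : coverpoint full;
    cp_empty : coverpoint empty;
    cp_op    : coverpoint {wr_en, rd_en} {
      bins idle       = {2'b00};
      bins read_only  = {2'b01};
      bins write_only = {2'b10};
      bins concurrent = {2'b11};
    }
    // The cross targets the boundary cases the paper emphasises, e.g. a
    // simultaneous read and write arriving while the FIFO is full or empty.
    x_boundary : cross cp_op, cp_full, cp_empty {
      // A FIFO of non-zero depth is never full and empty at once.
      ignore_bins never_full_and_empty =
        binsof(cp_full) intersect {1} && binsof(cp_empty) intersect {1};
    }
  endgroup

  fifo_boundary_cg cg = new();
endinterface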
The architecture is scalable in both depth and width, and it can be integrated directly into high-performance GPU
shader pipelines. This work offers a viable solution for improving data-path efficiency in compute-intensive system-on-chip
(SoC) designs.
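As an illustration of the depth/width scalability claim, a hypothetical instantiation of the fifo_sketch module shown earlier follows; the parameter values and signal names are assumptions chosen to suggest a wide shader-stage datapath.

// Hypothetical wrapper scaling the same FIFO for a wide shader-stage
// datapath; widths, depths, and signal names are illustrative only.
module shader_stage_buffer (
  input  logic         clk,
  input  logic         rst_n,
  input  logic         tex_valid,
  input  logic [127:0] tex_data,
  input  logic         shader_ready,
  output logic [127:0] shader_data,
  output logic         tex_stall,
  output logic         shader_idle
);
  // Depth and width are compile-time parameters, so the same RTL scales
  // from narrow control queues to wide texture/shader data paths.
  fifo_sketch #(.WIDTH(128), .DEPTH(64)) u_stage_fifo (
    .clk     (clk),
    .rst_n   (rst_n),
    .wr_en   (tex_valid),
    .wr_data (tex_data),
    .rd_en   (shader_ready),
    .rd_data (shader_data),
    .full    (tex_stall),
    .empty   (shader_idle)
  );
endmodule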
References :
- C. E. Cummings, “Simulation and synthesis techniques for asynchronous FIFO design,” Sunburst Design, 2002.
- R. Chakraborty and A. Sen, “Design and performance analysis of asynchronous FIFO for SoC applications,” International Journal of VLSI Design & Communication Systems, vol. 5, no. 1, 2014.
- M. Hassan et al., “High-throughput FIFOs for GPGPU SoCs,” IEEE Transactions on VLSI Systems, vol. 19, no. 3, pp. 442–452, 2011.
- S. Narang, “Elastic buffers in deep learning accelerators,” in Proceedings of the Design Automation Conference (DAC), 2021.
- M. Kalem and O. Ozturk, “Bypass FIFOs for efficient GPU interconnects,” ACM Transactions on Architecture and Code Optimization (TACO), vol. 19, no. 4, 2022.
- Xilinx Inc., UltraRAM FIFO v2.0 Product Guide, PG269 (v2.0), 2023.
- N. Corp, “Pipeline buffer credit counter,” United States Patent, 2017.
- S. Keckler, B. Khailany, and M. Garland, “GPUs and the future of parallel computing,” IEEE Micro, vol. 31, no. 5, pp. 7–17, 2011.
- A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, “Analyzing CUDA workloads using a detailed GPU simulator,” in IEEE Int’l Symp. on Performance Analysis of Systems and Software (ISPASS), pp. 163–174, 2009.
- AMD Inc., “RDNA 3 shader core optimization whitepaper,” https://www.amd.com/en/technologies/rdna, 2022. Accessed: 2025-06-28.