Authors :
Echeonwu, Emmanuel Chinyere; Bolou, Dickson Bolou; Omonijo, Oluwaseyi Oluwatola; Ugbogbo, Mike Johnon; Omejieke, Chinenye Ekene
Volume/Issue :
Volume 10 - 2025, Issue 12 - December
Google Scholar :
https://tinyurl.com/4xms2y6v
Scribd :
https://tinyurl.com/2s4ft6t7
DOI :
https://doi.org/10.38124/ijisrt/25dec1333
Abstract :
This study presents a novel Retrieval Augmented Generation (RAG) text-based query system for efficient access
to Nigerian legal information. Using the Nigerian Constitution and the Criminal Code as its knowledge base, the system
employs a pipeline of semantic segmentation, Sentence Transformer embeddings, and vector database indexing to
optimize information retrieval. User queries are first refined by a Google Gemini large language model, trained to act as a Nigerian
legal expert, which identifies key terms and intent before the database is searched for the ten most relevant document chunks.
These chunks, together with the refined query and keywords, are then passed back to Gemini to generate a detailed, referenced
answer. The current implementation is evaluated using precision, recall, F1 score, perplexity, and diversity metrics, and
the results fall within acceptable benchmarks, with mean values of 0.65, 0.73, 0.68, 14.42, and 0.87, respectively. The system
represents a significant advancement in making complex legal big data accessible.
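The retrieval stage summarized above (segmentation, embedding, indexing, top-ten search) can be illustrated with a minimal sketch. The abstract does not specify the segmentation algorithm, the Sentence Transformer model, or the vector database, so this sketch swaps in stand-ins under stated assumptions: a hashed bag-of-words encoder (via `zlib.crc32`) in place of a Sentence Transformer, a similarity-threshold rule for semantic segmentation (a new chunk starts when consecutive sentences diverge), and a brute-force in-memory index in place of a vector database. All names here (`embed`, `semantic_segments`, `VectorIndex`, `DIM`, `threshold`) are hypothetical.

```python
import math
import re
import zlib

DIM = 1024  # hypothetical size of the toy embedding space


def embed(text: str) -> list[float]:
    """Toy stand-in for a Sentence Transformer: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def semantic_segments(text: str, threshold: float = 0.4) -> list[str]:
    """Group consecutive sentences into chunks, opening a new chunk whenever the
    similarity to the previous sentence falls below `threshold` (assumed rule)."""
    sentences = [s for s in re.split(r"(?<=[.;:])\s+", text.strip()) if s]
    chunks: list[list[str]] = []
    prev_vec = None
    for sentence in sentences:
        vec = embed(sentence)
        if prev_vec is None or cosine(prev_vec, vec) < threshold:
            chunks.append([])  # topic shift (or first sentence): open a new chunk
        chunks[-1].append(sentence)
        prev_vec = vec
    return [" ".join(chunk) for chunk in chunks]


class VectorIndex:
    """Brute-force in-memory index standing in for a real vector database."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, chunk: str) -> None:
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 10) -> list[str]:
        """Return the k chunks most similar to the query (top ten by default)."""
        q = embed(query)
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


if __name__ == "__main__":
    # Illustrative text only; not quoted from the Constitution or Criminal Code.
    legal_text = (
        "Every citizen has the right to freedom of expression. "
        "Every person has the right to freedom of thought and conscience. "
        "Any person who steals any property is guilty of stealing and is liable to imprisonment."
    )
    index = VectorIndex()
    for chunk in semantic_segments(legal_text):
        index.add(chunk)
    print(index.search("Is stealing punishable by imprisonment?", k=1))
```

In a real deployment the `embed` function would be replaced by a dense sentence encoder and the linear scan by an approximate nearest-neighbour index; the threshold would need tuning on the legal corpus.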
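The two Gemini passes described above (query refinement, then answer generation from the retrieved chunks, refined query, and keywords) can be sketched as plain prompt handling. This assumes, hypothetically, that the refinement pass is asked to reply in JSON with `refined_query` and `keywords` fields, and that the answer prompt numbers each chunk so the model can cite it; the prompt wording and the helper names `parse_refinement` and `build_answer_prompt` are illustrative, not the authors' own. The actual Gemini client call is deliberately omitted rather than guessed.

```python
import json


def parse_refinement(raw: str) -> tuple[str, list[str]]:
    """Parse the refinement pass's reply, assuming it returns JSON of the form
    {"refined_query": "...", "keywords": ["...", ...]} (hypothetical format)."""
    data = json.loads(raw)
    return data["refined_query"], [str(k) for k in data.get("keywords", [])]


def build_answer_prompt(refined_query: str, keywords: list[str], chunks: list[str]) -> str:
    """Assemble the second-pass prompt: numbered source chunks, keywords, the
    refined query, and an instruction to cite sources by number."""
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "You are an expert on Nigerian law. Answer using only the sources below, "
        "citing them by number, e.g. [1].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Keywords: {', '.join(keywords)}\n"
        f"Question: {refined_query}\n"
        "Answer:"
    )


if __name__ == "__main__":
    raw = '{"refined_query": "What penalty applies to stealing?", "keywords": ["stealing", "penalty"]}'
    query, keywords = parse_refinement(raw)
    # The chunks would come from the top-ten vector search; placeholders here.
    print(build_answer_prompt(query, keywords, ["(retrieved chunk 1)", "(retrieved chunk 2)"]))
```

Numbering the chunks in the prompt is one simple way to obtain the "referenced" answers the system produces: each citation in the output can be mapped back to a specific retrieved provision.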
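The abstract reports precision, recall, F1, perplexity, and diversity but does not define how each is computed. One common set of definitions is sketched below as an assumption, not as the authors' exact method: token-overlap (SQuAD-style) precision, recall, and F1 of a generated answer against a reference answer; perplexity as the exponential of the mean per-token negative log-likelihood, given natural-log token probabilities from some language model; and diversity as distinct-n, the ratio of unique to total n-grams. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_prf(prediction: str, reference: str) -> tuple[float, float, float]:
    """Token-overlap precision, recall, and F1 of a generated answer vs. a reference."""
    pred, ref = tokens(prediction), tokens(reference)
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0, 0.0, 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative natural-log likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def distinct_n(text: str, n: int = 1) -> float:
    """Diversity as unique n-grams divided by total n-grams (distinct-n)."""
    toks = tokens(text)
    grams = [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0


if __name__ == "__main__":
    # 6 shared tokens: precision 6/7, recall 6/9, F1 0.75
    print(overlap_prf("Section 33 guarantees the right to life",
                      "The right to life is guaranteed by section 33"))
    print(perplexity([math.log(0.5), math.log(0.25)]))  # 2**1.5, about 2.828
    print(distinct_n("the right to the right", 2))      # 3 unique of 4 bigrams = 0.75
```

If F1 is computed per query and then averaged, the mean F1 need not equal the harmonic mean of mean precision and mean recall (2 × 0.65 × 0.73 / 1.38 ≈ 0.69, against the reported 0.68).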
Keywords :
Retrieval Augmented Generation, Embeddings, Big Data, Vector Database, Large Language Model.