Summary: The project aims to develop a Retrieval-Augmented Generation (RAG)–based question-answering system using custom, domain-specific English content. The system will retrieve relevant passages from a processed, metadata-enriched text corpus and use them as grounding context for a GPT model to generate accurate and context-aware responses. By combining information retrieval with generative AI, the system is designed to improve factual accuracy, reduce hallucinations, and ensure responses reflect the terminology and structure of the underlying collection. Users will be able to interact with the system through natural-language queries, enabling intuitive access to large archival datasets.
The architecture is language-agnostic by design and can be adapted to Tamil content with minimal modifications, confined mainly to the preprocessing, embedding, and retrieval components. This leaves room for future multilingual expansion.
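The retrieve-then-generate flow described in the summary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny `CORPUS`, the helper names (`tokenize`, `vectorize`, `cosine`, `retrieve`, `build_prompt`), and the use of bag-of-words cosine similarity in place of whatever learned embeddings the project actually adopts. The generation step is not shown; the sketch stops at the grounded prompt that would be passed to the GPT model.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; each passage carries simple metadata (id, title).
CORPUS = [
    {"id": "doc1", "title": "Library history",
     "text": "The library was founded to preserve printed books and journals."},
    {"id": "doc2", "title": "Digitisation",
     "text": "Scanned newspapers are stored as searchable digital archives."},
    {"id": "doc3", "title": "Access",
     "text": "Readers can search the archives online using keywords."},
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def vectorize(text):
    """Bag-of-words term counts; a stand-in for a learned embedding."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, k=2):
    """Return up to k passages with non-zero similarity to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(p["text"])), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]

def build_prompt(query, passages):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(
        f"[{p['id']}] {p['title']}: {p['text']}" for p in passages
    )
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

if __name__ == "__main__":
    question = "How are scanned newspapers stored?"
    print(build_prompt(question, retrieve(question, CORPUS)))
```

In a sketch like this, adapting to another language would mostly mean replacing `tokenize` and `vectorize`, while the retrieval and prompt-assembly steps stay the same.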
| Role | Name |
| --- | --- |
| Partner Institution | Noolaham Foundation |
| Supervisors | Dr. Saatviga Sudhahar |
| Researchers | Nilakshan Kunananthaseelan |