Summary: This project conducts a comprehensive gap analysis of existing digitization, natural language processing (NLP), and knowledge engineering capabilities for Sri Lankan Tamil, then develops targeted NLP tools to address the limitations it identifies. It delivers an end-to-end AI system that ingests, processes, and enriches digitized texts and documents to produce structured, reliable, and trustworthy knowledge. Through automated upstream pipelines, including OCR, image processing, segmentation, and text normalization, the project establishes a curated corpus, drawn from Noolaham’s data, that serves as the authoritative source of truth.
This corpus is then enriched using natural language processing techniques that extract entities, topics, events, relationships, and summaries, transforming raw text into structured semantic and temporal representations. These outputs form the foundation for ontology development, knowledge graph construction, and a range of downstream applications supporting research, preservation, and access.
| Partner Institution | Noolaham Foundation |
| Supervisors | Dr. Saatviga Sudhahar |
| Researchers | Vaishanavi Shanmugam |