Building Noolaham ontology

Summary: This project aims to create a high-level semantic framework that defines the canonical concepts, classes, properties, and relationships used across the Noolaham ecosystem. It provides a shared conceptual model against which extracted knowledge from the corpus is mapped, ensuring semantic consistency and structural coherence across digitization, NLP enrichment, knowledge graph construction, and application layers. ...
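To make the idea of a shared conceptual model concrete, the following is a minimal, self-contained sketch of an ontology schema with a class hierarchy and domain/range-constrained properties. All class and property names (Person, Work, authoredBy, and so on) are illustrative assumptions, not the project's actual vocabulary; a real implementation would likely use a standard ontology language such as OWL.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Property:
    name: str
    domain: str  # class the subject of a triple must belong to
    range: str   # class the object of a triple must belong to

# Illustrative class hierarchy: each class maps to its parent (None = root).
ONTOLOGY_CLASSES = {
    "Entity": None,
    "Person": "Entity",
    "Place": "Entity",
    "Work": "Entity",
    "Event": "Entity",
}

# Illustrative properties with domain/range constraints.
ONTOLOGY_PROPERTIES = {
    "authoredBy": Property("authoredBy", domain="Work", range="Person"),
    "locatedIn": Property("locatedIn", domain="Event", range="Place"),
}

def is_subclass_of(cls: str, ancestor: str) -> bool:
    """Walk the subclass chain from cls up to the root."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ONTOLOGY_CLASSES[cls]
    return False

def validate_triple(subj_cls: str, prop: str, obj_cls: str) -> bool:
    """Check a (subject, property, object) triple against domain/range."""
    p = ONTOLOGY_PROPERTIES[prop]
    return is_subclass_of(subj_cls, p.domain) and is_subclass_of(obj_cls, p.range)
```

Validation of this kind is what gives downstream extraction its "semantic consistency": a triple like (Work, authoredBy, Person) conforms, while (Person, authoredBy, Place) is rejected.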

Building Noolaham knowledge graph

Summary: This project involves the creation of a structured, ontology-aligned graph that represents entities and their relationships derived from NLP-enriched corpus data. The knowledge graph models people, places, works, events, and concepts, enabling rich semantic linking across diverse materials in the Noolaham collection. By integrating temporal and contextual attributes, the graph supports advanced reasoning, inference, ...
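The graph described above can be sketched as a tiny in-memory triple store whose subjects, predicates, and objects follow the ontology's vocabulary. Entity identifiers and relation names here are illustrative assumptions; a production system would use a dedicated graph database.

```python
from collections import defaultdict

class TripleStore:
    """Minimal triple store supporting insertion and subject/predicate lookup."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(list)

    def add(self, subj, pred, obj):
        if (subj, pred, obj) not in self.triples:
            self.triples.add((subj, pred, obj))
            self.by_subject[subj].append((pred, obj))

    def objects(self, subj, pred):
        """All objects linked from subj via pred."""
        return [o for p, o in self.by_subject[subj] if p == pred]

# Illustrative entities and relations (names are assumptions).
kg = TripleStore()
kg.add("work:TamilNovel1", "authoredBy", "person:AuthorA")
kg.add("person:AuthorA", "bornIn", "place:Jaffna")
```

A two-hop lookup, e.g. from a work to its author and then to the author's birthplace, illustrates the kind of semantic linking across materials that the summary describes.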

Infrastructure, storage and content conversion

Summary: This project focuses on establishing a scalable and resilient technical foundation to support large-scale digitization and processing of Noolaham’s collections. The project provisions secure compute infrastructure and object storage to manage raw inputs, intermediate artifacts, and processed outputs across the content lifecycle. On top of this infrastructure, a robust digital content conversion pipeline is ...
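One way to organize the raw inputs, intermediate artifacts, and processed outputs mentioned above is a lifecycle-stage prefix scheme for object-storage keys. The stage names, collection names, and key layout below are assumptions for illustration, not the project's actual bucket design.

```python
# Hypothetical object-key scheme: stage / collection / item / file.
VALID_STAGES = {"raw", "intermediate", "processed"}

def object_key(stage: str, collection: str, item_id: str, filename: str) -> str:
    """Build a deterministic storage key for one artifact in the lifecycle."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown lifecycle stage: {stage}")
    return f"{stage}/{collection}/{item_id}/{filename}"
```

Keeping the stage as the leading prefix makes it cheap to list, replicate, or apply retention policies to each lifecycle tier independently.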

Build and evaluate Noolaham GPT

Summary: This project focuses on the design, implementation, and assessment of an AI-powered conversational assistant that enables interactive question answering, exploration, and knowledge synthesis across the Noolaham ecosystem. Noolaham GPT leverages the authoritative Noolaham corpus, NLP-enriched structures, domain ontologies, and the underlying knowledge graph to deliver accurate, contextual, and explainable responses. The system is designed ...
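As a rough illustration of how responses could be made "accurate, contextual, and explainable", the sketch below assembles a grounded prompt from retrieved passages and knowledge-graph facts before the user's question. The template wording and field layout are assumptions, not the project's actual prompt design.

```python
def build_prompt(question: str, passages: list[str], facts: list[tuple]) -> str:
    """Serialize retrieved context into a single grounded prompt string."""
    passage_block = "\n".join(f"- {p}" for p in passages)
    fact_block = "\n".join(f"- {s} {p} {o}" for s, p, o in facts)
    return (
        "Answer using only the context below, and cite the passages used.\n\n"
        f"Passages:\n{passage_block}\n\n"
        f"Knowledge-graph facts:\n{fact_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because every fact and passage in the prompt is traceable back to the corpus or the graph, the assistant's answers can be audited against their sources.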

Gap analysis and development of NLP tools and knowledge engineering

Summary: This project conducts a comprehensive gap analysis of existing digitization, natural language processing, and knowledge engineering capabilities for Sri Lankan Tamil, followed by the development of targeted NLP tools to address identified limitations. It delivers an end-to-end AI system that ingests, processes, and enriches digitized texts and documents to produce structured, reliable, and trustworthy ...
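To give a flavor of one enrichment stage in such a system, the sketch below performs gazetteer-based entity tagging over whitespace-tokenized text. The gazetteer entries are illustrative assumptions; a real pipeline would rely on trained Tamil NER models and proper tokenization rather than a fixed lookup table.

```python
# Hypothetical place-name gazetteer (illustrative entries only).
GAZETTEER = {"Jaffna": "PLACE", "Colombo": "PLACE"}

def enrich(text: str) -> list[tuple[str, str]]:
    """Tag each whitespace token with an entity label, or 'O' for none."""
    return [(tok, GAZETTEER.get(tok, "O")) for tok in text.split()]
```

Stages like this turn flat digitized text into the structured, machine-usable annotations that downstream knowledge engineering depends on.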

Custom GPT tool for question answering

Summary: The project aims to develop a Retrieval-Augmented Generation (RAG)–based question-answering system using custom, domain-specific English content. The system will retrieve relevant passages from a processed, metadata-enriched text corpus and use them as grounding context for a GPT model to generate accurate and context-aware responses. By combining information retrieval with generative AI, the system is ...
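The retrieval step described above can be sketched with a toy ranker that scores corpus passages by term overlap with the question and keeps the top-k as grounding context. This is a simplification for illustration; a production RAG system would use BM25 or dense-vector retrieval.

```python
def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by shared lowercase terms with the question; return top k."""
    q_terms = set(question.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(q_terms & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The returned passages would then be injected into the GPT prompt as grounding context, so generated answers stay anchored to the corpus rather than the model's parametric memory.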

Development of digital content conversion pipeline

Summary: The content conversion pipeline transforms Noolaham’s digitized documents into structured, machine-readable text. It begins with the ingestion of digital documents such as scanned newspapers, books, magazines, and pamphlets. These documents are first preprocessed to improve quality and consistency. The pipeline then performs layout analysis to identify document structure, followed by article segmentation to separate logical ...
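The stage ordering described above (preprocessing, then layout analysis, then article segmentation) can be sketched as a composition of functions over a shared document state. The stage bodies are placeholders standing in for the real image-processing and segmentation logic.

```python
def preprocess(doc):
    """Placeholder for quality/consistency fixes (deskew, denoise, normalize)."""
    doc["clean"] = True
    return doc

def layout_analysis(doc):
    """Placeholder for detecting structural blocks (columns, headings, body)."""
    doc["blocks"] = ["header", "body"]
    return doc

def segment_articles(doc):
    """Placeholder for grouping blocks into logical articles."""
    doc["articles"] = [{"blocks": doc["blocks"]}]
    return doc

# Stages run in the order the summary describes.
PIPELINE = [preprocess, layout_analysis, segment_articles]

def run(doc):
    for stage in PIPELINE:
        doc = stage(doc)
    return doc
```

Modeling each stage as state-in, state-out keeps stages independently testable and makes it straightforward to insert new steps (e.g. OCR or language identification) into the sequence.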