Infrastructure, storage and content conversion

Summary: This project establishes a scalable and resilient technical foundation for large-scale digitization and processing of Noolaham’s collections. It provisions secure compute infrastructure and object storage to manage raw inputs, intermediate artifacts, and processed outputs across the content lifecycle. On top of this infrastructure, a digital content conversion pipeline systematically processes Noolaham’s Tamil and English materials. The pipeline integrates OCR, image enhancement, layout analysis, segmentation, and text normalization workflows tailored to bilingual content, targeting high-fidelity conversion from scanned sources to machine-readable text. The resulting outputs are versioned, traceable, and aligned with downstream NLP enrichment and knowledge engineering processes, so they can be reused consistently across search, analytics, and AI-driven applications.
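
As an illustration only, the text normalization and traceability ideas in the summary could be sketched as follows. This is a minimal sketch, not the project's actual implementation: the names `normalize_text`, `Artifact`, and `make_artifact`, the choice of Unicode NFC normalization, and the use of SHA-256 content hashes for lineage are all assumptions made for this example.

```python
import hashlib
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    """Hypothetical lineage record for one pipeline output."""
    stage: str                      # e.g. "raw_scan", "ocr_text", "normalized_text"
    sha256: str                     # content hash of this artifact's bytes
    parent_sha256: Optional[str]    # hash of the artifact it was derived from


def normalize_text(raw: str) -> str:
    """Normalize OCR output (assumed approach: NFC plus whitespace cleanup).

    NFC composes canonically equivalent sequences, so a Tamil vowel sign
    emitted as two code points (U+0BC6 + U+0BBE) becomes the single
    code point U+0BCA. English text is left unchanged by this step.
    """
    composed = unicodedata.normalize("NFC", raw)
    # Collapse runs of whitespace (including line breaks) into single spaces.
    return " ".join(composed.split())


def make_artifact(stage: str, payload: bytes,
                  parent: Optional[Artifact] = None) -> Artifact:
    """Hash the payload and link it to the artifact it was derived from."""
    digest = hashlib.sha256(payload).hexdigest()
    return Artifact(stage, digest, parent.sha256 if parent else None)


if __name__ == "__main__":
    scan = make_artifact("raw_scan", b"placeholder image bytes")
    text = normalize_text("\u0b95\u0bc6\u0bbe   Library\n")
    out = make_artifact("normalized_text", text.encode("utf-8"), parent=scan)
    print(out.stage, out.parent_sha256 == scan.sha256)
```

Recording the parent hash with each output is one simple way to keep processed text traceable back to the scan it came from; a production pipeline would likely also store object storage keys and tool versions.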

Partner Institution: Noolaham Foundation
Supervisors: Dr. Saatviga Sudhahar
Researchers: Sathiyalokeswaran Lingeswaran