Guwahati, Sep 3 (PTI) Two Guwahati-based NGOs and IIT Bombay's AI initiative BharatGen have joined hands to include the Assamese language in the initiative's artificial intelligence database, an official said on Wednesday.
The three organisations signed an agreement on Tuesday to include two million pages of Assamese content in BharatGen, Assam Jatiya Bidyalay Educational and Socio-Economic Trust secretary Narayan Sharma said in a statement.
"For Assamese, long considered a 'low-resource' language in the digital ecosystem, this partnership is historic. With the inclusion of two million Assamese pages into BharatGen, the language has reached this scale of AI readiness for the first time," he said.
The association is the outcome of 'Digitising Assam', a community-driven project spearheaded by Nanda Talukdar Foundation (NTF), which, in 40 months, digitised and preserved more than two million pages of Assamese books, journals, manuscripts and ancient Sachipats.
"It is one of the largest citizen-led digital preservation efforts in the whole country," the statement said.
BharatGen is the Centre's flagship AI initiative, led by IIT Bombay, to build a sovereign, indigenous large model for Indian languages.
Its mission is to develop AI agents fluent in all of India's 22 scheduled languages, grounded in Indian cultural and linguistic data, and available as open-source resources.
BharatGen currently supports nine Indian languages: Hindi, Marathi, Tamil, Malayalam, Bengali, Punjabi, Gujarati, Telugu and Kannada. Assamese will be the 10th language on BharatGen.
Launched in June 2025, BharatGen is being developed as an alternative to global platforms like ChatGPT. Led by IIT Bombay with a consortium of other IITs, IIITs and leading institutions, BharatGen is the world's first government-funded multi-modal large language model (LLM) initiative.