Project 2395

Project Info

Indigenous-AI thumbnail

Team Name


Team Members


Project Description

Indigenous Storyteller is a unique project that taps into the power of Open Data and Generative AI to create informative, engaging, and immersive stories based on indigenous cultures. Focusing on Australian indigenous languages, the project uses an open-source dataset provided by the Queensland government, which contains a wealth of information on the local indigenous tongues.

Currently, AI has limited comprehension of these languages due to scarcity of data. This project aims to bridge this gap by providing AI with the necessary data to understand and interpret these languages effectively.

The data input to the system includes language name, pronunciation, common words, geographical locations, images, and other related information. While AI may not produce informative contents independently, this additional data allows it to weave fascinating narratives based on indigenous culture, highlighting the uniqueness and diversity of different tribes' languages and cultures.

#gnerative ai #llm #indigenous language #storyteller

Data Story

Unleashing the Power of Open Data with Generative AI


In this project, our aim was to leverage the power of generative AI and open data to understand and generate content related to indigenous languages in Queensland, Australia. The primary challenge we encountered was the limited amount of available data for these languages, making it difficult for AI systems to comprehend and work with them effectively. To overcome this obstacle, we utilized an open source dataset obtained from the Queensland government's website, specifically the interactive Indigenous languages map.

Dataset Description

The Indigenous languages map dataset offers a comprehensive snapshot of indigenous languages in Queensland. It encompasses over 150 Aboriginal and Torres Strait Islander language groups, providing valuable details such as where these languages were spoken, dialects, sample words, and relevant images from the State Library's collection. The dataset was procured as part of the State Library's Indigenous Languages Project, which aims to raise awareness of the linguistic diversity in Queensland and support language research and community language revival efforts.

Applying AI to Indigenous Languages

Due to the scarcity of data for indigenous languages, AI systems have traditionally struggled to comprehend and generate content accurately. However, by integrating this open data into our project, we sought to empower AI algorithms to better understand and work with these languages.

We developed a generative AI system that utilized the Queensland government's Indigenous languages map dataset as an external data source. By training our AI model on this dataset, it gained a deeper understanding of the indigenous languages, enabling it to generate content relevant to these languages with greater accuracy and context.

Potential Impact and Future Directions

Our project's utilization of open data and innovative AI techniques has significant potential impact. Firstly, it enhances our understanding and appreciation of the linguistic diversity and rich cultural heritage of indigenous languages in Queensland. Secondly, it creates opportunities for language research and community-driven language revival initiatives.

By providing AI systems with an extensive dataset of indigenous language information, we hope to contribute to the documentation, preservation, and revitalization of these traditional languages. Furthermore, this project serves as a stepping stone towards AI-driven initiatives that respectfully work alongside indigenous communities, acknowledging their role as the true custodians of language heritage and knowledge.


The project successfully combined generative AI and open data to address the challenge of limited data availability for indigenous languages in Queensland. By leveraging the open source Indigenous languages map dataset, we enabled AI models to better grasp and generate content related to these languages. Our goal is to contribute to language heritage preservation, revival, and community-driven research. This project emphasizes the importance of recognizing and respecting the Traditional Owners, Elders, language custodians, and community members who hold the core ownership of language knowledge.

Team Members

  • Junchen You
  • OpenAI August 19 version (GPT-3.5 & GPT-4)

Evidence of Work



Project Image

Team DataSets

State Library of Queensland - Indigenous languages map data

Description of Use Through the integration of the dataset into the project, the AI algorithms can access and analyze the extensive information provided, enabling them to better understand and generate content related to indigenous languages. This utilization of open data empowers AI to contribute to language heritage, documentation, preservation, and potentially support the revitalization of traditional languages alongside the communities involved. It is important to acknowledge that the Traditional Owners, Elders, language custodians, and community members retain the core ownership of language heritage and knowledge.

Data Set

Challenge Entries

Generative AI: Unleashing the Power of Open Data

Explore the potential of Generative AI in conjunction with Open Data to empower communities and foster positive social impact. This challenge invites participants to leverage Generative AI models to analyse and derive insights from Open Data sourced from government datasets. By combining the power of Generative AI with the wealth of Open Data available, participants can create innovative solutions that address real-world challenges and benefit communities.

Go to Challenge | 29 teams have entered this challenge.