Project Description
Finding the right government data is hard—datasets are scattered, inconsistent, and often hidden.
Our Goal: Build a single, user-friendly “national inventory” that aggregates metadata, relationships, and access methods.
Our solution makes data discovery as simple as asking a question: users search in plain language, and an AI-powered engine connects them to the most relevant datasets while revealing unexpected links across sources.
No SQL, pandas, or specialized knowledge is needed.
We’re building a smarter, searchable national data inventory that helps policymakers, researchers, and communities unlock the full value of Australia’s public data.
Key Components
User Interface (UI)
- Accepts inputs through text, voice, and smart uploads.
- Supports natural language queries.
- Allows users to add additional datasets directly.
Processing Layer
- A Large Language Model (LLM) translates user queries into structured searches.
- Dynamically builds and executes ABS Data Explorer API calls.
- Integrates newly added datasets.
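As a rough illustration of the processing steps above, the sketch below turns a structured search (the kind of object the LLM step might emit) into an SDMX-style data request URL. The `StructuredSearch` class, its field names, and `build_data_url` are hypothetical, and `BASE_URL` is a placeholder rather than the real ABS endpoint. Only the `startPeriod` and `endPeriod` query parameters come from the SDMX REST convention. No network call is made.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

# Placeholder only: substitute the endpoint given in the ABS API documentation.
BASE_URL = "https://example.org/rest"


@dataclass
class StructuredSearch:
    """Hypothetical output of the LLM query-translation step."""
    dataflow: str                       # dataflow identifier, e.g. "CPI"
    key: str = "all"                    # SDMX dimension filter; "all" means no filter
    start_period: Optional[str] = None  # e.g. "2020" or "2020-Q1"
    end_period: Optional[str] = None


def build_data_url(search: StructuredSearch, base_url: str = BASE_URL) -> str:
    """Assemble an SDMX-style data URL from a structured search."""
    params = {}
    if search.start_period:
        params["startPeriod"] = search.start_period
    if search.end_period:
        params["endPeriod"] = search.end_period
    url = f"{base_url}/data/{search.dataflow}/{search.key}"
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    query = StructuredSearch(dataflow="CPI", start_period="2020", end_period="2023")
    print(build_data_url(query))
```

Keeping the LLM output as a small typed object, rather than letting the model write URLs directly, would make it easier to validate a request before it is executed.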
Outputs
- Provides relevant data results from government and user-added sources.
- Offers multiple visualization options for data exploration and analysis.
Overall
Our solution simplifies dataset discovery, integration, and analysis.
By combining natural language querying with a dynamic Data Dictionary and Data Catalog, we empower users to explore, connect, and visualize Australia’s public data—turning a fragmented landscape into a powerful national data inventory.
Data Story
Data Dictionary
Inputs
- GovHack datasets
- ABS Data Explorer API, plus manually downloaded XLSX and CSV files
- User-uploaded datasets
Processing
- Parses semi-structured data with LLMs to build a schema, metadata, and an understanding of each dataset.
- Resolves inconsistencies and ambiguity across datasets.
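To make the schema-building step concrete, here is a minimal sketch that infers a simple schema from CSV text and normalizes column names so that headers such as "Population (2021)" and "population_2021" resolve to the same key. It is a deterministic stand-in using only the Python standard library, not the LLM-based parsing described above. The function names and the three type labels are assumptions made for illustration.

```python
import csv
import io
import re


def normalize_name(name: str) -> str:
    """Lower-case a header and collapse runs of non-alphanumerics to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def infer_type(values) -> str:
    """Return 'integer', 'number', or 'string' based on the non-empty values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "string"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "string"


def build_schema(csv_text: str) -> dict:
    """Map each normalized column name to an inferred type label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    return {
        normalize_name(col): infer_type([(row.get(col) or "") for row in rows])
        for col in fields
    }


if __name__ == "__main__":
    sample = "State Name,Population (2021),Median Age\nNSW,100,38.5\nVIC,200,\n"
    print(build_schema(sample))
```

Empty cells are ignored during type inference, so a column with gaps can still be recognized as numeric.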
Outputs
- An answer to the user's query
- Metadata describing government and user-added datasets.
- Analysis of key trends and insights
- The raw data behind the conclusions and synthesis, along with the original sources
- Visualizations of raw data that adapt on the fly to returned data
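One way visualizations could adapt to the returned data is to pick a chart type from the column types in the result. The sketch below assumes the result is described as a mapping of column name to a type label ("integer", "number", "string", "date", or "year"). The rules and the `suggest_chart` name are illustrative assumptions, not the project's actual logic.

```python
NUMERIC = {"integer", "number"}
TEMPORAL = {"date", "year"}


def suggest_chart(column_types: dict) -> str:
    """Suggest a chart type from a {column_name: type_label} mapping."""
    kinds = list(column_types.values())
    n_numeric = sum(kind in NUMERIC for kind in kinds)
    has_time = any(kind in TEMPORAL for kind in kinds)
    has_category = any(kind == "string" for kind in kinds)

    if has_time and n_numeric >= 1:
        return "line"      # a measure over time
    if n_numeric >= 2:
        return "scatter"   # two measures against each other
    if has_category and n_numeric == 1:
        return "bar"       # one measure per category
    return "table"         # fall back to showing the raw rows


if __name__ == "__main__":
    print(suggest_chart({"year": "year", "cpi": "number"}))
    print(suggest_chart({"state": "string", "population": "integer"}))
```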
Data Catalog
Inputs
- GovHack datasets
- ABS Data Explorer API, plus CSV and XLSX files
- User-uploaded datasets
Processing
- Extracts and stores dataset attributes, including:
  - Dataset Name
  - Published Date / Collection Date
  - Purpose
  - Owning Government Department / Data Custodian
  - Subject Area(s)
  - Abstract / Summary
- Parses structured and semi-structured data.
- Normalizes data to handle edge cases.
- Identifies relationships and dependencies between datasets.
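The catalog steps above can be sketched as a record type holding the listed attributes, plus a simple relationship check. In this sketch two datasets are treated as related when they share column names. That heuristic, the `columns` field, and the names `CatalogEntry` and `find_relationships` are assumptions for illustration, since the method actually used to detect relationships is not specified here.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class CatalogEntry:
    """One dataset in the catalog, using the attributes listed above."""
    name: str
    published_date: str
    purpose: str
    custodian: str
    subject_areas: list = field(default_factory=list)
    abstract: str = ""
    # Assumption: normalized column names, used only for relationship detection.
    columns: set = field(default_factory=set)


def find_relationships(entries, min_shared: int = 1):
    """Return (name_a, name_b, shared_columns) for each pair sharing columns."""
    links = []
    for a, b in combinations(entries, 2):
        shared = a.columns & b.columns
        if len(shared) >= min_shared:
            links.append((a.name, b.name, sorted(shared)))
    return links


if __name__ == "__main__":
    population = CatalogEntry(
        "Population", "2021-06-30", "Population counts", "Example Department A",
        ["Demography"], columns={"region_code", "year", "population"},
    )
    rent = CatalogEntry(
        "Rent", "2022-01-01", "Rental prices", "Example Department B",
        columns={"region_code", "year", "median_rent"},
    )
    print(find_relationships([population, rent]))
```

Raising `min_shared` would reduce spurious links between datasets that only share a generic column such as `year`.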
Outputs
- An answer to the user's query
- Supporting data, with sources, to back up the answer
- Visualizations for exploring the answer further
- A comprehensive inventory of government and additional datasets, including metadata, relationships, and dependencies.