query-hub

Project Info

Team Name

query-hub

Team Members

GL, David and 1 other member with an unpublished profile.

Project Description

Finding the right government data is hard—datasets are scattered, inconsistent, and often hidden.

Our Goal: Build a single, user-friendly “national inventory” that aggregates metadata, relationships, and access methods.

Our solution makes data discovery as simple as asking a question: users search in plain language, and an AI-powered engine connects them to the most relevant datasets while revealing unexpected links across sources.

No SQL, pandas, or specialized knowledge needed.

We’re building a smarter, searchable national data inventory that helps policymakers, researchers, and communities unlock the full value of Australia’s public data.

Key Components

User Interface (UI)

Accepts inputs through text, voice, and smart uploads.
Supports natural language queries.
Allows users to add additional datasets directly.

Processing Layer

A Large Language Model (LLM) translates user queries into structured searches.
Dynamically builds and executes ABS Data Explorer API calls.
Integrates newly added datasets.
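
As a rough illustration of the query-building step, the sketch below turns a structured search into a data request URL. It assumes the ABS endpoint follows the standard SDMX REST path layout (`/data/{dataflow}/{key}` with `startPeriod` and `endPeriod` parameters). The `StructuredSearch` class, the `build_data_url` helper, the placeholder base URL, and the dimension codes in the usage note are all hypothetical names chosen for illustration, not part of the project.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlencode

# Assumption: placeholder base URL. Replace it with the real ABS endpoint.
BASE_URL = "https://example.invalid/rest"


@dataclass
class StructuredSearch:
    """Hypothetical shape of what the LLM translation step could return."""
    dataflow: str
    # One list of codes per dimension; an empty list means "any value".
    dimensions: list[list[str]] = field(default_factory=list)
    start_period: str | None = None
    end_period: str | None = None


def build_data_url(search: StructuredSearch, base_url: str = BASE_URL) -> str:
    # SDMX keys join codes within a dimension with "+" and dimensions with ".".
    # With no dimensions given, fall back to the SDMX "all" keyword.
    key = ".".join("+".join(codes) for codes in search.dimensions) or "all"
    params = {}
    if search.start_period:
        params["startPeriod"] = search.start_period
    if search.end_period:
        params["endPeriod"] = search.end_period
    url = f"{base_url}/data/{search.dataflow}/{key}"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `build_data_url(StructuredSearch("CPI", [["1"], ["10001"], [], ["50"], ["Q"]], "2020-Q1", "2021-Q4"))` yields a path ending in `/data/CPI/1.10001..50.Q?startPeriod=2020-Q1&endPeriod=2021-Q4`. Keeping the URL builder separate from the LLM means the model only has to emit a small, checkable structure rather than a raw URL.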

Outputs

Provides relevant data results from government and user-added sources.
Offers multiple visualization options for data exploration and analysis.

Overall

Our solution simplifies dataset discovery, integration, and analysis.

By combining natural language querying with a dynamic Data Dictionary and Data Catalog, we empower users to explore, connect, and visualize Australia’s public data—turning a fragmented landscape into a powerful national data inventory.

#govhack #opendata #publicdata #govdata #datainnovation #hackforgood

Data Story

Data Dictionary

Inputs

- GovHack datasets
- ABS Data Explorer API, plus manually downloaded xlsx and csv files
- User-uploaded datasets

Processing

- Parses semi-structured data with LLMs to build a schema, metadata, and an understanding of each dataset.
- Resolves inconsistencies and ambiguity across datasets.
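
To make the schema-building step concrete, here is a minimal sketch of a deterministic baseline that infers column types from a csv file. It deliberately swaps the LLM for simple type casting, so it only shows the kind of schema the step might produce, not how the project builds it. The function names and the sample values in the usage note are made up for illustration.

```python
import csv
import io


def infer_type(values):
    """Classify a column as 'integer', 'number', 'string', or 'empty'."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "empty"
    # Try the strictest type first and fall back to looser ones.
    for name, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def build_schema(csv_text):
    """Map normalized column names to inferred types for one csv document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        # Normalizing header names reduces inconsistencies across datasets.
        col.strip().lower().replace(" ", "_"): infer_type([r[col] for r in rows])
        for col in (reader.fieldnames or [])
    }
```

Calling `build_schema("Region,Count,Median Age\nNorth,100,38.5\nSouth,200,\n")` maps `region` to a string column, `count` to an integer column, and `median_age` to a number column, with the blank cell ignored during inference.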

Outputs

- An answer to the user's query
- Metadata describing government and user-added datasets.
- Analysis of key trends and insights
- Raw data used for the conclusions and synthesis, along with the original sources
- Visualizations of raw data that adapt on the fly to returned data
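
One way visualizations could adapt to the returned data is a small rule that picks a chart type from the column kinds. The sketch below is only an assumed heuristic. The `suggest_chart` name, the three column kinds, and the chart labels are hypothetical.

```python
def suggest_chart(columns):
    """Pick a chart type from a mapping of column name to kind.

    Kinds are assumed to be 'time', 'category', or 'number'.
    """
    kinds = list(columns.values())
    numbers = kinds.count("number")
    if "time" in kinds and numbers >= 1:
        return "line"      # a measure over time
    if "category" in kinds and numbers >= 1:
        return "bar"       # a measure per category
    if numbers >= 2:
        return "scatter"   # two measures against each other
    return "table"         # nothing chartable, show the raw rows
```

With this rule, a result with a `quarter` time column and an `index` number column would get a line chart, while a result with only text columns falls back to a table.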

Data Catalog

Inputs

- GovHack datasets
- ABS Data Explorer API, plus csv and xlsx files
- User-uploaded datasets

Processing

- Extracts and stores dataset attributes, including:
  - Dataset Name
  - Published Date / Collection Date
  - Purpose
  - Owning Government Department / Data Custodian
  - Subject Area(s)
  - Abstract / Summary
- Parses structured and semi-structured data.
- Normalizes data to remove edge cases.
- Identifies relationships and dependencies between datasets.
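
The catalog attributes and the relationship step above could be sketched as follows. This is an assumption-heavy illustration: the `CatalogEntry` fields mirror the listed attributes, the extra `columns` field is hypothetical, and linking datasets by shared column names is just one simple stand-in for whatever relationship detection the project uses.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class CatalogEntry:
    """One catalog record; field names follow the attribute list above."""
    name: str
    published_date: str
    purpose: str
    custodian: str
    subject_areas: list
    abstract: str
    # Assumption: normalized column names are kept so datasets can be linked.
    columns: set = field(default_factory=set)


def find_relationships(entries):
    """Return (name_a, name_b, shared_columns) for every pair that overlaps."""
    links = []
    for a, b in combinations(entries, 2):
        shared = a.columns & b.columns
        if shared:
            links.append((a.name, b.name, sorted(shared)))
    return links
```

Two entries that both carry a `region_code` column would be reported as related through that column, giving users a candidate join key across sources.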

Outputs

- An answer to the user's query
- Data with sources to back up the answer
- Visualizations to explore the answer further
- A comprehensive inventory of government and additional datasets, including metadata, relationships, and dependencies.