An Accurate and Trustworthy Chatbot for Data Interactions

Project Info

Team Name

NT -Road Riders

Team Members

2 members with unpublished profiles.

Project Description

Problem Statement
Public sector budgeting and procurement processes involve vast datasets that are often complex, fragmented, and difficult to interpret without specialised tools. Analysts and decision-makers face several challenges:

Forecasting budget impacts based on dynamic procurement trends.
Identifying anomalies or compliance risks, such as split purchases or vendor outliers.
Extracting insights from structured datasets using natural language queries.
Ensuring transparency and auditability in automated analyses.
Maintaining trust in data-driven decisions, especially when AI is involved.
Traditional spreadsheet tools and dashboards often lack the interactivity, contextual understanding, and analytical depth to address these challenges effectively.

Solution Overview
The PBS What-If Simulator + Vetted Chat is a modern, interactive web application built with Streamlit, designed to empower analysts, procurement officers, and policy makers with intelligent, transparent decision support.
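
As a rough illustration of the what-if idea, the sketch below applies scenario adjustments to a baseline of spend per category. It is not the project's implementation: the `Scenario` class, the `simulate` function, the rule that a category override replaces the global adjustments, and all figures are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical what-if parameters; names and semantics are assumptions."""
    inflation: float = 0.0          # e.g. 0.03 for 3% cost growth
    efficiency_saving: float = 0.0  # e.g. 0.02 for a 2% saving
    # category name -> multiplier applied instead of the global adjustments
    category_overrides: dict = field(default_factory=dict)


def simulate(baseline: dict, scenario: Scenario) -> dict:
    """Return adjusted spend per category for a single budget year."""
    projected = {}
    for category, amount in baseline.items():
        if category in scenario.category_overrides:
            # Assumed rule: an override replaces the global adjustments.
            factor = scenario.category_overrides[category]
        else:
            factor = (1 + scenario.inflation) * (1 - scenario.efficiency_saving)
        projected[category] = round(amount * factor, 2)
    return projected


# Illustrative figures only, not taken from the PBS dataset.
baseline = {"ICT": 1000.0, "Travel": 200.0}
scenario = Scenario(inflation=0.03, efficiency_saving=0.02,
                    category_overrides={"Travel": 0.5})
result = simulate(baseline, scenario)
```

Keeping the scenario in one small object makes it straightforward to log exactly which assumptions produced a given projection.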

🔧 Core Capabilities:
What-If Simulation Engine: Allows users to simulate budget outcomes based on procurement changes, inflation, efficiency savings, and category-specific overrides.
Vetted Chat System: Uses rule-based analysers and NLP to interpret user queries, apply relevant filters, and return evidence-backed answers.
Trust Scoring Framework: Evaluates data reliability using freshness, coverage, consistency, statistical strength, and backtest accuracy.
Tamper-Evident Audit Logging: Every interaction is recorded with hash chaining to ensure traceability and integrity.
Chat History: Maintains session-based conversation history for context-aware responses and user reference.
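
The hash chaining behind a tamper-evident audit log can be sketched with the Python standard library. This is a minimal example under stated assumptions, not the project's code: the entry layout, the `append_entry` and `verify` helpers, the all-zero genesis hash, and the choice of SHA-256 over a JSON serialisation are all hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(record: dict, prev_hash: str) -> str:
    """Hash the record together with the previous entry's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": _digest(record, prev_hash),
    })


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["record"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"query": "total ICT spend", "answer": "see table"})
append_entry(audit_log, {"query": "top vendors", "answer": "see chart"})
chain_is_valid = verify(audit_log)
```

Because each hash covers the previous one, changing an early record invalidates every later entry, which is what makes the log tamper-evident rather than tamper-proof.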

🧠 Technical Stack:
Python for backend logic and data processing.
Streamlit for UI and interactivity.
spaCy for domain-specific NLP and entity recognition.
Hugging Face Transformers for fallback AI-based question answering and semantic search.
Pandas & NumPy for data manipulation.
Plotly for dynamic visualisations.
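
One simple way to combine the five trust components named above into a single score is a weighted average, sketched here in plain Python. The equal weights, the assumption that each component is already normalised to the range 0 to 1, and the `trust_score` function itself are illustrative; the source does not say how the framework weights or scales its inputs.

```python
# Hypothetical equal weights; the source names the components, not their weighting.
WEIGHTS = {
    "freshness": 0.2,
    "coverage": 0.2,
    "consistency": 0.2,
    "statistical_strength": 0.2,
    "backtest_accuracy": 0.2,
}


def trust_score(components: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of component scores, each assumed to lie in [0, 1]."""
    missing = set(weights) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in weights:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = sum(weights.values())
    return sum(weights[name] * components[name] for name in weights) / total_weight


# Illustrative component values only.
score = trust_score({
    "freshness": 1.0,
    "coverage": 0.5,
    "consistency": 1.0,
    "statistical_strength": 0.5,
    "backtest_accuracy": 1.0,
})
```

Rejecting missing or out-of-range components, rather than silently defaulting them, keeps the score from looking more reliable than its inputs justify.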

Conclusion
This project bridges the gap between structured public finance data and intuitive, trustworthy analytics. Combining deterministic analysers with conversational AI and a clean user interface transforms how users interact with PBS and procurement datasets.

The simulator adds analytical depth through what-if modelling and supports transparency and auditability through trust scoring and tamper-evident logging. It is a prototype designed for internal government use, with potential applications in finance, compliance, policy evaluation, and public transparency.

Data Story

Budget 2024–2025 and Portfolio Budget Statements (PBS)
Context & Purpose
The Australian Government’s 2024–25 Budget, tabled on 14 May 2024, outlines the financial plans and priorities for the upcoming fiscal year. The Portfolio Budget Statements (PBS) provide detailed breakdowns of expenses, programs, and outcomes across government entities. This dataset, published by the Department of Finance, is designed to support transparency, analysis, and public engagement with budgetary data.

What’s in the Dataset?
The dataset includes:

PBS Excel spreadsheets with machine-readable tables such as:
Table 1.1: Entity Resource Statement
Table 1.2: Budget Measures
Table 2.X.1: Budgeted Expenses for Outcome X
Table 2.X.2: Program Component Expenses
Tables 3.1–3.6: Departmental Financial Statements
Tables 3.7–3.11: Administered Financial Statements
Selected tables from Budget Paper No. 4
CSV file: 2024-25 PBS Program Expense Line Items.csv containing line-item level data for:
Portfolio
Department/Entity
Outcome
Program
Expense Type
Appropriation Type
Annual figures from 2023–24 to 2027–28

Challenges in Interpretation
While the dataset is rich in detail, users should be aware of:

Discrepancies in totals: Totals in the CSV may not match those in Budget Paper No. 1 due to intra-entity charges, asset revaluations, and inclusion of additional entities.
Missing footnotes: Important context from original documents may not be captured in machine-readable formats.
Aggregation limitations: Subtotals and totals by entity and appropriation type are not included in the CSV but can be calculated programmatically.
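
The subtotals by entity and appropriation type could be computed along the following lines. The sketch uses only the standard library so it runs without dependencies. The column names are assumptions inferred from the field list above, and the rows are made-up stand-ins, so the real CSV header and number formatting should be checked before reuse.

```python
import csv
import io
from collections import defaultdict

# Assumed column names; verify against the actual CSV header.
ENTITY_COL = "Department/Entity"
APPROP_COL = "Appropriation Type"
YEAR_COL = "2024-25"


def subtotals(rows, year_col=YEAR_COL):
    """Sum one year's figures by (entity, appropriation type)."""
    totals = defaultdict(float)
    for row in rows:
        key = (row[ENTITY_COL], row[APPROP_COL])
        # Treat an empty cell as zero; real data may need more cleaning.
        totals[key] += float(row[year_col] or 0)
    return dict(totals)


# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "Department/Entity,Appropriation Type,2024-25\n"
    "Entity A,Departmental,100\n"
    "Entity A,Departmental,50\n"
    "Entity A,Administered,25\n"
    "Entity B,Departmental,10\n"
)
result = subtotals(csv.DictReader(sample))
```

To run this against the published file, the `io.StringIO` sample would be replaced with an opened file handle passed to `csv.DictReader`.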

Use Cases
Policy Analysis: Understand how funding is distributed across portfolios and programs.
Financial Forecasting: Project future expenses and assess budget sustainability.
Transparency & Accountability: Enable public scrutiny of government spending.
Data Journalism: Support investigative reporting with granular financial data.
AI & Automation: Feed structured budget data into analytical tools and simulations.

Conclusion
The 2024–25 PBS dataset is a cornerstone for open government and fiscal transparency. It empowers analysts, researchers, and citizens to explore how public funds are allocated and spent. With structured data spanning five years and covering dozens of portfolios, it offers a foundation for evidence-based decision-making and public accountability.

Evidence of Work

Team DataSets

Budget 2024-2025 and Portfolio Budget Statements (PBS) - Tables and Data

Challenge Entries

An Accurate and Trustworthy Chatbot for Data Interactions

How can government agencies deploy conversational and analytical AI to interrogate complex datasets while maintaining the high degree of accuracy and auditability standards required for official decision-making?

18 teams have entered this challenge.