Project Description
Governments hold thousands of tables, but staff still struggle to ask plain-English questions and get answers they can trust. Many chatbots sound confident yet sometimes invent facts, and in government “almost right” is not good enough. The problem we tackle is simple: how to let people interrogate multiple datasets while guaranteeing that every answer is grounded in real rows, traceable, and safe.

Our solution is NovaEra, a conversational data analyst that only answers from the data you provide. It blends HR, finance, and operations tables, runs clear calculations (counts, sums, trends, outlier checks), and shows its work. Each answer comes with a Trust Score (based on data coverage, specificity, and result quality) and a full Audit Trail that lists the steps, fields, filters, and tables used. If the data is missing, NovaEra says so and suggests what to upload or ask next.

The framework is reusable across departments, includes simple question scaffolds (“Show vendor payment outliers”, “Are leave spikes unusual in Team A?”), and follows ethical AI practices: privacy-first processing, role-based access, bias checks, and transparent logs. In our demo with public HR/leave datasets and a >1M-row benchmark, NovaEra delivered grounded answers, flagged real anomalies, scaled cleanly, and turned hours of manual analysis into minutes, without hallucinations.
Data Story
Government agencies hold thousands of datasets, but it’s hard for staff to ask plain-English questions and get reliable answers. Many modern AI chatbots sound convincing yet sometimes hallucinate—they give answers not grounded in the data. In government, that’s unacceptable: a wrong answer can affect budgets, people, and public services. Our focus is simple and urgent: build a chatbot that only answers from real data provided by the user, and clearly says “not enough data” when it can’t answer.
Why this matters now
- Accuracy is non-negotiable: “90% correct” is not good enough for audits, policy, or procurement.
- Accountability is required: Every answer must be traceable to real rows in real tables.
- Scaling demand: Teams must work with large datasets without slowing down or losing reliability.
- Public trust: Transparent, auditable AI helps agencies adopt AI responsibly.
Datasets used to demonstrate the approach
- HR dataset (Kaggle): employee performance, turnover, and workforce trends.
- Leave tracking dataset (Kaggle): types of leave, frequency, and department-level patterns.
- Large benchmark dataset (>1M rows): stress-tests speed, accuracy, trust scoring, and auditability at scale (a timing sketch follows this list).
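To make the scale point concrete, here is a minimal timing sketch. The real benchmark data is not reproduced here, so it generates synthetic rows; the column names (department, amount), group count, and row count are assumptions for illustration only, not the actual benchmark.

```python
import time
import numpy as np
import pandas as pd

# Synthetic stand-in for the >1M-row benchmark (the real benchmark
# data is not public); times one full-table aggregation.
rng = np.random.default_rng(seed=0)
n = 1_200_000
df = pd.DataFrame({
    "department": rng.integers(0, 50, size=n),   # hypothetical grouping key
    "amount": rng.normal(100.0, 25.0, size=n),   # hypothetical measure
})

start = time.perf_counter()
totals = df.groupby("department")["amount"].sum()
elapsed = time.perf_counter() - start
print(f"aggregated {n:,} rows into {len(totals)} groups in {elapsed:.3f}s")
```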
What NovaEra answers
- HR: Which departments show unusual leave behaviour? (A toy detection sketch follows this list.)
- Workforce planning: Are current staffing levels sustainable given recent trends?
- Scale & reliability: How does accuracy hold up when the dataset grows very large?
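As flagged above, the following is a sketch of one way the “unusual leave behaviour” question could be answered directly from a leave table. The column names (department, leave_days), the toy data, and the z-score threshold are assumptions for illustration, not NovaEra's actual implementation.

```python
import pandas as pd

# Toy stand-in for the Kaggle leave-tracking table; the column names
# are hypothetical and the values are invented for the example.
leave = pd.DataFrame({
    "department": ["A", "A", "B", "B", "C", "C",
                   "D", "D", "E", "E", "F", "F"],
    "leave_days": [3, 3, 4, 4, 3, 4, 4, 4, 3, 5, 19, 21],
})

# Mean leave per department, then a z-score against all departments.
per_dept = leave.groupby("department")["leave_days"].mean()
z = (per_dept - per_dept.mean()) / per_dept.std(ddof=0)

# Flag departments more than 2 standard deviations from the norm.
unusual = per_dept[z.abs() > 2]
print(unusual)  # -> department F, mean 20.0 leave days
```

Because the answer is a computation over the supplied rows, the flagged department can always be traced back to the exact records that produced it.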
How NovaEra prevents hallucination
- Grounded responses only: Answers are computed from the supplied tables; no external guessing (a minimal sketch of the whole flow follows this list).
- Trust score: Each answer includes a confidence score based on data coverage, specificity, and result quality.
- Audit trail: A step-by-step log shows which data was used, what calculations were run, and how the final answer was formed.
- Clear boundaries: If the data is missing or insufficient, NovaEra explains what’s needed to answer.
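The sketch below illustrates the flow this list describes, under stated assumptions: the answer is computed only from the supplied table, a toy trust score is derived from data coverage alone (the real score also weighs specificity and result quality, which are not specified here), every step is appended to an audit log, and a missing column triggers a refusal rather than a guess. The function, column names, and log format are hypothetical.

```python
import pandas as pd

def grounded_answer(df: pd.DataFrame, column: str, agg: str):
    """Answer a question only from the supplied table, returning a
    trust score and an audit trail. Illustrative sketch; NovaEra's
    real scoring weights and log format are not shown here."""
    audit = []  # step-by-step log of what was actually done

    # Boundary check: refuse rather than guess when data is missing.
    if column not in df.columns:
        return {"answer": None, "trust": 0.0,
                "audit": [f"column '{column}' not found; asked user to upload it"]}

    series = df[column].dropna()
    audit.append(f"selected column '{column}' "
                 f"({len(series)} of {len(df)} rows non-null)")

    result = getattr(series, agg)()  # e.g. 'sum', 'mean', 'count'
    audit.append(f"ran {agg} over '{column}' -> {result}")

    # Toy trust score: coverage only (share of usable rows); the full
    # score described above also weighs specificity and result quality.
    coverage = len(series) / len(df) if len(df) else 0.0
    audit.append(f"trust = coverage {coverage:.2f}")

    return {"answer": result, "trust": round(coverage, 2), "audit": audit}

payments = pd.DataFrame({"amount": [120.0, 95.5, None, 310.0]})
print(grounded_answer(payments, "amount", "sum"))    # grounded result
print(grounded_answer(payments, "vendor", "count"))  # refuses, trust 0.0
```

Returning the audit list alongside the answer is what makes each response traceable to real rows: a reviewer can replay exactly which column, filter, and calculation produced the result.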
Bottom line: NovaEra delivers accurate, transparent, and ethical insights—built for government standards, not just clever conversation.
From sketches to system: dive into the reports, diagrams, and experiments that powered our nonstop push to build this chatbot.
