Project Description
Governments hold thousands of tables, but staff still struggle to ask plain-English questions and get answers they can trust. Many chatbots sound confident yet sometimes invent facts, and in government “almost right” is not good enough. The problem we tackle is simple: how to let people interrogate multiple datasets while guaranteeing that every answer is grounded in real rows, traceable, and safe.

Our solution is NovaEra, a conversational data analyst that only answers from the data you provide. It blends HR, finance, and operations tables, runs clear calculations (counts, sums, trends, outlier checks), and shows its work. Each answer comes with a Trust Score (based on data coverage, specificity, and result quality) and a full Audit Trail that lists the steps, fields, filters, and tables used. If the data is missing, NovaEra says so and suggests what to upload or ask next.

The framework is reusable across departments, includes simple question scaffolds (“Show vendor payment outliers”, “Are leave spikes unusual in Team A?”), and follows ethical AI practices: privacy-first processing, role-based access, bias checks, and transparent logs. In our demo with public HR/leave datasets and a >1M-row benchmark, NovaEra delivered grounded answers, flagged real anomalies, scaled cleanly, and turned hours of manual analysis into minutes, without hallucinations.
Data Story
Government agencies hold thousands of datasets, but it’s hard for staff to ask plain-English questions and get reliable answers. Many modern AI chatbots sound convincing yet sometimes hallucinate—they give answers not grounded in the data. In government, that’s unacceptable: a wrong answer can affect budgets, people, and public services. Our focus is simple and urgent: build a chatbot that only answers from real data provided by the user, and clearly says “not enough data” when it can’t answer.
Why this matters now
- Accuracy is non-negotiable: “90% correct” is not good enough for audits, policy, or procurement.
- Accountability is required: Every answer must be traceable to real rows in real tables.
- Scaling demand: Teams must work with large datasets without slowing down or losing reliability.
- Public trust: Transparent, auditable AI helps agencies adopt AI responsibly.
Datasets used to demonstrate the approach
- HR dataset (Kaggle): employee performance, turnover, and workforce trends.
- Leave tracking dataset (Kaggle): types of leave, frequency, and department-level patterns.
- Large benchmark dataset (>1M rows): stress-tests speed, accuracy, trust scoring, and auditability at scale (a timing sketch follows this list).
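To make the scale point concrete, here is a minimal timing sketch. The real benchmark data is not reproduced here, so it generates synthetic rows; the column names (department, amount), group count, and row count are assumptions for illustration only, not the actual benchmark.

```python
import time
import numpy as np
import pandas as pd

# Synthetic stand-in for the >1M-row benchmark (the real benchmark
# data is not public); times one full-table aggregation.
rng = np.random.default_rng(seed=0)
n = 1_200_000
df = pd.DataFrame({
    "department": rng.integers(0, 50, size=n),   # hypothetical grouping key
    "amount": rng.normal(100.0, 25.0, size=n),   # hypothetical measure
})

start = time.perf_counter()
totals = df.groupby("department")["amount"].sum()
elapsed = time.perf_counter() - start
print(f"aggregated {n:,} rows into {len(totals)} groups in {elapsed:.3f}s")
```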
What NovaEra answers
- HR: Which departments show unusual leave behaviour? (A toy detection sketch follows this list.)
- Workforce planning: Are current staffing levels sustainable given recent trends?
- Scale & reliability: How does accuracy hold up when the dataset grows very large?
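As flagged above, the following is a sketch of one way the “unusual leave behaviour” question could be answered directly from a leave table. The column names (department, leave_days), the toy data, and the z-score threshold are assumptions for illustration, not NovaEra's actual implementation.

```python
import pandas as pd

# Toy stand-in for the Kaggle leave-tracking table; the column names
# are hypothetical and the values are invented for the example.
leave = pd.DataFrame({
    "department": ["A", "A", "B", "B", "C", "C",
                   "D", "D", "E", "E", "F", "F"],
    "leave_days": [3, 3, 4, 4, 3, 4, 4, 4, 3, 5, 19, 21],
})

# Mean leave per department, then a z-score against all departments.
per_dept = leave.groupby("department")["leave_days"].mean()
z = (per_dept - per_dept.mean()) / per_dept.std(ddof=0)

# Flag departments more than 2 standard deviations from the norm.
unusual = per_dept[z.abs() > 2]
print(unusual)  # -> department F, mean 20.0 leave days
```

Because the answer is a computation over the supplied rows, the flagged department can always be traced back to the exact records that produced it.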
How NovaEra prevents hallucination
- Grounded responses only: Answers are computed from the supplied tables; no external guessing (a minimal sketch of the whole flow follows this list).
- Trust score: Each answer includes a confidence score based on data coverage, specificity, and result quality.
- Audit trail: A step-by-step log shows which data was used, what calculations were run, and how the final answer was formed.
- Clear boundaries: If the data is missing or insufficient, NovaEra explains what’s needed to answer.
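The sketch below illustrates the flow this list describes, under stated assumptions: the answer is computed only from the supplied table, a toy trust score is derived from data coverage alone (the real score also weighs specificity and result quality, which are not specified here), every step is appended to an audit log, and a missing column triggers a refusal rather than a guess. The function, column names, and log format are hypothetical.

```python
import pandas as pd

def grounded_answer(df: pd.DataFrame, column: str, agg: str):
    """Answer a question only from the supplied table, returning a
    trust score and an audit trail. Illustrative sketch; NovaEra's
    real scoring weights and log format are not shown here."""
    audit = []  # step-by-step log of what was actually done

    # Boundary check: refuse rather than guess when data is missing.
    if column not in df.columns:
        return {"answer": None, "trust": 0.0,
                "audit": [f"column '{column}' not found; asked user to upload it"]}

    series = df[column].dropna()
    audit.append(f"selected column '{column}' "
                 f"({len(series)} of {len(df)} rows non-null)")

    result = getattr(series, agg)()  # e.g. 'sum', 'mean', 'count'
    audit.append(f"ran {agg} over '{column}' -> {result}")

    # Toy trust score: coverage only (share of usable rows); the full
    # score described above also weighs specificity and result quality.
    coverage = len(series) / len(df) if len(df) else 0.0
    audit.append(f"trust = coverage {coverage:.2f}")

    return {"answer": result, "trust": round(coverage, 2), "audit": audit}

payments = pd.DataFrame({"amount": [120.0, 95.5, None, 310.0]})
print(grounded_answer(payments, "amount", "sum"))    # grounded result
print(grounded_answer(payments, "vendor", "count"))  # refuses, trust 0.0
```

Returning the audit list alongside the answer is what makes each response traceable to real rows: a reviewer can replay exactly which column, filter, and calculation produced the result.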
Bottom line: NovaEra delivers accurate, transparent, and ethical insights—built for government standards, not just clever conversation.
From sketches to system: dive into the reports, diagrams, and experiments that powered our nonstop push to build this chatbot.
