Ensuring Data Quality in National Educational Databases

Insights from Brazil’s Centralized Database of High School Students’ Data

Authors

  • Abílio Nogueira Barros, Departamento de Computação, Universidade Federal Rural de Pernambuco (UFRPE) | Center of Excellence for Social Technologies (NEES), Brazil
  • Emanuel Marques Queiroga, Instituto Federal de Educação, Ciência e Tecnologia Sul-Rio-Grandense (IFSul) | Center of Excellence for Social Technologies (NEES), Brazil
  • Markson Rebelo Marcolino, Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina | Center of Excellence for Social Technologies (NEES), Brazil
  • Débora Barbosa Leite Silva, Center of Excellence for Social Technologies (NEES), Brazil
  • Diego Dermeval, Center of Excellence for Social Technologies (NEES), Brazil
  • André Lima, Center of Excellence for Social Technologies (NEES), Brazil
  • Leonardo Brandão Marques, Center of Excellence for Social Technologies (NEES), Brazil
  • Cristian Cechinel, Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina | Center of Excellence for Social Technologies (NEES), Brazil
  • Thales Vieira, Center of Excellence for Social Technologies (NEES), Brazil

DOI:

https://doi.org/10.59490/dgo.2025.943

Keywords:

Data quality, interoperability, data integrity, public policies, education

Abstract

This study investigates challenges in ensuring data quality within Brazil’s national educational database, the Sistema Gestão Presente (SGP), and proposes solutions. Reliable and integrated data systems are critical for evidence-based policymaking, particularly in education. The SGP, designed to centralize student attendance and enrollment data, faces issues such as inconsistent data entry, logical errors, and systemic reporting anomalies. To address these challenges, a data-driven methodology inspired by the DMAIC framework was implemented, focusing on defining problems, measuring and analyzing data inconsistencies, improving processes through tailored solutions, and monitoring outcomes for continuous quality assurance. Seven case studies illustrate the results of this approach. These include resolving inconsistent enrollment dates, limiting multiple active enrollments per student, and ensuring consistency between disenrollment justifications and active statuses. Further, systemic anomalies, such as inflated attendance rates and implausibly high class hours reported at state levels, were identified and corrected through validation rules, training initiatives, and auditing mechanisms. These interventions reduced data inconsistencies, enhanced reliability, and improved system usability. The findings demonstrate how integrating validation mechanisms, improving data entry workflows, and fostering stakeholder collaboration can address large-scale data challenges. By ensuring the quality and integrity of educational data, the SGP enables more accurate insights into attendance patterns, dropout rates, and program effectiveness. These improvements lay the foundation for robust evidence-based policies and equitable educational outcomes across Brazil, highlighting the transformative potential of data quality assurance for public sector decision-making.
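
To make three of the case studies named above concrete (inconsistent enrollment dates, multiple active enrollments per student, and disenrollment justifications attached to active enrollments), the following is a minimal sketch of how such validation rules might look. It is not the SGP implementation: the `Enrollment` record, its field names, and the issue labels are hypothetical, since the abstract does not describe the database schema.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Enrollment:
    # Hypothetical record layout; the real SGP schema is not given in the abstract.
    student_id: str
    start_date: date
    end_date: Optional[date]
    active: bool
    disenrollment_reason: Optional[str]


def validate(records: list[Enrollment]) -> list[tuple[str, str]]:
    """Return (student_id, issue) pairs, one per rule violation found."""
    issues: list[tuple[str, str]] = []

    for r in records:
        # Rule 1: an enrollment cannot end before it starts.
        if r.end_date is not None and r.end_date < r.start_date:
            issues.append((r.student_id, "end_date_before_start_date"))
        # Rule 2: an active enrollment should not carry a disenrollment reason.
        if r.active and r.disenrollment_reason:
            issues.append((r.student_id, "active_with_disenrollment_reason"))

    # Rule 3: at most one active enrollment per student.
    active_counts = Counter(r.student_id for r in records if r.active)
    for student_id, count in active_counts.items():
        if count > 1:
            issues.append((student_id, "multiple_active_enrollments"))

    return issues
```

A function of this shape could run either at data entry, to reject a record, or as a periodic audit over stored records; which of the two the SGP uses for each rule is not stated here.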

Published

2025-05-19

How to Cite

Nogueira Barros, A., Marques Queiroga, E., Rebelo Marcolino, M., Barbosa Leite Silva, D., Dermeval, D., Lima, A., Brandão Marques, L., Cechinel, C., & Vieira, T. (2025). Ensuring Data Quality in National Educational Databases: Insights from Brazil’s Centralized Database of High School Students’ Data. Conference on Digital Government Research, 1. https://doi.org/10.59490/dgo.2025.943