Ensuring Data Quality in National Educational Databases
Insights from Brazil’s Centralized Database of High School Students’ Data
DOI:
https://doi.org/10.59490/dgo.2025.943
Keywords:
Data quality, interoperability, data integrity, public policies, education
Abstract
This study investigates challenges in ensuring data quality within Brazil’s national educational database, the Sistema Gestão Presente (SGP), and proposes solutions. Reliable and integrated data systems are critical for evidence-based policymaking, particularly in education. The SGP, designed to centralize student attendance and enrollment data, faces issues such as inconsistent data entry, logical errors, and systemic reporting anomalies. To address these challenges, a data-driven methodology inspired by the DMAIC framework was implemented, focusing on defining problems, measuring and analyzing data inconsistencies, improving processes through tailored solutions, and monitoring outcomes for continuous quality assurance. Seven case studies illustrate the results of this approach. These include resolving inconsistent enrollment dates, limiting multiple active enrollments per student, and ensuring consistency between disenrollment justifications and active statuses. Further, systemic anomalies, such as inflated attendance rates and implausibly high class hours reported at state levels, were identified and corrected through validation rules, training initiatives, and auditing mechanisms. These interventions reduced data inconsistencies, enhanced reliability, and improved system usability. The findings demonstrate how integrating validation mechanisms, improving data entry workflows, and fostering stakeholder collaboration can address large-scale data challenges. By ensuring the quality and integrity of educational data, the SGP enables more accurate insights into attendance patterns, dropout rates, and program effectiveness. These improvements lay the foundation for robust evidence-based policies and equitable educational outcomes across Brazil, highlighting the transformative potential of data quality assurance for public sector decision-making.
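As an illustration only, three of the record-level checks named in the abstract (enrollment date consistency, a cap on active enrollments per student, and agreement between disenrollment justification and active status) might be sketched as below. This is a minimal sketch, not the SGP implementation: the abstract does not describe the SGP schema or its actual rules, so the `Enrollment` fields, the `validate` function, the issue labels, and the default cap of one active enrollment are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Enrollment:
    # Hypothetical fields; the real SGP schema is not given in the abstract.
    student_id: str
    start_date: date
    end_date: Optional[date]
    active: bool
    disenrollment_reason: Optional[str]


def validate(records, max_active=1):
    """Return (student_id, issue) pairs for records that break a rule.

    max_active is an assumed limit; the abstract does not state one.
    """
    issues = []
    for r in records:
        # Rule 1: an enrollment cannot end before it starts.
        if r.end_date is not None and r.end_date < r.start_date:
            issues.append((r.student_id, "end_before_start"))
        # Rule 2: an active enrollment should not carry a disenrollment reason.
        if r.active and r.disenrollment_reason:
            issues.append((r.student_id, "active_with_disenrollment_reason"))
    # Rule 3: cap the number of simultaneously active enrollments per student.
    active_counts = Counter(r.student_id for r in records if r.active)
    for student_id, count in active_counts.items():
        if count > max_active:
            issues.append((student_id, "too_many_active_enrollments"))
    return issues


# Made-up sample records, one violation per rule.
records = [
    Enrollment("s1", date(2024, 2, 5), date(2024, 1, 10), False, "transfer"),
    Enrollment("s2", date(2024, 2, 5), None, True, "dropout"),
    Enrollment("s3", date(2024, 2, 5), None, True, None),
    Enrollment("s3", date(2024, 3, 1), None, True, None),
]
issues = validate(records)
```

Returning a list of flagged issues rather than rejecting records outright keeps the check usable for both measurement (counting inconsistencies) and later monitoring, which loosely mirrors the measure and control phases of DMAIC mentioned above.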
License
Copyright (c) 2025 Abílio Nogueira Barros, Emanuel Marques Queiroga, Markson Rebelo Marcolino, Débora Barbosa Leite Silva, Diego Dermeval, André Lima, Leonardo Brandão Marques, Cristian Cechinel, Thales Vieira

This work is licensed under a Creative Commons Attribution 4.0 International License.
