Forecasting Student Enrollments in Brazilian Schools for Equitable and Efficient Education Resource Allocation
DOI: https://doi.org/10.59490/dgo.2025.939
Keywords: Machine Learning, Random Forest, forecasting, enrollment, Brazilian education
Abstract
In recent years, there has been growing scientific interest in developing effective techniques for forecasting student enrollment across all levels of schooling (i.e., primary, secondary, and higher education). Enrollment forecasting is crucial in shaping public education policy because it guides resource allocation and helps ensure equitable access to educational opportunities. Machine Learning (ML) models are a promising approach to forecasting the number of students expected to enroll in a given school term, given the complexity of grouping the data and identifying useful patterns for prediction. In this work, we develop a predictive model based on the Random Forest (RF) algorithm to forecast student enrollment across the entire spectrum of Brazilian education. We use a database provided by the National Education Development Fund (FNDE), the Brazilian government body responsible for purchasing and distributing textbooks to all public schools. From this database, we generate 1,531,185 time series as input to the RF model. The training dataset uses data from 2010 to 2020, and the testing dataset uses data from 2021. RF outperforms the Exponential Smoothing (ES) baseline in all the investigated scenarios. Since RF demonstrated acceptable performance, the Brazilian government could use this technique to forecast student enrollment and to ensure that students have equitable access to essential resources, such as didactic materials.
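The abstract does not specify how each time series is turned into Random Forest inputs, which Exponential Smoothing variant serves as the baseline, or which error metric is used. The sketch below shows one common way to set up such a comparison, under assumptions that are ours rather than the authors': each series is a yearly enrollment count for one hypothetical school, the forest learns from sliding windows of the previous three years (`N_LAGS = 3`), the baseline is simple exponential smoothing with a fixed `alpha = 0.5`, accuracy is measured as mean absolute error, and the data are synthetic. To stay dependency-free, it hand-rolls a tiny random forest (bootstrap samples plus a random feature subset at each split); all function and class names are hypothetical, and a real pipeline would more likely use a library implementation such as scikit-learn's `RandomForestRegressor`.

```python
import random
from statistics import fmean

def make_windows(series, n_lags):
    """Slice one yearly series into (previous n_lags values, next value) pairs."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]

def _sse(rows):
    """Sum of squared deviations of the targets from their mean."""
    m = fmean(y for _, y in rows)
    return sum((y - m) ** 2 for _, y in rows)

def build_tree(rows, depth, max_depth, min_leaf, n_feats, rng):
    """Grow a regression tree by greedy variance reduction."""
    # Leaves are floats; internal nodes are (feature, threshold, left, right).
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return fmean(y for _, y in rows)
    best = None
    # Consider only a random subset of the lag features at each split.
    for f in rng.sample(range(len(rows[0][0])), n_feats):
        for t in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no split leaves min_leaf rows on both sides
        return fmean(y for _, y in rows)
    _, f, t, left, right = best
    args = (depth + 1, max_depth, min_leaf, n_feats, rng)
    return (f, t, build_tree(left, *args), build_tree(right, *args))

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

class TinyRandomForest:
    """Bagged regression trees with per-split feature subsampling."""
    def __init__(self, n_trees=10, max_depth=4, min_leaf=2, n_feats=2, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.min_leaf, self.n_feats = min_leaf, n_feats
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, rows):
        for _ in range(self.n_trees):
            # Bootstrap: resample the training rows with replacement.
            sample = [self.rng.choice(rows) for _ in rows]
            self.trees.append(build_tree(sample, 0, self.max_depth,
                                         self.min_leaf, self.n_feats, self.rng))
        return self

    def predict(self, x):
        # The forest prediction is the average of the tree predictions.
        return fmean(predict_tree(tree, x) for tree in self.trees)

def ses_forecast(series, alpha=0.5):
    """Simple exponential smoothing: the last smoothed level is the next forecast."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def mae(pred, actual):
    return fmean(abs(p - a) for p, a in zip(pred, actual))

# Hypothetical synthetic data: 20 schools, one enrollment count per year, 2010-2021.
YEARS = list(range(2010, 2022))
data_rng = random.Random(42)
series_by_school = {}
for school in range(20):
    base, trend = data_rng.randint(50, 500), data_rng.uniform(-10, 10)
    series_by_school[school] = [max(0, round(base + trend * t + data_rng.gauss(0, 5)))
                                for t in range(len(YEARS))]

N_LAGS = 3
train_end = YEARS.index(2020) + 1  # train on 2010-2020, hold out 2021
rows = [w for s in series_by_school.values()
        for w in make_windows(s[:train_end], N_LAGS)]
forest = TinyRandomForest(n_trees=10, max_depth=4, n_feats=2).fit(rows)

rf_pred, es_pred, actual = [], [], []
for s in series_by_school.values():
    history = s[:train_end]
    rf_pred.append(forest.predict(history[-N_LAGS:]))
    es_pred.append(ses_forecast(history))
    actual.append(s[train_end])

print(f"2021 MAE  RF: {mae(rf_pred, actual):.1f}  SES: {mae(es_pred, actual):.1f}")
```

Building training windows only from 2010 to 2020 and scoring only the 2021 values keeps the split temporal, in line with the training and testing periods stated in the abstract; a random train/test split would leak future enrollments into training.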
Copyright (c) 2025 Lenardo Chaves e Silva, Luciano de Souza Cabral, Jário José dos Santos Júnior, Luam Leiverton Pereira dos Santos, Thyago Tenório Martins de Oliveira, Breno Jacinto Duarte da Costa, Joana Fusco Lobo, Dalgoberto Miguilino Pinho Júnior, Nicholas Joseph Tavares da Cruz, Rafael de Amorim Silva, Bruno Almeida Pimentel

This work is licensed under a Creative Commons Attribution 4.0 International License.
