Forecasting Student Enrollments in Brazilian Schools for Equitable and Efficient Education Resource Allocation

Authors

  • Lenardo Chaves e Silva Federal Rural University of the Semi-Arid | Center for Excellence in Social Technologies, Brazil
  • Luciano de Souza Cabral Pernambuco Federal Institute of Education, Science, and Technology | Center for Excellence in Social Technologies, Brazil
  • Jário José dos Santos Júnior Federal University of Alagoas | Center for Excellence in Social Technologies, Brazil
  • Luam Leiverton Pereira dos Santos Federal University of the São Francisco Valley | Center for Excellence in Social Technologies, Brazil
  • Thyago Tenório Martins de Oliveira Federal University of Alagoas | Center for Excellence in Social Technologies, Brazil
  • Breno Jacinto Duarte da Costa Alagoas Federal Institute of Education, Science and Technology | Center for Excellence in Social Technologies, Brazil
  • Joana Fusco Lobo National Education Development Fund, Brazil
  • Dalgoberto Miguilino Pinho Júnior Federal University of Alagoas | Center for Excellence in Social Technologies, Brazil
  • Nicholas Joseph Tavares da Cruz Federal University of Alagoas | Center for Excellence in Social Technologies, Brazil
  • Rafael de Amorim Silva Federal University of Alagoas | Center for Excellence in Social Technologies, Brazil
  • Bruno Almeida Pimentel Federal University of Alagoas | Center for Excellence in Social Technologies, Brazil

DOI:

https://doi.org/10.59490/dgo.2025.939

Keywords:

Machine Learning, Random Forest, forecasting, enrollment, Brazilian education

Abstract

In recent years, scientific interest has grown in techniques for forecasting student enrollment across the school spectrum (i.e., primary, secondary, and higher education). Enrollment forecasting is crucial in shaping public education policies because it guides resource allocation and helps ensure equitable access to educational opportunities. Machine Learning (ML) models are a promising approach to forecasting the number of students expected to enroll in a given school term, since they can group data and identify useful patterns that are too complex to capture by hand. In this work, we develop a predictive model based on the Random Forest (RF) algorithm to forecast student enrollment across the entire spectrum of Brazilian education. We use a database provided by the National Education Development Fund (FNDE), a Brazilian government body responsible for purchasing and distributing textbooks to all public schools. From it we generate 1,531,185 time series that serve as input to RF. The training dataset covers 2010 to 2020, and the testing dataset covers 2021. RF outperforms the Exponential Smoothing (ES) baseline algorithm in all the investigated scenarios. Since RF demonstrated acceptable performance, the Brazilian government could use this technique to forecast student enrollment in schools and to help ensure that students have equitable access to essential resources, such as didactic materials.
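The evaluation setup described in the abstract (fit on 2010 to 2020, test on 2021, compare against an ES baseline) can be illustrated with a minimal sketch. It assumes simple exponential smoothing with a fixed smoothing factor, which the abstract does not specify, and it uses a made-up enrollment series for a single hypothetical school and grade. The function name `ses_forecast`, the value of `alpha`, and the data are illustrative assumptions, not the authors' implementation.

```python
def ses_forecast(history, alpha=0.5):
    """One-step-ahead forecast from simple exponential smoothing.

    Assumption: plain SES with a fixed alpha; the ES variant used in
    the paper is not stated in the abstract.
    """
    if not history:
        raise ValueError("history must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]  # initialize the level with the first observation
    for value in history[1:]:
        # blend the newest observation with the previous smoothed level
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical enrollment counts for one school/grade series (not real data).
series = {
    2010: 120, 2011: 118, 2012: 121, 2013: 117, 2014: 115, 2015: 116,
    2016: 112, 2017: 110, 2018: 111, 2019: 108, 2020: 105, 2021: 104,
}

# Training window 2010-2020 and a single test year, 2021.
train = [series[year] for year in range(2010, 2021)]
actual_2021 = series[2021]

forecast_2021 = ses_forecast(train, alpha=0.5)
abs_error = abs(forecast_2021 - actual_2021)

print(f"ES forecast for 2021: {forecast_2021:.1f}")
print(f"Absolute error: {abs_error:.1f}")
```

A learned model such as RF would be scored on the same held-out year with the same error measure, so that the two forecasts are directly comparable for each series.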

References

Abideen, Z. u., Mazhar, T., Razzaq, A., Haq, I., Ullah, I., Alasmary, H., & Mohamed, H. G. (2023). Analysis of Enrollment Criteria in Secondary Schools Using Machine Learning and Data Mining Approach [Publisher: MDPI]. Electronics, 12(3), 694. Retrieved February 15, 2024, from [link]

Amarasinghe, K., Rodolfa, K. T., Lamba, H., & Ghani, R. (2023). Explainable machine learning for public policy: Use cases, gaps, and research directions. Data & Policy, 5, e5. DOI: https://doi.org/10.1017/dap.2023.2.

Andrade, J. (2023, February). Censo escolar: Matrículas na educação básica cresceram em 2022 [Last accessed 19 Feb. 2024].

Ayasi, B., Saleh, M., García-Vico, A., & Carmona, C. J. (2023, December). Predicting course enrollment with machine learning and neural networks: A comparative study of algorithms. ISTES Organization Monument, CO, USA.

Billah, B., King, M. L., Snyder, R. D., & Koehler, A. B. (2006). Exponential smoothing model selection for forecasting. International Journal of Forecasting, 22(2), 239–247. DOI: https://doi.org/10.1016/j.ijforecast.2005.08.002.

Bliemel, F. (1973). Theil’s forecast accuracy coefficient: A clarification. Journal of Marketing Research, 10(4), 444–446. Retrieved January 5, 2024, from [link]

Chen, Q. (2022). A comparative study on the forecast models of the enrollment proportion of general education and vocational education. International Education Studies, 15(6), 109–126.

Feng, S., Zhou, S., & Liu, Y. (2011). Research on data mining in university admissions decision-making [Last accessed 20 Jun. 2023]. International Journal of Advancements in Computing Technology, 3(6), 176–186. DOI: https://doi.org/10.4156/ijact.vol3.issue6.21.

Fischer-Abaigar, U., Kern, C., Barda, N., & Kreuter, F. (2024). Bridging the gap: Towards an expanded toolkit for AI-driven decision-making in the public sector. Government Information Quarterly, 41(4), 101976. DOI: https://doi.org/10.1016/j.giq.2024.101976.

FNDE. (2022, January). Em 2021 foram investidos R$ 1,9 bilhão em livros e material didático do PNLD. [link]

Gregor, S., & Hevner, A. R. (2013). Positioning and presenting design science research for maximum impact. MIS Q., 37(2), 337–356. DOI: https://doi.org/10.25300/MISQ/2013/37.2.01.

Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design science in information systems research. MIS Q., 28(1), 75–105.

Hiregoudar, S. (2020, August). Ways to evaluate regression models [Accessed 23 Feb. 2023]. [link]

Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688. DOI: https://doi.org/10.1016/j.ijforecast.2006.03.001.

Khademi, M., & Nakhkob, B. (2016). Predicted increase enrollment in higher education using neural networks and data mining techniques. Journal of Computer Research and Development, 7, 125–140.

Masini, R. P., Medeiros, M. C., & Mendes, E. F. (2023). Machine learning advances for time series forecasting. Journal of Economic Surveys, 37(1), 76–111. DOI: https://doi.org/10.1111/joes.12429.

Peffers, K., Tuunanen, T., Rothenberger, M. A., & Chatterjee, S. (2007). A design science research methodology for information systems research. Journal of Management Information Systems, 24(3), 45–77. DOI: https://doi.org/10.2753/MIS0742-1222240302.

Penteado, K. (2021, June). Métricas de avaliação para séries temporais [Accessed 22 Feb. 2023]. [link]

Pichler, M., & Hartig, F. (2023). Machine learning and deep learning—A review for ecologists. Methods in Ecology and Evolution, 14(4), 994–1016. DOI: https://doi.org/10.1111/2041-210X.14061.

Robinson, A. P., & Hamann, J. D. (2011). Imputation and interpolation. In Forest analytics with R: An introduction (pp. 117–151). Springer New York. DOI: https://doi.org/10.1007/978-1-4419-7762-5_4.

Sammut, C., & Webb, G. I. (Eds.). (2017). Encyclopedia of machine learning and data mining. Springer. DOI: https://doi.org/10.1007/978-1-4899-7687-1.

Sarker, I. H. (2021). Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Computer Science, 2(3), 160. DOI: https://doi.org/10.1007/s42979-021-00592-x.

Scholl, H. J. (2024). Digital government research: Evolution of topical directions. Proceedings of the 25th Annual International Conference on Digital Government Research, 423–433. DOI: https://doi.org/10.1145/3657054.3657106.

Shao, L., Ieong, M., Levine, R. A., Stronach, J., & Fan, J. (2022). Machine Learning Methods for Course Enrollment Prediction. Strategic enrollment management quarterly, 10(2). Retrieved February 15, 2024, from [link]

Sobrinho, A., Bittencourt, I. I., Silveira, A. C. M., Silva, A. P., Dermeval, D., Marques, L. M., Rodrigues, N. C. I., Souza, A. C. S., Ferreira, R., & Isotani, S. (2023). Towards digital transformation of the validation and triage process of textbooks in the Brazilian educational policy. Sustainability, 15(7). DOI: https://doi.org/10.3390/su15075861.

Soltys, M., Dang, H. D., Reyes Reilly, G., & Soltys, K. (2021). Enrollment predictions with machine learning [Last accessed 20 Jun. 2023]. Strategic Enrollment Management Quarterly, 9(2), 11–18. [link]

Xu, M., Fralick, D., Zheng, J., Wang, B., Tu, X., & Feng, C. (2017). The differences and similarities between two-sample t-test and paired t-test. Shanghai Arch Psychiatry, 29(3), 184–188.

Published

2025-05-19

How to Cite

Chaves e Silva, L., de Souza Cabral, L., José dos Santos Júnior, J., Leiverton Pereira dos Santos, L., Tenório Martins de Oliveira, T., Jacinto Duarte da Costa, B., Fusco Lobo, J., Miguilino Pinho Júnior, D., Joseph Tavares da Cruz, N., de Amorim Silva, R., & Almeida Pimentel, B. (2025). Forecasting Student Enrollments in Brazilian Schools for Equitable and Efficient Education Resource Allocation. Conference on Digital Government Research, 26. https://doi.org/10.59490/dgo.2025.939

Section

Research papers