Comparing Machine Learning and an Expert System for Legal Document Classification

José Jorge de Queiroz Santos Filho; Filipe Araújo Dantas; Melquezedeque da Silva Lima; Shirley Barbosa dos Santos; Galileu Genesis; Maria Gabriely Lima da Salva; Álvaro Farias Pinheiro; Eraylson Galdino da Silva

doi:10.59490/dgo.2025.947

Authors

José Jorge de Queiroz Santos Filho, University of Pernambuco, Brazil, https://orcid.org/0009-0004-6613-7752
Filipe Araújo Dantas, University of Pernambuco, Brazil, https://orcid.org/0009-0007-0306-763X
Melquezedeque da Silva Lima, Federal University of Pernambuco, Brazil, https://orcid.org/0009-0005-6308-8681
Shirley Barbosa dos Santos, The Office of the State Attorney General of Pernambuco, Brazil, https://orcid.org/0009-0006-0286-3020
Galileu Genesis, The Office of the State Attorney General of Pernambuco, Brazil, https://orcid.org/0000-0003-2452-2076
Maria Gabriely Lima da Salva, University of Pernambuco, Brazil, https://orcid.org/0000-0002-3056-3985
Álvaro Farias Pinheiro, The Office of the State Attorney General of Pernambuco, Brazil, https://orcid.org/0000-0002-6254-7293
Eraylson Galdino da Silva, University of Pernambuco, Brazil, https://orcid.org/0000-0003-4287-9749

Keywords:

machine learning, legal document classification, expert systems, overfitting, natural language processing

Abstract

This study compares machine learning models with a rule-based expert system for classifying legal documents, specifically for distinguishing relevant from irrelevant cases. The evaluated models are Random Forest, Naive Bayes, XGBoost, SVM, and Decision Tree; the expert system was developed by a State Attorney from PGE-PE (the Office of the State Attorney General of Pernambuco). The datasets cover three types of legal process (Alvará, Arrolamento, and Inventário) and contain labeled cases. All approaches were evaluated on accuracy, precision, recall, and F1-score. The results suggest that the machine learning models, particularly Random Forest, achieve higher accuracy and precision, whereas the expert system achieves higher recall and F1-score, ensuring that no relevant cases are overlooked. The choice between the two approaches therefore depends on the legal context and requires balancing efficiency (reducing false positives) against reliability (capturing all relevant cases).
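
To make the trade-off between the four metrics concrete, the following minimal Python sketch computes them from confusion-matrix counts, treating "relevant" as the positive class. The names `Confusion` and `scores` and all counts are hypothetical and chosen only for illustration; they are not results or code from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary task (relevant = positive)."""
    tp: int  # relevant cases flagged as relevant
    fp: int  # irrelevant cases flagged as relevant
    fn: int  # relevant cases that were missed
    tn: int  # irrelevant cases correctly discarded


def scores(c: Confusion) -> dict:
    """Return accuracy, precision, recall, and F1 for the given counts."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts (50 relevant, 150 irrelevant cases), NOT study data.
# A cautious classifier: no false positives, but it misses relevant cases.
cautious = Confusion(tp=30, fp=0, fn=20, tn=150)
# An inclusive classifier: misses nothing, but raises more false alarms.
inclusive = Confusion(tp=50, fp=30, fn=0, tn=120)

for name, counts in [("cautious", cautious), ("inclusive", inclusive)]:
    print(name, {k: round(v, 3) for k, v in scores(counts).items()})
```

With these illustrative counts, the cautious classifier scores higher on accuracy and precision while the inclusive one scores higher on recall and F1, which is the same kind of pattern the abstract describes between the machine learning models and the expert system.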
