Performance Analysis of LLMs for Abstractive Summarization of Brazilian Legislative Documents

Authors

  • Danilo C.G. de Lucena (Centro de Informática, Federal University of Pernambuco, Brazil)
  • Ellen Souza (MiningBR Research Group, Federal Rural University of Pernambuco, Brazil)
  • Hidelberg O. Albuquerque (Centro de Informática, Federal University of Pernambuco, Brazil)
  • Nádia Félix (Institute of Informatics, Federal University of Goiás, Brazil)
  • Adriano L.I. Oliveira (Centro de Informática, Federal University of Pernambuco, Brazil)
  • André C.P.L.F. de Carvalho (Institute of Mathematics and Computer Sciences, University of São Paulo, Brazil)

DOI:

https://doi.org/10.59490/dgo.2025.969

Keywords:

large language models, summarization, legislative proposals

Abstract

Legislative documents are difficult to summarize because of their complex argument structures and specialized terminology. This research investigates the use of Large Language Models (LLMs) to summarize Brazilian legislative proposals from the Chamber of Deputies, using a dataset of more than 56,000 texts from 2013 to 2023. The paper covers three main summarization approaches (extractive, abstractive, and hybrid), with an emphasis on abstractive summarization using LLMs. The performance of the LLM LLAMA2-13b is assessed by comparing its generated summaries against reference summaries using ROUGE, BLEU, METEOR, BERTScore, and BERTopic. The results show that LLMs can generate coherent and informative summaries, with positive results on the evaluation metrics. Notably, the study indicates that traditional summary evaluation metrics may not be adequate for evaluating LLMs in summarization tasks, whereas metrics based on pre-trained models such as BERT provide a more effective evaluation of this automatic summarization approach.
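
The concern about overlap-based metrics can be illustrated with a minimal sketch of a ROUGE-1-style unigram score. This is a simplification (lowercasing and whitespace tokenization, no stemming), not the official ROUGE package and not the authors' evaluation code; the function name `rouge1_scores` and the example sentences are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def rouge1_scores(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (ROUGE-1 style sketch)."""
    # Assumption: simple lowercased whitespace tokenization, no stemming.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference summary and an abstractive paraphrase of it.
reference = "the bill creates a national program for school meals"
candidate = "the proposal establishes a nationwide school lunch program"

scores = rouge1_scores(candidate, reference)
print(scores)
```

In this example the candidate conveys roughly the same meaning as the reference, yet only four unigrams match ("the", "a", "school", "program"), so precision is 0.5 and recall is 4/9. Embedding-based metrics such as BERTScore are designed to give credit for this kind of paraphrase.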

Published

2025-05-20

How to Cite

de Lucena, D. C., Souza, E., Albuquerque, H. O., Félix, N., Oliveira, A. L., & de Carvalho, A. C. (2025). Performance Analysis of LLMs for Abstractive Summarization of Brazilian Legislative Documents. Conference on Digital Government Research, 26. https://doi.org/10.59490/dgo.2025.969

Section

Research papers