Automatic Anonymization of Italian Legal Textual Documents using Deep Learning

Licari D; Romano MF; Comande' G
2022

Abstract

The dissemination of judicial decisions not only provides a valuable source of decision support for judges and legal practitioners but also strengthens public confidence in the judicial system. However, these documents raise privacy concerns because they contain personal and often sensitive data, such as information on health, finances, religious beliefs, and sexual orientation. In recent years, especially since the introduction of the GDPR, the international scientific community has paid considerable attention to privacy and automatic anonymization tools, but no such work has addressed the Italian legal context. In this paper, we present a first solution for the automatic anonymization of documents in the Italian National Jurisprudential Archive (Archivio Giurisprudenziale Nazionale) domain, based on pre-trained Transformer embeddings (Clark et al., 2020; Devlin et al., 2019) and spaCy's transition-based parsing for entity recognition (Honnibal and Montani, 2017). It achieves more than 94.7% recall (>99% for Person and ID entities) and supports several anonymization methods that can be applied to the text depending on the purpose of the anonymization.
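
To illustrate the last point of the abstract, the following is a minimal sketch of how recognized entity spans might be rewritten with different anonymization methods. It does not run a spaCy model: the spans are located by hand with a hypothetical `find_span` helper, the sample sentence is invented, and the two methods shown (`"label"` and `"mask"`) are assumptions chosen for illustration, not necessarily the methods the paper supports. The `PERSON` and `ID` labels loosely follow the entity types named in the abstract.

```python
from typing import List, Tuple

# Hypothetical span type: (start_char, end_char, label), similar to what
# an entity recognizer could emit for each detected entity.
Span = Tuple[int, int, str]


def find_span(text: str, surface: str, label: str) -> Span:
    """Stand-in for a real NER model: locate a known string in the text."""
    start = text.index(surface)
    return (start, start + len(surface), label)


def anonymize(text: str, spans: List[Span], method: str = "label") -> str:
    """Replace every entity span according to the chosen method."""
    out = text
    # Work from the last span to the first so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        if method == "label":
            replacement = f"[{label}]"          # keep the entity type only
        elif method == "mask":
            replacement = "*" * (end - start)   # keep the length only
        else:
            raise ValueError(f"unknown method: {method}")
        out = out[:start] + replacement + out[end:]
    return out


# Invented example sentence with a person name and a tax-code-like ID.
text = "Mario Rossi, C.F. RSSMRA80A01H501U, ricorre in appello."
spans = [
    find_span(text, "Mario Rossi", "PERSON"),
    find_span(text, "RSSMRA80A01H501U", "ID"),
]

labeled = anonymize(text, spans, method="label")
masked = anonymize(text, spans, method="mask")
print(labeled)
print(masked)
```

Replacing with a type label keeps the text readable for downstream analysis, while masking preserves only the length of the original string; which trade-off is appropriate depends on the purpose of the anonymization.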

Use this identifier to cite or link to this document: http://hdl.handle.net/11382/548773