Causality and Explanation in ML: a Lead from the GDPR and an Application to Personal Injury Damages

Giovanni Comandé; Denise Amram
2019-01-01

Abstract

Although the distinction between prediction and explanation is well established in the philosophy of science, statistical modeling techniques too often overlook the practical implications of this theoretical divergence. Can predictive and explanatory models be recognized as complements rather than substitutes? We argue that predictive and explanatory modeling need not be seen as in conflict: these two approaches, which have so far run in parallel, would benefit greatly from each other, and their cross-fertilization might be one of the central topics in statistical modeling in the years to come. Most importantly, we show that the need for this convergence is made apparent by the requirements imposed by the EU General Data Protection Regulation (GDPR), and that it is of paramount importance when dealing with legal data. We also show how the demand to meaningfully clarify the logic behind solely automated decision-making processes creates a unique incentive to reconcile two seemingly contradictory scientific paradigms. In addition, by looking at 2585 Italian cases related to personal injury compensation, we develop a simple application to map the space of judges’ decisions and, using state-of-the-art multi-label algorithms, we classify such decisions according to the relevant heads of damages. In fact, drawing causal evidence from this analysis might be dangerous: if we want machines to improve human decisions, we need more robust, generalized, and explainable models.
Type: Post-print/Accepted manuscript
License: Public, with copyright
Use this identifier to cite or link to this document: https://hdl.handle.net/11382/533342