Interpretability and Analysis of Models for NLP
An in-depth look at interpretability and analysis of models for NLP (ACL 2020).

Evaluating faithful interpretability

Guidelines to keep in mind when working on faithful interpretations, from Jacovi & Goldberg (2020) (talk slides 16–24):

  1. Faithfulness is not plausibility. A plausible but unfaithful interpretation is akin to lying and can be dangerous.
  2. A model's decision process is not a human decision process. Humans cannot judge whether an interpretation is faithful; evaluating interpretations with human input evaluates plausibility, not faithfulness.
  3. Claims are just claims until tested. A model believed to be "inherently interpretable" should be tested just as rigorously as post-hoc methods.

Interpretability via different methods

  1. Attention as interpretation
  2. Probing
  3. Rationales for Faithful Interpretations
  4. Explanation via Training Examples

and many other techniques for interpretability in NLP!
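To make the probing idea above concrete, here is a minimal sketch: a simple linear classifier ("probe") is trained on frozen representations to test whether a linguistic property is decodable from them. The embeddings and labels below are synthetic stand-ins (random vectors instead of real contextual embeddings from a model like BERT), purely for illustration.

```python
# Sketch of probing: fit a linear probe on frozen representations and
# check held-out accuracy. Random vectors stand in for real contextual
# embeddings; the binary "linguistic property" is synthetic, constructed
# to be linearly decodable so the probe has signal to find.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend embeddings: 200 tokens, 64 dimensions each.
X = rng.normal(size=(200, 64))

# Synthetic property (e.g. "is this token a noun?"), made linearly
# related to the embeddings for the sake of the example.
w = rng.normal(size=64)
y = (X @ w > 0).astype(int)

# The probe itself: a linear classifier on the frozen vectors.
# High held-out accuracy suggests the property is linearly decodable;
# it does not by itself show the model *uses* that information.
probe = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
accuracy = probe.score(X[150:], y[150:])
print(f"probe accuracy: {accuracy:.2f}")
```

Note the hedge in the last comment: following the faithfulness guidelines above, a successful probe shows a property is encoded, not that it is causally used by the model.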


Author: Carolin Lawrence, postdoctoral researcher at NEC Labs Europe.
Similar projects
A Survey of the State of Explainable AI for NLP
Overview of the operations and explainability techniques currently available for generating explanations for NLP model predictions.
AllenNLP Interpret
A Framework for Explaining Predictions of NLP Models
Visualizing Memorization in RNNs
Inspecting gradient magnitudes in context can be a powerful tool to see when recurrent units use short-term or long-term contextual understanding.
BertViz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)