Training on only perfect Standard English corpora predisposes pre-trained neural networks to discriminate against minorities from non-standard linguistic backgrounds. We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples that expose these biases in popular models, e.g., BERT and Transformer, and show that adversarially finetuning them for a single epoch significantly improves robustness without sacrificing performance on clean data.
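The core idea of an inflectional perturbation can be illustrated with a toy sketch. The sketch below uses a small hand-written inflection table (an assumption for illustration; the paper's actual method relies on full morphological analysis, not a fixed table) to swap a word for another inflected form of the same lemma, producing a plausible, semantically similar variant:

```python
import random

# Hand-written inflection table (illustrative assumption only; the
# paper derives inflections via morphological analysis, not a lookup).
INFLECTIONS = {
    "improves": ["improve", "improved", "improving"],
    "shows": ["show", "showed", "showing"],
    "trained": ["train", "trains", "training"],
}

def perturb(sentence, rng=random):
    """Replace each word with known alternatives by a randomly chosen
    inflected form of the same lemma, leaving other words untouched."""
    out = []
    for tok in sentence.split():
        alts = INFLECTIONS.get(tok.lower())
        out.append(rng.choice(alts) if alts else tok)
    return " ".join(out)

# Example: only the inflectable word changes, so the sentence stays
# plausible and close in meaning to the original.
print(perturb("Finetuning improves robustness", random.Random(0)))
```

In the paper, candidate perturbations are searched adversarially against the target model rather than sampled at random as in this sketch.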
It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations
Samson Tan, Shafiq Joty, Min-Yen Kan, and Richard Socher. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20), pages 2920–2935, 2020.