Detoxifying language models risks marginalizing minority voices

A Xu, E Pathak, E Wallace, S Gururangan… - arXiv preprint arXiv …, 2021 - arxiv.org

Minority STEM

Gururangan, Pathak, Wallace, Xu

GSID: dk1Do6tXnp8J

Excerpt

… We identify that these failures stem from detoxification methods exploiting spurious correlations in toxicity datasets. Overall, our results highlight the tension between the controllability …

Cited by

Training compute-optimal large language models.

Borgeaud S, Hoffmann J, Mensch A.
J Hoffmann, S Borgeaud, A Mensch… - arXiv preprint arXiv …, 2022 - arxiv.org
GSID: wEmD6BMp1T4J

On the opportunities and risks of foundation models.

Adeli E, Altman R, Bommasani R, Hudson DA.
R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
GSID: XHXSIAGuKIUJ

Training language models to follow instructions with human feedback.

Almeida D, Ouyang L, Wu J.
L Ouyang, J Wu, X Jiang, D Almeida… - Advances in …, 2022 - proceedings.neurips.cc
GSID: -un9o64jIrQJ
