Xu A, Pathak E, Wallace E, Gururangan S, et al. Detoxifying language models risks marginalizing minority voices. arXiv preprint, 2021.
Wang B, Ping W, Xiao C, Xu P, Patwary M, et al. Exploring the limits of domain-adaptive training for detoxifying large-scale language models. arXiv preprint, 2022.
Welbl J, Glaese A, Uesato J, Dathathri S, et al. Challenges in detoxifying language models. arXiv preprint, 2021.
Hoffmann J, Borgeaud S, Mensch A, et al. Training compute-optimal large language models. arXiv preprint, 2022.
Bommasani R, Hudson DA, Adeli E, Altman R, et al. On the opportunities and risks of foundation models. arXiv preprint, 2021.
Ouyang L, Wu J, Jiang X, Almeida D, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 2022.