Machine translators have made it easier than ever to create error-plagued Wikipedia articles in obscure languages. What happens when AI models get trained on junk pages?
Yes! I mean, blame those who post AI-generated translations as if they were their own, or blame the AI scrapers that use those poorly generated pages for training, but it makes no sense to blame Wikipedia when all it has done is exist and offer a platform for knowledge sharing.
In fact, this problem is hardly exclusive to Wikipedia. Every platform with crowdsourced content is to some degree susceptible to AI poisoning that ultimately feeds back into other AIs; the loop exists everywhere. That said, I understand wanting to highlight that endangered languages are especially vulnerable: with less content available, the models train on smaller datasets, which makes them worse to begin with and more sensitive to bad data.