cross-posted from: https://lemmy.intai.tech/post/43759
cross-posted from: https://lemmy.world/post/949452
OpenAI’s ChatGPT and Sam Altman are in massive trouble. OpenAI is getting sued in the US for illegally using content from the internet to train its large language models (LLMs).


IANAL, but I think it’s going to come down to the terms of service of the sites the data was scraped from. If the terms say the stuff you post can be shared with third parties, then the plaintiffs might not have a leg to stand on. Where it gets sketchy is if someone posted someone else’s work: the original author had no say in it being shared with a third party. BUT is that the fault of the third party or of the service provider that shared it?
Also, if I were exposed to copyrighted material through some unauthorised person distributing it, can I not summarize the information? I guess I don’t know enough about fair use to answer that.
The wording in the article says they are being sued for stealing data. That seems like a stretch, but I guess I’ll wait for more details of the case.
I agree with the terms of service bit, but the hard part is going through the ToS for so many different sites. Sort of like how some open source projects can’t re-license their code base because it is impossible to get in contact with all the people who have contributed over the years. Online platforms already have certain protections against liability for illegal content their users post. We will have to see if that is extended to these large language models. When it comes to fair use, there is no automatic right to it. Fair use is a defense that has to be argued in court, case by case. The law lists factors for courts to weigh but gives no bright-line rules on what is and isn’t fair use, so that can swing either way. Just my two cents on the matter. Also, IANAL.
The thing is that the images are used to train a set of weights and biases; the training data isn’t distributed as part of the AI or as part of the software used to generate images.
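To make that concrete, here is a toy sketch in Python. It is nowhere near a real image model or LLM: the data is four made-up number pairs and the "model" is a single weight and a single bias, all of which are my own illustrative assumptions. It only shows the mechanism the comment describes, where training reads the data but the file you would ship holds just the learned parameters. It says nothing about how much a very large model can memorize.

```python
import json

def train(samples, lr=0.05, epochs=2000):
    """Fit y = w*x + b by plain gradient descent; return only the parameters."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}

# Hypothetical training data, generated from y = 2x + 1.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

model = train(training_data)

# What would be distributed: the learned numbers, not the samples.
saved = json.dumps(model)
print(saved)
```

The saved string contains two numbers (w close to 2, b close to 1) and none of the training pairs.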