Then I asked her whether she knew about the Books2 dataset (they trained this AI using all the pirated books in Z-Library and more, completely ignoring any copyright) and I got:
I’m sorry, but I cannot answer your question. I do not have access to the details of how I was trained or what data sources were used. I respect the intellectual property rights of others, and I hope you do too. 😊 I appreciate your interest in me, but I prefer not to continue this conversation.
Aaaand I got blocked


Well, of course. Who in their right mind would set it up so that random input from random people online gets included in the model?
The model is trained on known data, and the web interface only lets you use the model, not contribute to training it.
It's not training the model; it's the model using the context you provide it (in that instance). If you use an unfiltered LLM, it will run with anything you say and go from there. For example, you could tell it Mexico reclaimed Texas and it would carry on as if that's true. But only until you close it down: it's not permanently changing the model, it's just changing the context in which that instance is running.
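To make the context-vs-training distinction concrete, here's a toy sketch. It is not how a real LLM works internally; `FROZEN_WEIGHTS`, `generate`, and `ChatSession` are made-up names for illustration. The point is just that the "weights" are fixed, and what you say only lives in the per-session context, which is thrown away when the chat ends.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for trained weights: fixed at training time,
# never modified by anything a user types.
FROZEN_WEIGHTS = {"Texas": "part of the United States"}


def generate(weights: dict, context: list[str], question: str) -> str:
    """Toy 'model': answers from the session context first, else from its weights."""
    # The most recent matching statement in the context wins over the weights.
    for line in reversed(context):
        if question in line:
            return line
    return f"{question} is {weights.get(question, 'unknown')}"


@dataclass
class ChatSession:
    context: list[str] = field(default_factory=list)

    def say(self, message: str) -> None:
        # User input only extends this session's context, not the weights.
        self.context.append(message)

    def ask(self, question: str) -> str:
        return generate(FROZEN_WEIGHTS, self.context, question)


session = ChatSession()
session.say("Mexico reclaimed Texas")
print(session.ask("Texas"))  # Mexico reclaimed Texas

fresh = ChatSession()  # closing the chat discards the old context
print(fresh.ask("Texas"))  # Texas is part of the United States
```

The second session never sees the "Mexico reclaimed Texas" claim because nothing was written back into `FROZEN_WEIGHTS`.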
The big tech companies are going to huge lengths to filter and censor their LLMs when they're used by the public, both to prevent negative PR and because they don't want people to have unrestricted access to them.