I suspect that if employees at Meta who are tasked with hoovering up training data from everywhere they can find are just watching porn, it probably won’t go over well on their annual reviews.
I would give good odds that the human at Meta most closely responsible for the BitTorrent download at issue has never even seen this particular torrent by name or URL. The scope of data involved in training is too large for direct human involvement. They probably did something along the lines of writing a bot in Python or similar to spider websites and feed every torrent it could find into a torrent downloader. That downloader's output then gets dumped into some massive internal collection of data that another team uses as part of the training process. The humans just create tools, set them in motion, and never actually see the overwhelming majority of the data they're processing.
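To make the "bot that feeds every torrent it finds into a downloader" idea concrete, here is a minimal, purely hypothetical sketch of the first step: parsing one already-fetched HTML page, collecting its `magnet:` links, and reducing them to unique BitTorrent info-hashes that could be queued for a downloader. This is not Meta's pipeline; the names `MagnetLinkCollector`, `info_hash`, and `queue_torrents` are made up for illustration, and fetching pages and the downloader itself are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class MagnetLinkCollector(HTMLParser):
    """Hypothetical helper: collects magnet: URIs from <a href="..."> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.magnets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and unescapes attribute values (&amp; -> &).
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith("magnet:"):
                self.magnets.append(value)


def info_hash(magnet_uri):
    """Return the info-hash from a magnet URI's xt=urn:btih:... field, or None if absent."""
    params = parse_qs(urlparse(magnet_uri).query)
    for xt in params.get("xt", []):
        if xt.lower().startswith("urn:btih:"):
            # Lowercase so the same torrent written in different cases dedupes.
            return xt[len("urn:btih:"):].lower()
    return None


def queue_torrents(html):
    """Return unique info-hashes from a page, in first-seen order, ready to hand to a downloader."""
    collector = MagnetLinkCollector()
    collector.feed(html)
    queued = []
    for uri in collector.magnets:
        h = info_hash(uri)
        if h and h not in queued:
            queued.append(h)
    return queued
```

A real crawler would loop this over many fetched pages and pass each hash to a torrent client; the point is only that nobody has to look at any individual torrent for it to end up in the pile.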
I think you might be partially wrong. For training to work, you need to feed tags and descriptions along with each piece of data, so the model can associate that input with something when it's asked to generate something. Data annotation and preparation is a big part of training, probably the most important one, and it's done manually by humans, probably in third world countries like mine. It's usually the big untold part of AI, the big human data-entry part of it. I'm not entirely sure how it works for videos, but probably a lot of people were paid to watch porn and annotate all the videos by timestamp, to feed along with the video for the training to happen. AI is The Turk all the way down.
It won't get everything, but they don't even have to write a program, because of the DHT.
Fair enough. I will point out that for the context of my comment, this is probably functionally equivalent — that is, if one has a piece of software to walk the DHT and build a list of torrents on it, it’s probably still going to be done in a fully-automated fashion.
I use bitmagnet, but it only scrapes the metadata.
Yes thank you, it was a joke. Obviously I am aware nobody is being paid to watch porn.
I was once let go from a job in favor of someone who was later found to be spending most of their shifts watching porn. I don't know if they suffered any consequences for that; they were offshore and had no on-site supervision.
To be fair, that person was more qualified than I was at the time, but I didn’t watch porn on the job, so …