Copyright lawsuits against Meta and OpenAI mention shadow libraries, including Library Genesis, as sources of training data

ancuuiqter@lemmy.world · 3 years ago

Mentioning this since the project Anna’s Archive compiles several datasets and their corresponding torrents.

Anna’s Archive, whose aim is to “archive all the books in the world, and make them widely accessible,” pulls from a number of shadow library sources. The project provides its own torrent links (via Tor) for Library Genesis, Z-lib, and the Internet Archive, among others, and also links to Library Genesis’s own torrents. In the datasets linked below, you can click on a given source to find its onion site or the torrents provided by the shadow library itself (as with Library Genesis, for example).

Anna’s Archive datasets

…almost all files shown on Anna’s Archive are available through torrents. Below is a list of the different data sources that we use, with links to their torrents. Our own torrents are available on Tor.

Sources include:

Internet Archive Digital Lending Library
Libgen.li comics
Z-Library scrape
ISBNdb scrape
Libgen auxiliary data
Libgen.rs
Libgen.li (includes Sci-Hub)

crunchpaste@lemmy.dbzer0.com · 3 years ago

Thanks a lot.

archive.md