Pirate Libraries

Pirate libraries, also known as shadow libraries, are potentially the best thing humans have ever created, and they are illegal in most places. They are archives of human writing, some catering to particular niches and some attempting to capture everything. Their aim is to distribute all of human knowledge and writing for free.

They have no regard for copyright law, generally viewing it as part of a broken system or as a distraction from (what the pirate librarians might call) their noble goals.

Library Genesis (libgen) and Sci-Hub (a superset of libgen) focus primarily on academic materials: textbooks and journal publications. The projects were born from the goal of providing access to documents that would normally only be accessible through expensive journals.

Books3 is another pirate library, containing ≈196000 books sourced from Bibliotik. It is named in the spirit of OpenAI's dubiously sourced collections of books, Books1 and Books2. Books3 has become well known for its use in training large language models like LLaMA.

The Internet Archive is pirate-library adjacent. It contains and redistributes (without permission) countless copyrighted works, but it will respond to takedown requests.

During my time at university, use of Library Genesis and Sci-Hub was rife. While not everyone knew of these sites directly, everyone knew someone who knew someone who did, and textbook PDFs propagated on the sneakernet.

Pirate libraries:

An adjacent concept is the preservation of media, where pirated copies are seen as a defence against permanent loss. This is the thinking behind the Internet Archive and the sites it has inspired. It is also prevalent in the world of video games, a medium where many works are rendered unavailable by missing online services, a changing hardware landscape or lost source code. There are many cases where old games have been recovered or restored through pirated versions.

Web archives:


Posted: 2023-11-20 Last update: 2024-04-26 11:11:48 +1000