AI Scraping Ruins Everything, Reddit Now Has To Block Internet Archive Indexing
This Is Why We Can’t Have Nice Things
Reddit has been quite successful at keeping out the hordes of data harvesters that AI companies use to raid the intellectual property of anyone who dares have a presence on the internet. Unfortunately, that cannot be said of every organization on the web, as you can’t nicely ask an AI scraper to leave your data alone. If a bot can see the data it will grab it, since these bots have no concept of private versus public data, as has been demonstrated over and over again. It seems that the Internet Archive’s data, specifically the Wayback Machine, is one collection that is being wantonly raided by AI scrapers.
This has led Reddit to block the Internet Archive from archiving its threads, having noticed those threads being used by LLMs after they were harvested from the Wayback Machine. Reddit has prevented its users’ posts from being harvested directly, but now they are being grabbed from the Internet Archive, and until that organization can prevent this you won’t find Reddit posts on the Wayback Machine.
This isn’t the only beef Reddit has with the Internet Archive’s processes; it also doesn’t appreciate the fact that posts deleted from Reddit aren’t removed from the Wayback Machine. This will also have to be addressed before you will see Reddit content on the Wayback Machine.
Reddit is now blocking the Internet Archive (IA) from indexing popular Reddit threads after allegedly catching sneaky AI firms—restricted from scraping Reddit—instead simply scraping data from IA's archived content.
More Tech News From Around The Web
- Over 3,000 NetScaler devices left unpatched against CitrixBleed 2 bug @ Bleeping Computer
- Russia’s RomCom among those exploiting a WinRAR 0-day in highly-targeted attacks @ The Register
- Nvidia gives its tiniest workstation GPUs a Blackwell boost @ The Register
- NVIDIA RTX Pro 4000 SFF Blackwell Edition and RTX Pro 2000 Blackwell Announced @ ServeTheHome
- GitHub will be folded into Microsoft proper as CEO steps down @ Ars Technica
- Mozilla Under Fire For Firefox AI ‘Bloat’ That Blows Up CPU and Drains Battery @ Slashdot
- Perplexity Makes Longshot $34.5 Billion Offer for Chrome @ Slashdot
- Torvalds blasts tardy kernel dev: Your ‘garbage’ RISC-V patches are ‘making the world worse’ @ The Register


