AI Scraping Ruins Everything, Reddit Now Has To Block Internet Archive Indexing



This Is Why We Can’t Have Nice Things

Reddit has been quite successful at fending off the hordes of data harvesters AI companies use to raid the intellectual property of anyone who dares have a presence on the internet. Unfortunately, that cannot be said of every organization on the web, as you can’t nicely ask an AI scraper to leave your data alone. If it can see the data it will grab it, since these bots have no concept of private versus public data, as has been demonstrated over and over again. It seems that the Internet Archive, specifically the Wayback Machine, is one set of data that is being wantonly raided by AI scrapers.

This has led to Reddit blocking the Internet Archive from archiving its threads, as it has noticed those threads being used by LLMs after they were harvested from the Wayback Machine. Reddit has prevented its users’ posts from being harvested directly, but now they are being grabbed from the Internet Archive instead, and until that organization can prevent this you won’t find Reddit posts on the Wayback Machine.

This isn’t the only beef Reddit has with the Internet Archive’s processes. It also doesn’t appreciate the fact that posts deleted from Reddit aren’t removed from the Wayback Machine. That will have to be addressed as well before you see Reddit content on the Wayback Machine again.


