Screen Scraping

July 3, 2023

Gizmodo has a piece on “Google Says It’ll Scrape Everything You Post Online for AI”:

One of the less obvious complications of the post ChatGPT world is the question of where data-hungry chatbots sourced their information. Companies including Google and OpenAI scraped vast portions of the internet to fuel their robot habits. It’s not at all clear that this is legal, and the next few years will see the courts wrestle with copyright questions that would have seemed like science fiction a few years ago. In the meantime, the phenomenon already affects consumers in some unexpected ways.

Twitter’s crazy rate-limiting meltdown and Reddit’s push to charge for API access are about one thing: AI data models. These systems are hungry for data, and access to that data will be vital to building the best AI models. Unsurprisingly, Google is making it known that, as it ranks and offers prime search engine placement, all that delicious data is fair game to them. When APIs become closed, people resort to screen scraping, and screen scraping ends with paywalls and Twitter-style rate-limiting… I wonder how this all plays out.
