Kathy Reid<p>You might be familiar with what I'm terming the "Token Wars" - in which <a href="https://aus.social/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> and <a href="https://aus.social/tags/GenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenAI</span></a> companies seek to ingest text, image, audio and video content to create their <a href="https://aus.social/tags/ML" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ML</span></a> models. Tokens are the basic unit of data input into these models - meaning that <a href="https://aus.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> of web content is widespread. </p><p>In retaliation, many sites - such as Reddit, Inc. and Stack Overflow - are entering into content sharing deals with companies like OpenAI, or making their sites subscription only. </p><p>Another solution that has emerged recently is content blocking based on user agent. In web programming, the client requesting a web page identifies itself - usually as a browser or a bot. </p><p>User agents can be blocked by a website's robots.txt file - but only if the user agent respects the robots.txt protocol. Many web scrapers do not. Taking this a step further, network providers like Cloudflare are now offering solutions which block known token scraper bots at a network level. </p><p>I've been playing with one of these solutions called <a href="https://aus.social/tags/DarkVisitors" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DarkVisitors</span></a> for a couple of weeks after learning about it on The Sizzle and was **amazed** at how much of the traffic to my websites was bots, crawlers and content scrapers. 
</p><p><a href="https://darkvisitors.com" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">darkvisitors.com</span><span class="invisible"></span></a></p><p>(No backhanders here, it's just a very insightful tool)</p><p><a href="https://aus.social/tags/TokenWars" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TokenWars</span></a> <a href="https://aus.social/tags/tokenization" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tokenization</span></a> <a href="https://aus.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://aus.social/tags/bots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>bots</span></a> <a href="https://aus.social/tags/scrapy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scrapy</span></a> <a href="https://aus.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a></p>
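<p>To make the difference between the two kinds of user-agent blocking concrete, here is a rough Python sketch. It assumes a made-up scraper called ExampleScraperBot and a made-up robots.txt; neither comes from DarkVisitors or Cloudflare, and this is not how either product is implemented. The first function shows the check a well-behaved bot would run against robots.txt, and the second shows a server-side refusal that works whether or not the bot chooses to cooperate.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: it asks one made-up bot
# to stay away and lets every other client in.
ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical server-side blocklist, matched case-insensitively as substrings.
BLOCKED_AGENT_SUBSTRINGS = ("examplescraperbot",)


def robots_allows(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """What a *polite* client checks before fetching a page.

    Nothing forces a scraper to call this; robots.txt is advisory.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def status_for_request(user_agent_header: str) -> int:
    """What a server or network edge can enforce regardless of robots.txt.

    Returns an HTTP status code: 403 for a blocked user agent, else 200.
    """
    ua = user_agent_header.lower()
    if any(blocked in ua for blocked in BLOCKED_AGENT_SUBSTRINGS):
        return 403
    return 200


if __name__ == "__main__":
    page = "https://example.com/blog/post"
    for ua in ("ExampleScraperBot/1.0", "Mozilla/5.0 (X11; Linux x86_64)"):
        print(ua, robots_allows(ua, page), status_for_request(ua))
```

<p>Both checks only look at the User-Agent string the client chooses to send, so a scraper that lies about its identity passes either one. Presumably that is part of why the network-level products rely on maintained lists of known bots rather than a single hard-coded name.</p>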