101010.pl is one of the many independent Mastodon servers you can use to participate in the fediverse.
101010.pl is the oldest Polish Mastodon server. We support posts of up to 2,048 characters.

#tokenwars

Kathy Reid<p>Opinion of the day: </p><p>The reason OpenAI wants a browser, or a social network, IMHO, is so they can have more training data - more tokens - for their models. </p><p>We have reached a point where we are in the Token Crisis - LLMs have been trained on all the publicly available data in the world, and it's costing OpenAI millions to license more data.</p><p>It's cheaper to have that data, those tokens, produced for free by people who interact on social media or who use a browser. Data is driving these decisions.</p><p><a href="https://aus.social/tags/TokenWars" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TokenWars</span></a></p>
Kathy Reid<p>ICYMI: I'll be talking at the Melbourne <a href="https://aus.social/tags/ML" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ML</span></a> and <a href="https://aus.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> Meetup in a couple weeks' time about the <a href="https://aus.social/tags/TokenWars" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TokenWars</span></a> - the conflict for data to train LLMs and the fight by IP rights holders to protect their data from scrapers. </p><p>Come learn about how <a href="https://aus.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLMs</span></a> are trained on huge volumes of tokens with transformers, why those tokens are becoming more economically valuable, and what you can do to protect your token treasure. </p><p>You'll never look at ChatGPT or data the same way again. </p><p>Huge thanks to <span class="h-card" translate="no"><a href="https://mastodon.social/@jonoxer" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>jonoxer</span></a></span> for the recommend, and to Lizzie Silver for the behind the scenes wrangling.</p><p><a href="https://www.meetup.com/machine-learning-ai-meetup/events/306548300" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">meetup.com/machine-learning-ai</span><span class="invisible">-meetup/events/306548300</span></a></p>
Kathy Reid<p>You might be familiar with what I'm terming the "Token Wars" - in which <a href="https://aus.social/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> and <a href="https://aus.social/tags/GenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenAI</span></a> companies seek to ingest text, image, audio and video content to create their <a href="https://aus.social/tags/ML" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ML</span></a> models. Tokens are the basic unit of data input into these models - meaning that <a href="https://aus.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> of web content is widespread. </p><p>In retaliation, many sites - such as Reddit, Inc. and Stack Overflow - are entering into content-sharing deals with companies like OpenAI, or making their sites subscription only. </p><p>Another solution that has emerged recently is content blocking based on user agent. In web programming, the client requesting a web page identifies itself - usually as a browser or a bot. </p><p>User agents can be blocked by a website's robots.txt file - but only if the user agent respects the robots.txt protocol. Many web scrapers do not. Taking this a step further, network providers like Cloudflare are now offering solutions which block known token scraper bots at a network level. </p><p>I've been playing with one of these solutions called <a href="https://aus.social/tags/DarkVisitors" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DarkVisitors</span></a> for a couple of weeks after learning about it on The Sizzle and was **amazed** at how much of the traffic to my websites was bots, crawlers and content scrapers. 
</p><p><a href="https://darkvisitors.com" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">darkvisitors.com</span><span class="invisible"></span></a></p><p>(No backhanders here, it's just a very insightful tool)</p><p><a href="https://aus.social/tags/TokenWars" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TokenWars</span></a> <a href="https://aus.social/tags/tokenization" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tokenization</span></a> <a href="https://aus.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://aus.social/tags/bots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>bots</span></a> <a href="https://aus.social/tags/scrapy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scrapy</span></a> <a href="https://aus.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a></p>
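The robots.txt mechanism described above can be sketched with Python's standard-library parser. This is a minimal illustration, not any site's actual policy: the `GPTBot` agent name is OpenAI's published crawler identifier, while the example.com URL is a placeholder. Note that, as the post says, this only works if the scraper chooses to honour the rules.

```python
# Sketch: blocking a known AI crawler by user agent via robots.txt,
# checked with Python's built-in robots.txt parser.
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: refuse GPTBot everywhere, allow everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The named scraper is disallowed; an ordinary browser agent is not.
print(parser.can_fetch("GPTBot", "https://example.com/post/1"))       # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/post/1"))  # True
```

Compliant crawlers run exactly this kind of check before fetching a page; non-compliant ones simply skip it, which is why network-level blocking (Cloudflare, Dark Visitors) exists as the next line of defence.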