101010.pl is one of the many independent Mastodon servers you can use to participate in the fediverse.
101010.pl is the oldest Polish Mastodon server. We support posts of up to 2,048 characters.

#tokenwars

Kathy Reid<p>Opinion of the day: </p><p>The reason OpenAI wants a browser, or a social network, IMHO, is so they can have more training data - more tokens - for their models. </p><p>We have reached a point where we are in the Token Crisis - LLMs have been trained on all the publicly available data in the world, and it's costing OpenAI millions to license more data.</p><p>It's cheaper to have that data, those tokens, produced for free by people who interact on social media or who use a browser. Data is driving these decisions.</p><p><a href="https://aus.social/tags/TokenWars" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TokenWars</span></a></p>
Kathy Reid<p>ICYMI: I'll be talking at the Melbourne <a href="https://aus.social/tags/ML" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ML</span></a> and <a href="https://aus.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> Meetup in a couple weeks' time about the <a href="https://aus.social/tags/TokenWars" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TokenWars</span></a> - the conflict for data to train LLMs and the fight by IP rights holders to protect their data from scrapers. </p><p>Come learn about how <a href="https://aus.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLMs</span></a> are trained on huge volumes of tokens with transformers, why those tokens are becoming more economically valuable, and what you can do to protect your token treasure. </p><p>You'll never look at ChatGPT or data the same way again. </p><p>Huge thanks to <span class="h-card" translate="no"><a href="https://mastodon.social/@jonoxer" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>jonoxer</span></a></span> for the recommend, and to Lizzie Silver for the behind the scenes wrangling.</p><p><a href="https://www.meetup.com/machine-learning-ai-meetup/events/306548300" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">meetup.com/machine-learning-ai</span><span class="invisible">-meetup/events/306548300</span></a></p>
Kathy Reid<p>You might be familiar with what I'm terming the "Token Wars" - in which <a href="https://aus.social/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> and <a href="https://aus.social/tags/GenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenAI</span></a> companies seek to ingest text, image, audio and video content to create their <a href="https://aus.social/tags/ML" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ML</span></a> models. Tokens are the basic unit of data input into these models - meaning that <a href="https://aus.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> of web content is widespread. </p><p>In retaliation, many sites - such as Reddit, Inc. and Stack Overflow - are entering into content-sharing deals with companies like OpenAI, or making their sites subscription only. </p><p>Another solution that has emerged recently is content blocking based on user agent. In web programming, the client requesting a web page identifies itself - usually as a browser or a bot. </p><p>User agents can be blocked by a website's robots.txt file - but only if the user agent respects the robots.txt protocol. Many web scrapers do not. Taking this a step further, network providers like Cloudflare are now offering solutions which block known token scraper bots at a network level. </p><p>I've been playing with one of these solutions called <a href="https://aus.social/tags/DarkVisitors" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DarkVisitors</span></a> for a couple of weeks after learning about it on The Sizzle and was **amazed** at how much of the traffic to my websites was bots, crawlers and content scrapers. 
</p><p><a href="https://darkvisitors.com" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">darkvisitors.com</span><span class="invisible"></span></a></p><p>(No backhanders here, it's just a very insightful tool)</p><p><a href="https://aus.social/tags/TokenWars" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TokenWars</span></a> <a href="https://aus.social/tags/tokenization" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tokenization</span></a> <a href="https://aus.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://aus.social/tags/bots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>bots</span></a> <a href="https://aus.social/tags/scrapy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scrapy</span></a> <a href="https://aus.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a></p>
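The robots.txt mechanism described above can be sketched with Python's standard-library parser. This is a minimal illustration, not any site's actual policy: the `GPTBot` agent name is OpenAI's published crawler identifier, while the example.com URL is a placeholder. Note that, as the post says, this only works if the scraper chooses to honour the rules.

```python
# Sketch: blocking a known AI crawler by user agent via robots.txt,
# checked with Python's built-in robots.txt parser.
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: refuse GPTBot everywhere, allow everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The named scraper is disallowed; an ordinary browser agent is not.
print(parser.can_fetch("GPTBot", "https://example.com/post/1"))       # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/post/1"))  # True
```

Compliant crawlers run exactly this kind of check before fetching a page; non-compliant ones simply skip it, which is why network-level blocking (Cloudflare, Dark Visitors) exists as the next line of defence.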