#benchmark


This is an interesting benchmark.

o1, the super-duper model, completes only 84.2% of the tasks in the test, with 99.2% in the correct format. Qwen2.5-Coder-32B, a relatively small model that can run locally, scores 72.9% and 100.0%.
Yet more proof that LLMs/transformers are not that great: if an LLM cannot format the code correctly, or correct it itself to reach 100%, what is the use of it? Obviously, well, a new way to make money (tokens).
aider.chat/docs/leaderboards/e