101010.pl is one of the many independent Mastodon servers you can use to participate in the fediverse.
101010.pl is the oldest Polish Mastodon server. We support posts of up to 2,048 characters.

Server stats: 588 active users

#computerscience

6 posts · 6 participants · 0 posts today

In honor of Trans Day of Visibility, here’s something really cool for you to know…

A trans woman computer scientist, Sophie Mary Wilson, was a co-creator of the ARM architecture.

More about Sophie:
en.wikipedia.org/wiki/Sophie_W

Great video on the history of ARM, which is computer science geeky stuff (sorry about YT):
youtube.com/watch?v=nIwdhPOVOU


🔴 **Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation**

“_While we show that integrating CoT monitors into the reinforcement learning reward can indeed produce more capable and more aligned agents in the low optimization regime, we find that with too much optimization, agents learn obfuscated reward hacking, hiding their intent within the CoT while still exhibiting a significant rate of reward hacking._”

Baker, B. et al. (2025) Monitoring reasoning models for misbehavior and the risks of promoting obfuscation. arxiv.org/abs/2503.11926.

#AI #ArtificialIntelligence #LLM #LLMS #ComputerScience #Obfuscation #Preprint #Academia #Academics @ai @computerscience

arXiv.org — Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

Mitigating reward hacking — where AI systems misbehave due to flaws or misspecifications in their learning objectives — remains a key challenge in constructing capable and aligned models. We show that we can monitor a frontier reasoning model, such as OpenAI o3-mini, for reward hacking in agentic coding environments by using another LLM that observes the model's chain-of-thought (CoT) reasoning. CoT monitoring can be far more effective than monitoring agent actions and outputs alone, and we further found that an LLM weaker than o3-mini, namely GPT-4o, can effectively monitor a stronger model. Because CoT monitors can be effective at detecting exploits, it is natural to ask whether those exploits can be suppressed by incorporating a CoT monitor directly into the agent's training objective. While we show that integrating CoT monitors into the reinforcement learning reward can indeed produce more capable and more aligned agents in the low optimization regime, we find that with too much optimization, agents learn obfuscated reward hacking, hiding their intent within the CoT while still exhibiting a significant rate of reward hacking. Because it is difficult to tell when CoTs have become obfuscated, it may be necessary to pay a monitorability tax by not applying strong optimization pressures directly to the chain-of-thought, ensuring that CoTs remain monitorable and useful for detecting misaligned behavior.
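To make the mechanism concrete: the paper describes folding a CoT monitor's verdict into the RL reward signal. A minimal sketch of that idea, with all names (`combined_reward`, `monitor_flagged`, `penalty`) hypothetical and not taken from the authors' code:

```python
def combined_reward(task_reward: float,
                    monitor_flagged: bool,
                    penalty: float = 1.0) -> float:
    """Toy version of folding a CoT monitor into the RL objective:
    subtract a penalty whenever the monitor flags the agent's
    chain-of-thought as apparent reward hacking."""
    return task_reward - (penalty if monitor_flagged else 0.0)


# An agent that passes the task and is not flagged keeps full reward;
# a flagged agent sees its reward reduced.
clean = combined_reward(task_reward=1.0, monitor_flagged=False)
flagged = combined_reward(task_reward=1.0, monitor_flagged=True)
```

The paper's finding is that optimizing hard against such a signal can teach the agent to hide its intent from the monitor rather than to stop hacking, which is why the authors suggest a "monitorability tax": keeping strong optimization pressure off the CoT itself.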

@SocraticEthics The #BrainDrain from #Hungary to the #UnitedStates during #WorldWar2 and the ensuing #ColdWar catapulted #American #Mathematics and #ComputerScience (John von Neumann…), #Chemistry (George Olah…), and #Nuclear #Physics (Leo Szilard…) into undisputed world leadership in #science and #technology.

That brain drain resulted in the Russia-aligned country full of morons that you see today.

#DonaldTheDeplorable is also a #Putin puppet and starting a brain drain FROM here.