Played around yesterday with gpt-oss:20b on my home lab. Ollama defaults to a 2K-token context window, which a reasoning model burns through quickly. Increased it to 128K (enough to fit a whole book), but then it took forever to start producing an answer. 64K turned out to be the sweet spot: quick, connected online, and with vision understanding. To test, I added a PDF and asked questions about its content, and it worked flawlessly. Hardware: an RX 9070 (16GB) in a 32GB RAM machine.
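For anyone wanting to try the same context bump: you can pass `num_ctx` per request through Ollama's REST API instead of the interactive `/set parameter` route. A minimal sketch, assuming a local Ollama server on the default port; the model name is from this post, the prompt is just a placeholder:

```python
import json

# Sketch: ask Ollama to run gpt-oss:20b with a 64K context window.
# "num_ctx" in the "options" object overrides the 2K default per request.
payload = {
    "model": "gpt-oss:20b",
    "prompt": "Summarize the attached document.",  # placeholder prompt
    "options": {"num_ctx": 64 * 1024},  # 65536 tokens instead of 2048
    "stream": False,
}

# To actually send it (needs Ollama running at localhost:11434):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())

print(json.dumps(payload["options"]))  # → {"num_ctx": 65536}
```

If you want the setting baked in permanently, a Modelfile with `PARAMETER num_ctx 65536` plus `ollama create` does the same thing without touching every request.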
With Continue in VS Code, I keep my AI assistant entirely in-house. No subscriptions; everything runs locally.
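Pointing Continue at the local model is just a config entry. A rough sketch of the `~/.continue/config.json` fragment I mean; exact field names vary between Continue versions (newer releases use a YAML config), so treat this as illustrative:

```json
{
  "models": [
    {
      "title": "gpt-oss local",
      "provider": "ollama",
      "model": "gpt-oss:20b"
    }
  ]
}
```

Continue then talks to Ollama on its default local port, so nothing leaves the machine.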