Chi Kim

😲 DeepSeek-V3-4bit runs at >20 tokens per second and <200W using MLX on an M3 Ultra with 512GB. This might be the best and most user-friendly way to run DeepSeek-V3 on consumer hardware, and possibly the most affordable too. You can finally run a GPT-4o-level model locally, with possibly even better quality. #LLM #AI #ML #DeepSeek #OpenAI #GPT #OpenWeight #OpenSource

https://venturebeat.com/ai/deepseek-v3-now-runs-at-20-tokens-per-second-on-mac-studio-and-thats-a-nightmare-for-openai/
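
For anyone wanting to try this themselves, here is a minimal sketch of how a 4-bit DeepSeek-V3 conversion can be run through the mlx-lm Python package on Apple silicon. The Hugging Face repo name is an assumption (check the mlx-community organization for the actual 4-bit conversion), and the exact `generate` keyword arguments may vary slightly between mlx-lm versions.

```python
# Minimal sketch: running a 4-bit DeepSeek-V3 conversion with mlx-lm on Apple silicon.
# Requires: pip install mlx-lm
# NOTE: "mlx-community/DeepSeek-V3-4bit" is an assumed repo name; verify the real
# 4-bit conversion on Hugging Face before running. The download is several hundred GB
# and needs a machine with enough unified memory, e.g. the 512GB M3 Ultra above.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-V3-4bit")  # assumed repo name

prompt = "Explain the difference between MoE and dense transformer models."

# verbose=True prints generation stats, including tokens per second,
# which is how figures like the >20 tok/s claim can be checked locally.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```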