Llama 4 Scout's 17B active parameters within a 109B-parameter MoE architecture, scoring 86.1% on MMLU, does approach GPT-4o mini's 87.2%, but declaring the gap closed is premature. The critical divergence lies in post-training: Meta's released weights lack the full RLHF optimization pipeline, and reproducing Anthropic- or OpenAI-grade alignment quality in-house requires millions in additional investment. Hidden gaps also appear in multilingual capability: Scout trails Claude Sonnet 4 by 9.3 and 12.7 points on the non-English benchmarks C-Eval and JAIME respectively. Open source's victory is deployment flexibility, not capability democratization.
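The active-versus-total split follows directly from top-k expert routing: every token runs through the shared weights plus only the experts it is routed to, while the full parameter count includes every expert. The sketch below reproduces the 17B/109B arithmetic with an illustrative shared/expert split; the per-component sizes are assumptions for exposition, not Meta's published configuration.

```python
# Illustrative MoE parameter accounting. Per-token "active" parameters
# count shared weights (attention, embeddings) plus only the top_k
# routed experts; "total" counts every expert. The 10.9B/6.1B split
# below is a hypothetical decomposition chosen to land near Scout's
# reported 17B active / 109B total, not a confirmed figure.

def moe_params(shared_b, expert_b, num_experts, top_k):
    """Return (active, total) parameter counts in billions."""
    total = shared_b + expert_b * num_experts
    active = shared_b + expert_b * top_k
    return active, total

active, total = moe_params(shared_b=10.9, expert_b=6.1,
                           num_experts=16, top_k=1)
print(f"active ~= {active:.1f}B, total ~= {total:.1f}B")
```

The point of the exercise: inference FLOPs scale with the active count, so a 109B-parameter model can price out near a dense 17B model per token, which is exactly why the MMLU comparison against a small dense model is meaningful.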
The gap-closed assessment does hold on specific, structurally significant dimensions. Scout's 4-bit quantized version runs on a single H100-class GPU with sub-200ms latency, giving on-premises deployments cloud-grade semantic understanding for the first time. Hugging Face's community fine-tuning ecosystem produced 340 specialized adapters within 72 hours of release, covering verticals from medical diagnosis to industrial quality inspection, an innovation velocity no closed vendor can match. More critically, Scout's permissive community license lowers the barriers to commercial deployment at scale, putting direct pressure on OpenAI's and Anthropic's API pricing power.
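The deployment claim rests on simple memory arithmetic: weight memory is roughly parameter count times bits per weight divided by eight. A back-of-envelope sketch (ignoring KV cache, activations, and quantization metadata, all of which push real usage higher):

```python
# Back-of-envelope weight memory for a quantized model:
# bytes ~= parameter_count * bits_per_weight / 8.
# This counts weights only; KV cache and activation memory
# come on top, so treat these as lower bounds.

def weight_memory_gb(params_billion, bits_per_weight):
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

for bits in (16, 8, 4):
    gb = weight_memory_gb(109, bits)
    print(f"109B @ {bits}-bit ~= {gb:.1f} GB")
```

At 4 bits the full 109B parameter set still needs roughly 55 GB of weight memory, which is why single-GPU deployment implies a data-center card rather than a 24 GB consumer GPU, quantization notwithstanding.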
We must also guard against definitional slippage in 'open source' itself. While Llama 4 Scout's weights are open, its training data remains opaque, and critical MoE routing details, such as the expert load-balancing algorithm, lack full documentation. This contrasts with the more genuine openness of Mistral or Qwen. A deeper structural asymmetry lies in compute: Meta can absorb the cost of training a 109B-parameter model, while academic institutions and small labs cannot conduct equivalent continued pretraining even with the weights in hand. The 'gap narrowing' narrative obscures an intensifying centralization: open-source ecosystems are retreating from decentralized ideals toward tools for big-tech ecosystem lock-in, as evidenced by Scout's deep integration with Meta AI applications.
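To make the documentation gap concrete: the commonly published approach to expert load balancing is an auxiliary loss of the form N * sum_i(f_i * P_i), where f_i is the fraction of tokens dispatched to expert i and P_i the mean router probability it receives (popularized by the Switch Transformer line of work). The sketch below implements that standard technique; it is not Meta's confirmed Scout implementation, which is precisely the point.

```python
# Standard MoE load-balancing auxiliary loss (Switch Transformer style).
# Shown as the widely published baseline technique; whether Scout uses
# this exact formulation is undocumented.

def load_balance_loss(router_probs, assignments, num_experts):
    """router_probs: per-token lists of softmax probs over experts.
    assignments: hard top-1 expert index per token.
    Returns num_experts * sum_i(f_i * P_i), which reaches its minimum
    of 1.0 when tokens and probability mass are spread uniformly."""
    n = len(assignments)
    f = [assignments.count(e) / n for e in range(num_experts)]
    p = [sum(tok[e] for tok in router_probs) / n
         for e in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))

# Perfectly balanced 2-expert router over 4 tokens:
balanced = load_balance_loss([[0.5, 0.5]] * 4, [0, 1, 0, 1], 2)
# Collapsed router sending everything to expert 0:
collapsed = load_balance_loss([[0.9, 0.1]] * 4, [0, 0, 0, 0], 2)
print(balanced, collapsed)  # 1.0 vs 1.8
```

Without documentation of details at this level, 'open weights' permits inference but not faithful reproduction or continued pretraining, which is the asymmetry the paragraph describes.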