Do the leaked GPT-5.4 benchmarks signal that LLM capability growth is hitting a ceiling?