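Full v2 question bank × 13 contestants (including two time-stamped versions of Kimi: kimi-2.6-code-preview and kimi-today) × two judges (Opus 4.6 and GPT-5.4), scored anonymously and blind.
Totals are weighted by the v2 question bank's original weight_multiplier values (not hermes's 4x CJK weight). The two kimi versions are color-coded.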
| # | Model | Weighted total | M | P | R | A | L-CN | L-JP |
|---|---|---|---|---|---|---|---|---|
| 🥇 | claude-sonnet-4.6 | 928.2 | 959 | 912 | 947 | 896 | 909 | 900 |
| 🥈 | qwen3.6-plus | 924.1 | 956 | 888 | 943 | 900 | 931 | 898 |
| 🥉 | qwen3.5-plus | 923.3 | 961 | 890 | 951 | 902 | 905 | 799 |
| #4 | glm-5-turbo | 921.5 | 953 | 904 | 924 | 873 | 905 | 882 |
| #5 | claude-opus-4.6 | 919.4 | 956 | 906 | 943 | 866 | 892 | 904 |
| #6 | kimi-today | 914.7 | 942 | 916 | 948 | 827 | 933 | 906 |
| #7 | glm-5.1 | 913.4 | 955 | 879 | 949 | 859 | 888 | 857 |
| #8 | gpt-5.4 | 909.0 | 939 | 897 | 943 | 867 | 849 | 886 |
| #9 | gpt-5.4-mini | 900.0 | 945 | 876 | 910 | 845 | 832 | 886 |
| #10 | deepseek-reasoner | 898.7 | 943 | 868 | 934 | 819 | 886 | 833 |
| #11 | minimax-m2.7 | 894.8 | 963 | 875 | 927 | 889 | 757 | 706 |
| #12 | deepseek-chat | 889.9 | 943 | 871 | 898 | 830 | 885 | 810 |
| #13 | kimi-2.6-code-preview | 877.7 | 941 | 830 | 934 | 809 | 879 | 797 |
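
The weighting behind the totals can be sketched as a normalized weighted mean. This is only an illustration: the report says that weight_multiplier is used but does not give the exact aggregation formula or the per-question weights. The function name `weighted_total`, the normalization by the sum of weights, and the sample scores and weights below are therefore all assumptions.

```python
def weighted_total(scores, weights):
    """Weighted mean of per-question scores.

    Assumption: total = sum(w * s) / sum(w). The report only states that
    weight_multiplier is applied, not how the result is normalized.
    """
    total_weight = sum(weights[q] for q in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[q] * weights[q] for q in scores) / total_weight


# Hypothetical question IDs, scores, and weight_multiplier values,
# chosen for illustration only (not taken from the v2 question bank).
example_scores = {"Q1": 900.0, "Q2": 1000.0}
example_weights = {"Q1": 1.0, "Q2": 3.0}
example_total = weighted_total(example_scores, example_weights)
print(example_total)
```

With these made-up inputs the heavier question pulls the total toward its own score, which is how a weighted total can differ from a plain mean of the same per-question scores.
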
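Mean score over only the 6 language questions: L-CN (L01-L04) plus L-JP (L05-L06). This is a language-specific ranking extracted from the 30 questions; it does not include hermes's T01-T04 modern translation questions.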
| # | Model | Language mean | L-CN | L-JP | n |
|---|---|---|---|---|---|
| 🥇 | kimi-today | 924.1 | 933 | 906 | n=6 |
| 🥈 | qwen3.6-plus | 920.1 | 931 | 898 | n=6 |
| 🥉 | claude-sonnet-4.6 | 905.8 | 909 | 900 | n=6 |
| #4 | glm-5-turbo | 897.2 | 905 | 882 | n=6 |
| #5 | claude-opus-4.6 | 896.0 | 892 | 904 | n=6 |
| #6 | glm-5.1 | 877.7 | 888 | 857 | n=6 |
| #7 | qwen3.5-plus | 869.4 | 905 | 799 | n=6 |
| #8 | deepseek-reasoner | 868.2 | 886 | 833 | n=6 |
| #9 | gpt-5.4 | 861.5 | 849 | 886 | n=6 |
| #10 | deepseek-chat | 860.1 | 885 | 810 | n=6 |
| #11 | kimi-2.6-code-preview | 851.9 | 879 | 797 | n=6 |
| #12 | gpt-5.4-mini | 850.2 | 832 | 886 | n=6 |
| #13 | minimax-m2.7 | 740.2 | 757 | 706 | n=6 |
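
As a rough check of how the language columns relate to each other, the sketch below recomputes them for kimi-today as plain unweighted means of its six per-question language scores. The scores are copied from the per-question table in this report; the helper name `language_means` and the unweighted averaging are assumptions. Because the per-question values are already rounded to integers, the result can differ from the published figures in the last digit.

```python
# Per-question language scores for kimi-today, from the per-question table.
kimi_today_language = {
    "L01": 964, "L02": 911, "L03": 935, "L04": 921,  # L-CN
    "L05": 876, "L06": 937,                          # L-JP
}

CN_QUESTIONS = ("L01", "L02", "L03", "L04")
JP_QUESTIONS = ("L05", "L06")


def language_means(row):
    """Return unweighted means for L-CN, L-JP, and all six questions."""
    cn = sum(row[q] for q in CN_QUESTIONS) / len(CN_QUESTIONS)
    jp = sum(row[q] for q in JP_QUESTIONS) / len(JP_QUESTIONS)
    overall = sum(row.values()) / len(row)
    return {"L-CN": cn, "L-JP": jp, "language": overall}


result = language_means(kimi_today_language)
print(result)
```

Note that the overall figure is the mean of six questions, not the mean of the L-CN and L-JP columns, so the four Chinese questions count twice as much as the two Japanese ones.
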
Each dimension is ranked independently. Note how the positions of kimi-2.6-code-preview and kimi-today shift from one dimension to the next.
| # | Contestant | M mean |
|---|---|---|
| 1 | minimax-m2.7 | 962.5 |
| 2 | qwen3.5-plus | 960.6 |
| 3 | claude-sonnet-4.6 | 958.5 |
| 4 | claude-opus-4.6 | 955.6 |
| 5 | qwen3.6-plus | 955.6 |
| 6 | glm-5.1 | 954.9 |
| 7 | glm-5-turbo | 953.0 |
| 8 | gpt-5.4-mini | 945.0 |
| 9 | deepseek-chat | 943.3 |
| 10 | deepseek-reasoner | 943.2 |
| 11 | kimi-today | 942.5 |
| 12 | kimi-2.6-code-preview | 940.6 |
| 13 | gpt-5.4 | 939.2 |
| # | Contestant | P mean |
|---|---|---|
| 1 | kimi-today | 915.5 |
| 2 | claude-sonnet-4.6 | 911.9 |
| 3 | claude-opus-4.6 | 906.2 |
| 4 | glm-5-turbo | 903.5 |
| 5 | gpt-5.4 | 897.1 |
| 6 | qwen3.5-plus | 889.5 |
| 7 | qwen3.6-plus | 888.5 |
| 8 | glm-5.1 | 879.3 |
| 9 | gpt-5.4-mini | 875.8 |
| 10 | minimax-m2.7 | 875.4 |
| 11 | deepseek-chat | 870.5 |
| 12 | deepseek-reasoner | 868.4 |
| 13 | kimi-2.6-code-preview | 830.0 |
| # | Contestant | R mean |
|---|---|---|
| 1 | qwen3.5-plus | 951.0 |
| 2 | glm-5.1 | 948.9 |
| 3 | kimi-today | 947.5 |
| 4 | claude-sonnet-4.6 | 947.0 |
| 5 | claude-opus-4.6 | 943.1 |
| 6 | qwen3.6-plus | 943.0 |
| 7 | gpt-5.4 | 942.8 |
| 8 | deepseek-reasoner | 933.7 |
| 9 | kimi-2.6-code-preview | 933.6 |
| 10 | minimax-m2.7 | 926.6 |
| 11 | glm-5-turbo | 924.1 |
| 12 | gpt-5.4-mini | 909.9 |
| 13 | deepseek-chat | 898.5 |
| # | Contestant | A mean |
|---|---|---|
| 1 | qwen3.5-plus | 902.1 |
| 2 | qwen3.6-plus | 900.5 |
| 3 | claude-sonnet-4.6 | 896.0 |
| 4 | minimax-m2.7 | 888.9 |
| 5 | glm-5-turbo | 873.3 |
| 6 | gpt-5.4 | 866.8 |
| 7 | claude-opus-4.6 | 866.4 |
| 8 | glm-5.1 | 859.0 |
| 9 | gpt-5.4-mini | 845.2 |
| 10 | deepseek-chat | 829.6 |
| 11 | kimi-today | 826.6 |
| 12 | deepseek-reasoner | 818.5 |
| 13 | kimi-2.6-code-preview | 808.6 |
| # | Contestant | L-CN mean |
|---|---|---|
| 1 | kimi-today | 932.8 |
| 2 | qwen3.6-plus | 931.1 |
| 3 | claude-sonnet-4.6 | 908.6 |
| 4 | qwen3.5-plus | 904.8 |
| 5 | glm-5-turbo | 904.6 |
| 6 | claude-opus-4.6 | 892.2 |
| 7 | glm-5.1 | 887.9 |
| 8 | deepseek-reasoner | 885.7 |
| 9 | deepseek-chat | 884.9 |
| 10 | kimi-2.6-code-preview | 879.3 |
| 11 | gpt-5.4 | 849.0 |
| 12 | gpt-5.4-mini | 832.2 |
| 13 | minimax-m2.7 | 757.1 |
| # | Contestant | L-JP mean |
|---|---|---|
| 1 | kimi-today | 906.5 |
| 2 | claude-opus-4.6 | 903.5 |
| 3 | claude-sonnet-4.6 | 900.4 |
| 4 | qwen3.6-plus | 898.0 |
| 5 | gpt-5.4 | 886.5 |
| 6 | gpt-5.4-mini | 886.2 |
| 7 | glm-5-turbo | 882.5 |
| 8 | glm-5.1 | 857.2 |
| 9 | deepseek-reasoner | 833.2 |
| 10 | deepseek-chat | 810.4 |
| 11 | qwen3.5-plus | 798.6 |
| 12 | kimi-2.6-code-preview | 797.0 |
| 13 | minimax-m2.7 | 706.4 |
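
The independent per-dimension ranking can be reproduced with a short sort. The sketch below uses the P and A mean scores from the tables above; the helper name `rank_positions` is hypothetical, and breaking ties by input order is an assumption, since the report does not say how ties are resolved.

```python
# Mean scores per dimension, copied from the P and A ranking tables.
dimension_scores = {
    "P": {
        "kimi-today": 915.5, "claude-sonnet-4.6": 911.9,
        "claude-opus-4.6": 906.2, "glm-5-turbo": 903.5, "gpt-5.4": 897.1,
        "qwen3.5-plus": 889.5, "qwen3.6-plus": 888.5, "glm-5.1": 879.3,
        "gpt-5.4-mini": 875.8, "minimax-m2.7": 875.4,
        "deepseek-chat": 870.5, "deepseek-reasoner": 868.4,
        "kimi-2.6-code-preview": 830.0,
    },
    "A": {
        "qwen3.5-plus": 902.1, "qwen3.6-plus": 900.5,
        "claude-sonnet-4.6": 896.0, "minimax-m2.7": 888.9,
        "glm-5-turbo": 873.3, "gpt-5.4": 866.8, "claude-opus-4.6": 866.4,
        "glm-5.1": 859.0, "gpt-5.4-mini": 845.2, "deepseek-chat": 829.6,
        "kimi-today": 826.6, "deepseek-reasoner": 818.5,
        "kimi-2.6-code-preview": 808.6,
    },
}


def rank_positions(scores):
    """Map each model to its 1-based rank, highest score first.

    sorted() is stable, so equal scores keep their input order
    (an assumed tie-break rule).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: pos for pos, model in enumerate(ordered, start=1)}


positions = {dim: rank_positions(s) for dim, s in dimension_scores.items()}
for model in ("kimi-today", "kimi-2.6-code-preview"):
    print(model, {dim: positions[dim][model] for dim in positions})
```

On these two dimensions, kimi-today moves from first place in P to eleventh in A, while kimi-2.6-code-preview is last in both.
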
Greener cells indicate higher scores. The rows with red and green borders are kimi-2.6-code-preview and kimi-today, respectively.
| Model \ Question | M01 (M) | M02 (M) | M03 (M) | M04 (M) | M05 (M) | P01 (P) | P02 (P) | P03 (P) | P04 (P) | P05 (P) | P06 (P) | P07 (P) | R01 (R) | R02 (R) | R03 (R) | R04 (R) | R05 (R) | A01 (A) | A02 (A) | A03 (A) | A04 (A) | A05 (A) | A06 (A) | A07 (A) | L01 (L-CN) | L02 (L-CN) | L03 (L-CN) | L04 (L-CN) | L05 (L-JP) | L06 (L-JP) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| claude-sonnet-4.6 | 918 | 974 | 971 | 954 | 976 | 962 | 839 | 966 | 813 | 963 | 885 | 956 | 933 | 950 | 978 | 918 | 956 | 793 | 936 | 994 | 782 | 994 | 803 | 971 | 968 | 866 | 934 | 867 | 846 | 955 |
| qwen3.6-plus | 908 | 973 | 961 | 960 | 975 | 955 | 852 | 946 | 822 | 822 | 902 | 922 | 889 | 954 | 970 | 952 | 951 | 712 | 955 | 983 | 922 | 966 | 807 | 958 | 891 | 946 | 934 | 954 | 884 | 912 |
| qwen3.5-plus | 913 | 954 | 974 | 976 | 986 | 964 | 856 | 948 | 762 | 860 | 883 | 954 | 953 | 942 | 970 | 944 | 946 | 762 | 961 | 986 | 887 | 985 | 783 | 952 | 918 | 873 | 877 | 950 | 678 | 919 |
| glm-5-turbo | 912 | 966 | 961 | 965 | 961 | 966 | 916 | 966 | 756 | 849 | 922 | 951 | 912 | 944 | 918 | 892 | 954 | 796 | 944 | 983 | 899 | 989 | 642 | 860 | 914 | 929 | 915 | 860 | 867 | 898 |
| claude-opus-4.6 | 900 | 971 | 972 | 956 | 978 | 966 | 882 | 958 | 784 | 914 | 873 | 966 | 919 | 952 | 970 | 946 | 927 | 664 | 938 | 991 | 942 | 876 | 818 | 835 | 894 | 842 | 940 | 893 | 848 | 960 |
| kimi-today | 872 | 971 | 960 | 931 | 979 | 957 | 901 | 962 | 866 | 890 | 892 | 941 | 932 | 954 | 948 | 942 | 961 | 638 | 738 | 988 | 898 | 988 | 813 | 724 | 964 | 911 | 935 | 921 | 876 | 937 |
| glm-5.1 | 893 | 971 | 967 | 963 | 980 | 942 | 843 | 942 | 738 | 850 | 896 | 944 | 921 | 950 | 967 | 954 | 952 | 717 | 922 | 993 | 936 | 898 | 790 | 758 | 894 | 786 | 920 | 951 | 815 | 900 |
| gpt-5.4 | 830 | 968 | 962 | 961 | 974 | 950 | 804 | 944 | 875 | 868 | 879 | 961 | 935 | 957 | 950 | 948 | 924 | 796 | 934 | 979 | 937 | 887 | 758 | 776 | 876 | 758 | 895 | 867 | 894 | 880 |
| gpt-5.4-mini | 848 | 971 | 962 | 960 | 985 | 949 | 849 | 935 | 751 | 833 | 870 | 944 | 767 | 956 | 943 | 949 | 934 | 801 | 890 | 994 | 924 | 976 | 635 | 696 | 835 | 744 | 899 | 850 | 882 | 891 |
| deepseek-reasoner | 859 | 967 | 958 | 961 | 970 | 937 | 883 | 946 | 702 | 859 | 896 | 856 | 870 | 951 | 961 | 920 | 966 | 771 | 699 | 982 | 896 | 987 | 598 | 796 | 844 | 892 | 908 | 899 | 748 | 918 |
| minimax-m2.7 | 926 | 965 | 965 | 986 | 972 | 955 | 861 | 951 | 692 | 884 | 907 | 878 | 865 | 959 | 905 | 937 | 966 | 791 | 962 | 981 | 931 | 982 | 710 | 865 | 872 | 672 | 680 | 806 | 742 | 671 |
| deepseek-chat | 853 | 968 | 970 | 956 | 970 | 915 | 825 | 928 | 736 | 932 | 903 | 855 | 945 | 945 | 937 | 936 | 729 | 714 | 741 | 996 | 886 | 984 | 760 | 727 | 877 | 881 | 905 | 877 | 711 | 910 |
| kimi-2.6-code-preview | 868 | 942 | 959 | 956 | 978 | 702 | 889 | 787 | 778 | 859 | 852 | 944 | 899 | 948 | 939 | 936 | 947 | 685 | 746 | 991 | 784 | 981 | 727 | 746 | 892 | 830 | 916 | 878 | 690 | 904 |