The Truth Behind AI Model Leaderboards: Chatbot Arena Needs More Transparency

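Over the past few months, tech executives such as Elon Musk have repeatedly pointed to how their companies' AI models perform on Chatbot Arena. The platform is maintained by the nonprofit LMSYS and has quickly become an industry focal point. Chatbot Arena lets users pose a question to two randomly selected, anonymous models and then vote on which answer is better. Since its launch, the platform has accumulated more than a million prompt-and-response pairs. It has also drawn participation from well-known companies including OpenAI, Google, and Meta.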

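However, the credibility of these rankings has come under scrutiny. Bill Yuchen Lin, a researcher at the Allen Institute for AI, points out that LMSYS has not fully disclosed which model capabilities it evaluates or what criteria it applies. In addition, the platform's users come largely from the tech world. Their questions skew toward technical topics such as programming and AI tools, so the results may not reflect the needs of everyday users.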

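Lin also notes that commercial interests could undermine the impartiality of the results. Large companies such as OpenAI can collect large amounts of user data through their APIs and use it to tune their models, which gives them an edge in the rankings. LMSYS also accepts sponsorship from companies and venture capital firms, which has led outsiders to question its neutrality.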

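Even so, Chatbot Arena still offers a real-time view of how AI models perform. In Lin's view, the platform is better suited to measuring user satisfaction than to serving as the sole yardstick of an AI model's technical capability.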

*Partners are welcome to republish this article. Source: TechCrunch. Cover image: Unsplash.
