From Memes to Math: A Comprehensive Test of AI Multimodal Capabilities, GPT-4o Takes the Top Spot

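LMSYS, a research organization focused on evaluating large language models, launched the Multimodal Arena on June 28. It is a new leaderboard that compares how AI models perform on vision-related tasks. LMSYS is a team of researchers from UC Berkeley and other leading universities, dedicated to building open AI evaluation platforms. In just two weeks, the arena collected more than 17,000 user preference votes in over 60 languages, offering a snapshot of where AI visual processing stands today.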

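OpenAI's GPT-4o ranks first in the Multimodal Arena, followed closely by Anthropic's Claude 3.5 Sonnet and Google's Gemini 1.5 Pro. The ranking reflects the intense competition among tech giants in the fast-moving field of multimodal AI.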

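Notably, the open-source model LLaVA-v1.6-34B scored on par with some proprietary models, such as Claude 3 Haiku. This suggests that advanced AI capabilities may become more widely accessible, potentially leveling the playing field for researchers and smaller companies that lack the resources of big tech firms.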

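The leaderboard covers a range of tasks, including image captioning, math problem solving, document understanding, and meme interpretation. This breadth gives a fuller picture of each model's visual processing capabilities and reflects the varied demands of real-world applications.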

*Partners are welcome to republish this article. Source: VentureBeat. Header image: Unsplash.
