Arena AI Model ELO History: A Live Tracker!
This article traces the evolution of large language models (LLMs) through the Arena AI Elo rating system. The Arena collects human preferences through pairwise comparisons: a user sees responses from two models side by side and votes for the better one, so the resulting ratings reflect qualitative differences in how people judge model output. After each comparison, the Elo system raises the winner's rating and lowers the loser's by an amount proportional to how unexpected the result was, which means the gap between two ratings corresponds to an expected win rate. To keep the picture readable, the article takes a pragmatic approach: for each major AI lab, it tracks the rating of that lab's best model at each point in time. This view makes generational leaps stand out, along with periods of stagnation or decline.
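Both ideas can be made concrete with a short sketch. It is illustrative only and rests on assumptions that do not come from the Arena itself: the classic online Elo update with K = 32, a 400-point logistic scale, a starting rating of 1000, and hypothetical model and lab names. The Arena's own rating pipeline may compute its scores differently.

```python
from collections import defaultdict

# Assumed parameters for this sketch, not values published by the Arena.
K_FACTOR = 32.0
SCALE = 400.0
INITIAL_RATING = 1000.0


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / SCALE))


def update(ratings: dict, a: str, b: str, score_a: float,
           k: float = K_FACTOR, initial: float = INITIAL_RATING) -> None:
    """Apply one pairwise vote in place.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    A model seen for the first time enters at `initial`.
    """
    r_a = ratings.setdefault(a, initial)
    r_b = ratings.setdefault(b, initial)
    # The size of the shift grows with how surprising the outcome was.
    delta = k * (score_a - expected_score(r_a, r_b))
    ratings[a] = r_a + delta
    ratings[b] = r_b - delta  # zero-sum: B loses what A gains


def peak_by_lab(snapshots, lab_of):
    """For each snapshot, keep only the highest-rated model per lab.

    snapshots: list of (date, {model: rating}) pairs, in time order
    lab_of:    {model: lab}
    Returns {lab: [(date, model, rating), ...]}.
    """
    series = defaultdict(list)
    for date, ratings in snapshots:
        best = {}
        for model, rating in ratings.items():
            lab = lab_of[model]
            if lab not in best or rating > best[lab][1]:
                best[lab] = (model, rating)
        for lab, (model, rating) in best.items():
            series[lab].append((date, model, rating))
    return dict(series)


if __name__ == "__main__":
    # Hypothetical models and labs, for illustration only.
    lab_of = {"model-a1": "LabA", "model-a2": "LabA", "model-b1": "LabB"}
    ratings = {}
    snapshots = []

    update(ratings, "model-a1", "model-b1", 1.0)
    snapshots.append(("t1", dict(ratings)))

    # model-a2 is released later and wins its first votes.
    update(ratings, "model-a2", "model-b1", 1.0)
    update(ratings, "model-a2", "model-a1", 1.0)
    snapshots.append(("t2", dict(ratings)))

    for lab, points in peak_by_lab(snapshots, lab_of).items():
        for date, model, rating in points:
            print(f"{lab} {date} {model} {rating:.1f}")
```

Because the per-lab maximum is recomputed at every snapshot rather than carried forward as a running maximum, a lab's line can fall as well as rise, which is what allows a chart built this way to show decline alongside generational leaps.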