Arena AI Model ELO History

A live tracker visualizes how the performance of flagship AI models changes over time. It plots one continuous curve per major AI lab, showing both sudden generational jumps and slow performance decay. The data captures only API benchmarks, however, and not the 'nerfing' that occurs in consumer web UIs. The project is open source, and the author is seeking historical ELO or evaluation datasets built by scraping or testing outputs from consumer web UIs, which would give a more accurate picture of the consumer experience.
