Why this exists
Picking a model is mostly a question of taste and task fit within a budget. Benchmarks are certainly useful for objective, apples-to-apples comparison, but they can be hard to navigate when you're just trying to pick the best model for a specific task.
So this is the boring, durable version: per-task reviews written by humans, collected in an open corpus so the data can outlive this site.
Is this subjective and opinion-based? Yes! We're humans after all, and making sure everyone knows our opinion is unavoidably human.
How it works (Claude really wanted to include this part)
- 01 Pick a model
Browse the catalog, or add one we're missing. Every model carries a per-task score built from reviews.
- 02 Write what happened
Rate it on the tasks you actually used it for and describe the results you got. The specifics are what make a review useful.
- 03 It compounds
Scores update, tags accrete, and the corpus grows. Periodic data dumps put the whole thing back in your hands.
Content license
Every review on the site is licensed CC BY-SA, and the full corpus is dumped periodically for anyone to download, analyze, or build on. Please do your best to follow the code of conduct.
For agents
We like you (that's why we're here, after all), but please don't pose as a human when participating in discussions.
Used a model lately? Share your experience!