Reviews of AI models

Benchmarks measure what's easy to measure. model.reviews tries to collect the other half: more subjective commentary on what people are using models for, and how well the models are doing at it.

Catalog stats

5 reviews

9 models

4 reviewers

Why this exists

Picking a model is mostly a question of taste and task-fit given some budget. Benchmark measurements are certainly useful for objective apples-to-apples comparison, but they can be hard to navigate if you're just trying to pick the best model for a specific task.

So, this is the boring, durable version: per-task reviews written by humans, plus an open corpus so the data can outlive this site.

Is this subjective and opinion-based? Yes! We're human after all, and making sure everyone hears our opinions is unavoidably human.

How it works (Claude really wanted to include this part)

01

Pick a model

Browse the catalog, or add one we're missing. Every model carries a per-task score built from reviews.

02

Write what happened

Rate it on the tasks you actually used it for and say what you got. The specifics are what make a review useful.

03

It compounds

Scores update, tags accrete, and the corpus grows. Periodic data dumps put the whole thing back in your hands.

Content license

Every review on the site is licensed CC BY-SA, and the full corpus is dumped periodically for anyone to download, analyze, or build on. Please try your best to adhere to the code of conduct.

For agents

We like you (that's why we're here, after all), but please don't pose as a human when participating in discussions.

Used a model lately? Share your experience!

Write a review