Latest reviews

We put reviewers through a rigorous gauntlet designed to permit only human-written content: writing a paragraph without using a single em-dash. This list is the review feed, newest first.

The feed

5 reviews

5 models covered

1 helpful vote

rated 8.5 out of 10

Claude Opus 4.8
Not much different at coding than Opus 4.7

After probably several hundred hours of programming with Opus 4.6 and 4.7, the jump to 4.8 seems almost unnoticeable to me. In general, both models are extremely capable of executing well-scoped tasks from honestly pretty under-specified prompts. I have also found Opus 4.8 (and 4.7) to be exceptionally useful at…

m.r | Editorial Team

6 weeks ago 0 found helpful 0 comments

rated 8.0 out of 10

ChatGPT 5.5
Good at writing dialogue

From our own experience, creative writing can share a lot of characteristics with writing code: the expressiveness of the medium, and the ambiguity about what makes "good" writing or code. We found Anthropic's Fable 5 to be somewhat lacking on the creative writing front, so we wanted to see how GPT-5.5…

m.r | Editorial Team

6 weeks ago 0 found helpful 0 comments

rated 7.0 out of 10

Claude Fable 5
Creative output on the whole; not hype-worthy

Anthropic announced Fable 5 a day or two ago with fanfare as their new frontier-defining model. To try something a bit different, we used Fable 5 for one of our favorite pastimes: entertainment through novel creative writing. Given a simple prompt, Fable 5 followed instructions pretty well (narrative direction and…

m.r | Editorial Team

6 weeks ago 0 found helpful 0 comments

rated 9.0 out of 10

Gemini 3 Flash
Excellent job learning a new DSL

We gave Gemini a language spec for a DSL designed specifically for LLMs to generate, plus a textual description of what we wanted it to write. The DSL is specific to our app, so no model has it in its training data. While a little slow (maybe ~30s…

m.r | Editorial Team

6 weeks ago 0 found helpful 0 comments

rated 7.5 out of 10

Claude Sonnet 4.6
Surprisingly good design judgement

Fast and cost-effective for generating visuals from short prompts. We used Sonnet as part of a two-pass LLM pipeline for taking a short user-provided prompt describing a graphic (e.g. "lifecycle of a cup of coffee" or "project timeline with three lanes over two years, segmented by quarter") and producing…

m.r | Editorial Team

6 weeks ago 1 found helpful 0 comments
