Built on real data.
Research-backed prompts.
We tested every hypothesis systematically. Here's what works.
Methodology
An automated evaluation framework tested every combination. No guesswork.
Test cases
50 real-world coding prompts across 5 domains.
36 prompting strategies
Each variant tested a different hypothesis.
Scoring
Every output scored by an independent AI judge across 5 dimensions, each rated 1–10.
Specificity
Are concrete details like language, framework, and scope added?
Intent Preservation
Is the user's original ask preserved without adding unwanted features?
Conciseness
Is every word earning its place?
Completeness
Are scope, constraints, and edge cases covered?
No Hallucination
Are invented requirements or context avoided?
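The scoring pipeline above can be sketched in a few lines. This is an illustrative reconstruction, not the actual framework code: the function and dimension names are assumptions, and the judge's ratings are stubbed as plain dicts.

```python
# Sketch of the evaluation described above: each output receives five
# 1-10 dimension ratings from the judge; a strategy's final score is
# the mean across all of its test-case outputs. Names are illustrative.
from statistics import mean

DIMENSIONS = ["specificity", "intent", "conciseness", "completeness", "no_hallucination"]

def score_output(judge_ratings: dict) -> float:
    """Average the judge's five 1-10 dimension ratings for one output."""
    return mean(judge_ratings[d] for d in DIMENSIONS)

def rank_strategies(results: dict) -> list:
    """results maps strategy name -> list of per-output rating dicts."""
    averages = {
        name: mean(score_output(r) for r in ratings)
        for name, ratings in results.items()
    }
    # Highest average score first.
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: two made-up strategies, one rating dict each.
toy = {
    "zero_shot": [dict.fromkeys(DIMENSIONS, 7)],
    "two_shot":  [dict.fromkeys(DIMENSIONS, 8)],
}
ranking = rank_strategies(toy)
```

With 36 strategies and 50 prompts each, the same loop simply runs over 1,800 judged outputs.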
Results
All 36 strategies ranked by average score. Range: 6.10 to 7.83.
Performance by domain
Average scores for the winning strategy across each domain.
Key insights
Temperature matters more than prompt engineering
Same prompt, different temperature. That alone produced the single biggest improvement of any change we tested.
Two examples beat zero. Four didn't help.
Zero-shot: 7.04. Two diverse examples: 7.77 (+10.4%). Adding more examples gave no further gain.
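The two-example setup can be sketched as a prompt-assembly step. The before/after pairs below are invented placeholders, not the examples that were actually tested; the point is the shape: exactly two diverse shots, embedded in the system prompt.

```python
# Hypothetical two-shot prompt assembly. The real tested examples differ;
# these placeholders only show the structure (two diverse before/after pairs).
EXAMPLES = [
    ("fix my login bug",
     "Debug the login flow in my Express.js app: sessions expire immediately after sign-in."),
    ("make a snake game",
     "Build a browser Snake game in vanilla JavaScript with canvas rendering, "
     "arrow-key controls, and a score display."),
]

def build_system_prompt(base_instructions: str) -> str:
    """Append the two example pairs after the base instructions."""
    shots = "\n\n".join(
        f"Before: {before}\nAfter: {after}" for before, after in EXAMPLES
    )
    return f"{base_instructions}\n\nExamples:\n\n{shots}"

prompt = build_system_prompt("Improve the user's coding prompt.")
```

Swapping in a third or fourth pair is the same one-line change, which is what made the "four didn't help" comparison cheap to run.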
Guardrails prevent catastrophic failures
The worst outputs weren't merely mediocre; they failed outright. The AI asked clarifying questions instead of improving the prompt, or invented requirements. Two rules fixed both failure modes.
More instructions made it worse
The most detailed variant scored last (6.10). The winner: a 3-step method, strict guardrails, nothing else.
Persona framing has diminishing returns
The aggressive framing "You ARE the AI coder" scored 6.41. A simple, credible persona works best.
There's a ceiling, and it's the starting prompt
No system prompt pushes past ~7.8. The algorithm can only work with what you give it. The real lever is the starting prompt itself.
The refined algorithm
The production algorithm combines the top-performing traits from our evaluation.
v1 → v2
- Anti-hallucination as rule #1: "Did the user imply this, or am I making it up?"
- 3-step methodology (WHAT/HOW/DONE): replaced the 6-step approach
- No-questions rule: improve, don't interrogate
- Sentence budget: 2–4 sentences for simple prompts, 4–6 for complex ones
- 2 diverse examples: down from 4
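The v2 rule set above composes into a single system prompt. The wording below is paraphrased from the bullet list, not the production text, and the function name is an assumption:

```python
# Illustrative composition of the v2 rules into one system prompt.
# Rule wording is paraphrased from the changelog above, not verbatim.
V2_RULES = [
    "Rule 1 (anti-hallucination): before adding anything, ask: "
    "did the user imply this, or am I making it up?",
    "Method: WHAT (clarify the goal), HOW (add language, framework, scope), "
    "DONE (define what finished looks like).",
    "Never ask the user questions; improve the prompt directly.",
    "Keep the rewrite to 2-4 sentences for simple prompts, 4-6 for complex ones.",
]

def build_v2_system_prompt(examples: str) -> str:
    """Join the rules, then append the two diverse example pairs."""
    return "\n".join(V2_RULES) + "\n\nExamples (exactly two, diverse):\n" + examples

sp = build_v2_system_prompt("Before: ...\nAfter: ...")
```

Anti-hallucination leading the list mirrors the finding that invented requirements, not mediocrity, caused the worst failures.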