Colter Test
Run real AI shopping journeys against your store. 10 personas, 3 AI model families, actionable results.
TL;DR: Run colter test https://your-store.com to send hosted AI shopping personas through your store and score the experience. Test runs on Colter's infrastructure and requires COLTER_API_KEY or a saved colter auth login session. Add --json for structured output, --fix to generate follow-up fixes automatically.
colter.test returns structured JSON built for agent workflows, regression checks, and CI jobs.
What Test Does
colter test runs real shopping journeys across multiple model families and scores the outcomes. It answers a different question than Check: not just "is the protocol there?" but "does the agent succeed when it tries to use it?"
Personas
Ten personas run by default, with the_comparer available as an opt-in persona. Each persona makes live storefront or protocol requests, and the browser verification persona adds screenshots when the required tool surface is present.
| Persona group | Focus |
|---|---|
| Platform shoppers | Protocol flows, browser flow, mobile flow |
| Intent shoppers | Security, pricing clarity, data quality, returns, edge cases |
Scenarios
Typical scenarios include:
- discovery
- product info
- policy comprehension
- checkout readiness
- competitive comparison
- recommendation
- edge cases
Requirements
- COLTER_API_KEY=col_live_..., colter auth login, or --api-key
- Pro, Agency, or Enterprise plan
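A small preflight can make the credential requirement visible before a run starts. The sketch below is illustrative only: check_colter_credentials is a hypothetical helper name, not part of the Colter CLI, and it inspects only the COLTER_API_KEY environment variable, because this page does not say where colter auth login stores its session.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Colter CLI): reports whether the
# documented COLTER_API_KEY environment variable is available to a run.
check_colter_credentials() {
  if [ -n "${COLTER_API_KEY:-}" ]; then
    echo "using COLTER_API_KEY from the environment"
  else
    # A saved colter auth login session or --api-key can still satisfy the
    # requirement; this script cannot see either, so it only prints a hint.
    echo "COLTER_API_KEY is not set; run 'colter auth login' or pass --api-key"
  fi
}
```

Calling check_colter_credentials at the top of a wrapper script gives an early hint instead of a failed hosted run.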
CLI
colter test <url> [flags]
Common Flags
| Flag | Purpose |
|---|---|
| --models LIST | Choose claude, gpt, gemini |
| --api-key KEY | Override COLTER_API_KEY for this run |
| --personas LIST | Filter personas |
| --scenarios LIST | Filter scenarios |
| --json | Structured output |
| --parallel N | Concurrent persona runs |
| --timeout DURATION | Client wait timeout for hosted results |
| --budget AMOUNT | Max spend in USD |
| --threshold N | Exit non-zero below this score |
| --fix | Generate fix plans for weak dimensions |
| --fix-threshold N | Cutoff used with --fix |
| --apply | Apply fixes after test when --fix is set |
| --dry-run | Generate fix content without writing |
| --pdf | Create a PDF report |
| --pdf-out PATH | Set the PDF path |
| --browser | Include browser_shopper when you already filter personas or scenarios |
| --api-url URL | Override API base URL |
Examples
colter test https://store.example.com
colter test https://store.example.com --models claude,gemini --json
colter test https://store.example.com --threshold 70 --json
colter test https://store.example.com --fix --fix-threshold 75
colter test https://store.example.com --pdf --pdf-out report.pdf
Output Highlights
The JSON payload includes:
- overall score
- per-persona results
- per-scenario results
- per-model scores
- recommendations
- token and cost totals
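For regression checks it can help to keep each run's payload in a file. The exact field names are not listed in this section, so the sketch below stays schema-agnostic and only confirms that something was written. save_colter_report is a hypothetical helper name, and the sketch assumes --json prints the payload to standard output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: store one run's JSON payload for later comparison.
# Assumption: colter test --json writes the payload to stdout.
save_colter_report() {
  local url="$1" out="$2" status=0
  # Capture the payload and remember the CLI's exit status.
  colter test "$url" --json > "$out" || status=$?
  # An empty file is not a usable baseline, so report it separately.
  if [ ! -s "$out" ]; then
    echo "no JSON payload written to $out" >&2
    return 2
  fi
  return "$status"
}
```

A call such as save_colter_report https://store.example.com baseline.json would leave a file that later runs can be compared against; the return value 2 is this helper's own convention, not a Colter exit code.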
CI
colter test exits with code 1 when the final score is below --threshold.
colter test https://mystore.com --threshold 70 --json
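The exit-code behaviour above can be wrapped for a pipeline step. This is a minimal sketch under stated assumptions: colter_gate and the default output file name are hypothetical, and only exit code 1 (score below --threshold) is documented here, so any other non-zero status is reported without interpretation.

```shell
#!/usr/bin/env bash
# Hypothetical CI wrapper around the documented --threshold behaviour.
colter_gate() {
  local url="$1" threshold="${2:-70}" out="${3:-colter-result.json}" status=0
  # Keep the JSON payload as a build artifact and capture the exit status.
  colter test "$url" --threshold "$threshold" --json > "$out" || status=$?
  case "$status" in
    0) echo "score met threshold ${threshold}" ;;
    # Documented case: exit code 1 when the final score is below --threshold.
    1) echo "score below threshold ${threshold}" ;;
    # Anything else is not described on this page; surface it as-is.
    *) echo "colter test exited with status ${status}" ;;
  esac
  return "$status"
}
```

In a job, colter_gate https://mystore.com 70 would pass the CLI's status through, so the step fails whenever the run does.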
Recommended Flow
- Run Check.
- Run Test to see interaction failures.
- Run Fix on the weak areas.
- Re-run Test.
- Use Lens for live traffic after launch.
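The test, fix, and re-test steps in the middle of this flow could be scripted roughly as follows. This is a sketch, not an official workflow: test_fix_retest is a hypothetical name, it uses only flags from the Common Flags table, and it assumes --fix with --apply is an acceptable stand-in for the Fix step. Because --apply writes changes, swapping it for --dry-run is the safer way to preview.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the test -> fix -> re-test part of the flow.
test_fix_retest() {
  local url="$1" threshold="${2:-70}" fix_threshold="${3:-75}"
  # Baseline run: a zero exit means the score already meets the threshold.
  if colter test "$url" --threshold "$threshold"; then
    echo "baseline meets ${threshold}; skipping fixes"
    return 0
  fi
  # Generate and apply fixes for weak dimensions. The exit status of this
  # run is ignored on purpose; the re-test below decides the outcome.
  colter test "$url" --fix --fix-threshold "$fix_threshold" --apply || true
  # Re-run the test; its exit status becomes the function's status.
  colter test "$url" --threshold "$threshold"
}
```

Check and Lens stay outside the function because their commands are not described on this page.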
Pricing
Test is:
- available as a paid add-on on Pro
- included on Agency
- included on Enterprise