Eval & Observability
Real telemetry from every model call.
FrictionLens scores app reviews across five sentiment dimensions using Gemini. This page shows the evaluation layer behind that pipeline: per-dimension Spearman correlation against a hand-labeled golden set, plus cost, latency, and reliability metrics from production traffic.
Calls (30d)
249
Total model invocations
Spend (30d)
$0.22
Sum of input + output cost
p95 latency
8252ms
Per-prompt p95, weighted by call volume
Error rate
71.1%
0 errors · 177 rate-limited (177 of 249 calls)
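As a rough check on the headline figure, the sketch below recomputes the error rate from the counts on this page. It assumes rate-limited calls count toward the error rate alongside hard errors, which is what the 71.1% figure implies; the function name `error_rate` is illustrative and not taken from the FrictionLens codebase.

```python
def error_rate(total_calls: int, hard_errors: int, rate_limited: int) -> float:
    """Fraction of calls that did not succeed.

    Assumption: rate-limited calls are counted as failures together
    with hard errors.
    """
    if total_calls == 0:
        return 0.0
    return (hard_errors + rate_limited) / total_calls


# Counts taken from the stat cards on this page.
rate = error_rate(total_calls=249, hard_errors=0, rate_limited=177)
print(f"{rate:.1%}")  # 71.1%
```

Reporting hard errors and rate limits as separate counts, as the card does, keeps a quota problem distinguishable from a model or parsing failure.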
Cost by model
Total USD across last 30 days
- gemini-2.5-flash: $0.2206
- gemini-2.5-flash-lite: $0.0041
Latency by prompt
p50 / p95 / p99 across last 30 days
| Prompt | Calls | p50 | p95 | p99 |
|---|---|---|---|---|
| review-v1 | 243 | 6469ms | 7500ms | 8296ms |
| batch-review-v1 | 4 | 43277ms | 43277ms | 43277ms |
| report-v2 | 1 | 26114ms | 26114ms | 26114ms |
| report-v3 | 1 | 33042ms | 33042ms | 33042ms |
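The headline p95 of 8252ms can be reproduced from the per-prompt figures above as a call-weighted mean of the per-prompt p95 values. The sketch below shows that arithmetic; the dictionary layout and the name `weighted_p95` are assumptions for illustration, not the dashboard's actual code.

```python
# (calls, p95 latency in ms) per prompt, copied from the list above.
per_prompt = {
    "review-v1": (243, 7500),
    "batch-review-v1": (4, 43277),
    "report-v2": (1, 26114),
    "report-v3": (1, 33042),
}


def weighted_p95(stats: dict[str, tuple[int, int]]) -> float:
    """Mean of per-prompt p95 values, weighted by each prompt's call count."""
    total_calls = sum(calls for calls, _ in stats.values())
    if total_calls == 0:
        raise ValueError("no calls to aggregate")
    weighted_sum = sum(calls * p95 for calls, p95 in stats.values())
    return weighted_sum / total_calls


print(round(weighted_p95(per_prompt)))  # 8252
```

A weighted mean of per-prompt p95 values is not the same as the p95 of all calls pooled together; it is a compact summary that needs only the per-prompt aggregates.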
Latest eval run
Spearman correlation per dimension vs. hand-labeled golden set
- Prompt: review-v1
- Model: gemini-2.5-flash
- n: 19 reviews · golden v1
- Date: 2026-05-14
| Dimension | Spearman ρ | MAE (0–10 scale) | Verdict |
|---|---|---|---|
| love | 0.946 | 0.63 | strong |
| frustration | 0.951 | 0.95 | strong |
| loyalty | 0.971 | 0.68 | strong |
| momentum | 0.912 | 0.89 | strong |
| wom | 0.987 | 0.37 | strong |
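For readers who want to see how the two columns in this table are typically computed, here is a minimal, dependency-free sketch of Spearman's ρ (Pearson correlation on average ranks, so ties are handled) and mean absolute error. The scores in the example are invented for illustration and are not drawn from the golden set; the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mae(pred, gold):
    """Mean absolute error between predicted and gold scores."""
    return mean(abs(p - g) for p, g in zip(pred, gold))


# Invented 0-10 scores for one dimension (not real eval data).
gold = [2, 9, 5, 7, 1, 4]
pred = [3, 8, 5, 9, 1, 4]
print(round(spearman(pred, gold), 3))  # 0.943
print(round(mae(pred, gold), 2))       # 0.67
```

The two metrics answer different questions: ρ measures whether the model orders reviews the same way the labeler does, while MAE measures how far the absolute scores drift, so a dimension can have a high ρ and still carry a noticeable MAE.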
Prompt registry
Deployed prompts with version, call volume, and last-deployed date
| Prompt ID | Name | Version | Calls (30d) | Deployed | Notes |
|---|---|---|---|---|---|
| review-v1 | review | v1 | 243 | 2026-05-13 | Initial scoring rubric for single-review analysis. |
| batch-review-v1 | batch_review | v1 | 4 | 2026-05-13 | Same rubric as review-v1, applied across a batch of reviews. |
| report-v1 | report | v1 | 0 | 2026-05-13 | Initial aggregate Vibe Report synthesis prompt. |
| report-v2 | report | v2 | 1 | 2026-05-13 | Adds verdict, the_one_thing, citations, confidence, wishlist/dealbreaker bucketing, vagueness rejection, de-duplication. |
| report-v3 | report | v3 | 1 | 2026-05-14 | Adds strict review-ID binding: cited_review_ids must use only the rNNN IDs supplied, never invented. Enables the click-to-see-receipts UI. |
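The strict review-ID binding described for report-v3 suggests a post-generation check that every cited ID was actually supplied to the model. The sketch below is one way such a check might look. The field name `cited_review_ids` comes from the note above; the helper name, the regular expression (an `r` followed by digits, as a reading of "rNNN"), and the sample IDs are assumptions.

```python
import re

# Assumed shape of an "rNNN" ID: the letter r followed by digits.
REVIEW_ID = re.compile(r"r\d+")


def invalid_citations(cited_review_ids, supplied_ids):
    """Return cited IDs that are malformed or were never supplied to the model."""
    supplied = set(supplied_ids)
    return [
        cid
        for cid in cited_review_ids
        if not REVIEW_ID.fullmatch(cid) or cid not in supplied
    ]


# Hypothetical example: two reviews were supplied, three IDs were cited.
bad = invalid_citations(["r001", "r999", "review-7"], ["r001", "r002"])
print(bad)  # ['r999', 'review-7']
```

An empty result means every citation can be resolved to a supplied review, which is the property a click-to-see-receipts UI would depend on.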
Recent traces
Last 20 model calls (anonymized — no user identifiers)
| When | Prompt | Model | Tokens (in / out) | Latency | Cost | Source | Status |
|---|---|---|---|---|---|---|---|
| 14d ago | batch-review-v1 | gemini-2.5-flash | 2966 / 11743 | 49714ms | $0.0302 | pipeline | success |
| 22d ago | batch-review-v1 | gemini-2.5-flash | 2075 / 7079 | 30175ms | $0.0183 | pipeline | success |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6528ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6672ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6568ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6535ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6564ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6564ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6572ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6721ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6512ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6559ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6547ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6572ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6530ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6523ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6513ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6574ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6543ms | — | eval_run | rate_limit |
| 28d ago | review-v1 | gemini-2.5-flash | — / — | 6584ms | — | eval_run | rate_limit |
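The Cost column for the two successful calls can be reproduced from their token counts. The sketch below assumes per-token rates of $0.30 per million input tokens and $2.50 per million output tokens for gemini-2.5-flash. Those rates are not stated on this page; they are an assumption that happens to match both rows. The function name `call_cost` is illustrative.

```python
# Assumed USD rates per 1M tokens for gemini-2.5-flash (not stated on this page).
INPUT_USD_PER_M = 0.30
OUTPUT_USD_PER_M = 2.50


def call_cost(tokens_in: int, tokens_out: int) -> float:
    """Cost of one call as the sum of input cost and output cost, in USD."""
    return (tokens_in * INPUT_USD_PER_M + tokens_out * OUTPUT_USD_PER_M) / 1_000_000


# Token counts from the two successful batch-review-v1 traces above.
print(f"${call_cost(2966, 11743):.4f}")  # $0.0302
print(f"${call_cost(2075, 7079):.4f}")   # $0.0183
```

Rate-limited calls show no token counts and no cost in the table, so under this formula they contribute nothing to spend even though they dominate the call count.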