Eval & Observability

Real telemetry from every model call.

FrictionLens scores app reviews across five sentiment dimensions using Gemini. This page shows the eval intuition layer behind that pipeline — Spearman correlation per dimension against a hand-labeled golden set, plus real cost, latency, and reliability metrics from production traffic.
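As a rough illustration of the per-dimension comparison described above, here is a minimal sketch of Spearman correlation between golden labels and model scores, using only the standard library. The function names and the list-of-dicts input shape are assumptions made for this example, not FrictionLens's actual code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (var_x * var_y) ** 0.5


def per_dimension(golden, predicted, dimensions):
    """golden / predicted: hypothetical lists of {dimension: score} dicts."""
    return {
        d: spearman([g[d] for g in golden], [p[d] for p in predicted])
        for d in dimensions
    }
```

Because ρ only compares rank order, a model that scores every review two points too high can still reach ρ = 1.0, which is presumably why the eval table reports MAE alongside it.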

  • Calls (30d): 249
    Total model invocations
  • Spend (30d): $0.22
    Sum of input + output cost
  • p95 latency: 8252ms
    Weighted by call volume
  • Error rate: 71.1%
    0 errors · 177 rate-limited
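The 71.1% figure is consistent with counting rate-limited calls as failures alongside hard errors. A minimal sketch of that arithmetic, assuming that is how the tile is computed (the function name is hypothetical):

```python
def failure_rate(total_calls, errors, rate_limited):
    """Share of calls that did not succeed (assumed definition)."""
    if total_calls == 0:
        return 0.0
    return (errors + rate_limited) / total_calls


rate = failure_rate(total_calls=249, errors=0, rate_limited=177)
print(f"{rate:.1%}")  # 71.1%
```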

Cost by model

Total USD across last 30 days

  • gemini-2.5-flash: $0.2206
  • gemini-2.5-flash-lite: $0.0041
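Per-call cost is the sum of input and output token charges. The sketch below assumes gemini-2.5-flash prices of $0.30 per million input tokens and $2.50 per million output tokens. Those rates are an assumption, not something this page states, though they reproduce the costs shown for the two successful batch-review-v1 traces.

```python
# Assumed USD prices per one million tokens (not stated on this page).
INPUT_USD_PER_M = 0.30
OUTPUT_USD_PER_M = 2.50


def call_cost(tokens_in, tokens_out):
    """Cost of one call: input charge plus output charge."""
    return (tokens_in * INPUT_USD_PER_M + tokens_out * OUTPUT_USD_PER_M) / 1_000_000


# Token counts read from the two successful traces on this page.
print(round(call_cost(2966, 11743), 4))  # 0.0302
print(round(call_cost(2075, 7079), 4))   # 0.0183
```

Output tokens dominate both totals, so trimming response length would likely reduce spend more than trimming the prompt.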

Latency by prompt

p50 / p95 / p99 across last 30 days

  • review-v1 (243 calls): p50 6469ms · p95 7500ms · p99 8296ms
  • batch-review-v1 (4 calls): p50 43277ms · p95 43277ms · p99 43277ms
  • report-v2 (1 call): p50 26114ms · p95 26114ms · p99 26114ms
  • report-v3 (1 call): p50 33042ms · p95 33042ms · p99 33042ms
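The prompts with a single call report identical p50, p95, and p99 because every percentile of a one-element sample is that element. A minimal nearest-rank sketch shows this. The dashboard's actual percentile method is not stated, so nearest-rank is an assumption.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


single_call = [26114]  # the one report-v2 latency, in ms
print([percentile(single_call, p) for p in (50, 95, 99)])  # [26114, 26114, 26114]
```

With so few calls, the percentiles for the batch and report prompts describe individual requests rather than a distribution.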

Latest eval run

Spearman correlation per dimension vs. hand-labeled golden set

Prompt: review-v1
Model: gemini-2.5-flash
n: 19 reviews · golden v1
Date: 2026-05-14
Dimension | Spearman ρ | MAE (0–10 scale) | Verdict
love | 0.946 | 0.63 | strong
frustration | 0.951 | 0.95 | strong
loyalty | 0.971 | 0.68 | strong
momentum | 0.912 | 0.89 | strong
wom | 0.987 | 0.37 | strong

Churn-risk exact-match: 68%
Run: 1ed3bd0e
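The MAE column and the churn-risk exact-match figure can be computed as sketched below. The function names and the label values ("low", "medium", "high") are hypothetical. If all 19 golden reviews carry a churn-risk label, 68% would correspond to 13 of 19 matches.

```python
def mean_absolute_error(golden, predicted):
    """Average absolute gap between golden and predicted scores (0-10 scale)."""
    return sum(abs(g - p) for g, p in zip(golden, predicted)) / len(golden)


def exact_match_rate(golden_labels, predicted_labels):
    """Fraction of items whose predicted label equals the golden label."""
    hits = sum(g == p for g, p in zip(golden_labels, predicted_labels))
    return hits / len(golden_labels)


# Hypothetical labels: 13 agreements out of 19 reviews.
golden = ["high"] * 19
predicted = ["high"] * 13 + ["low"] * 6
print(f"{exact_match_rate(golden, predicted):.0%}")  # 68%
```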

Prompt registry

Deployed prompts with version, call volume, and last-deployed date

Prompt ID | Name | Version | Calls (30d) | Deployed | Notes
review-v1 | review | v1 | 243 | 2026-05-13 | Initial scoring rubric for single-review analysis.
batch-review-v1 | batch_review | v1 | 4 | 2026-05-13 | Same rubric as review-v1, applied across a batch of reviews.
report-v1 | report | v1 | 0 | 2026-05-13 | Initial aggregate Vibe Report synthesis prompt.
report-v2 | report | v2 | 1 | 2026-05-13 | Adds verdict, the_one_thing, citations, confidence, wishlist/dealbreaker bucketing, vagueness rejection, de-duplication.
report-v3 | report | v3 | 1 | 2026-05-14 | Adds strict review-ID binding: cited_review_ids must use only the rNNN IDs supplied, never invented. Enables the click-to-see-receipts UI.
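The review-ID binding that report-v3 describes could also be enforced after generation. A minimal sketch, assuming IDs take the form "r" followed by digits and that the report exposes a list named cited_review_ids. The helper name is hypothetical.

```python
import re

# Assumed ID shape: "r" followed by one or more digits, e.g. "r007".
REVIEW_ID = re.compile(r"r\d+")


def invalid_citations(cited_review_ids, supplied_ids):
    """Return cited IDs that are malformed or were never supplied to the model."""
    supplied = set(supplied_ids)
    return [
        cid
        for cid in cited_review_ids
        if not REVIEW_ID.fullmatch(cid) or cid not in supplied
    ]


supplied = ["r001", "r002", "r003"]
print(invalid_citations(["r001", "r009", "review-2"], supplied))  # ['r009', 'review-2']
```

An empty result means every citation can be resolved to a real review, which is what a click-through receipts view would need.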

Recent traces

Last 20 model calls (anonymized — no user identifiers)

When | Prompt | Model | Tokens (in / out) | Latency | Cost | Source | Status
14d ago | batch-review-v1 | gemini-2.5-flash | 2966 / 11743 | 49714ms | $0.0302 | pipeline | success
22d ago | batch-review-v1 | gemini-2.5-flash | 2075 / 7079 | 30175ms | $0.0183 | pipeline | success
28d ago | review-v1 | gemini-2.5-flash | - / - | 6528ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6672ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6568ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6535ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6564ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6564ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6572ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6721ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6512ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6559ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6547ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6572ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6530ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6523ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6513ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6574ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6543ms | - | eval_run | rate_limit
28d ago | review-v1 | gemini-2.5-flash | - / - | 6584ms | - | eval_run | rate_limit