Retena Quality (BETA)
๐Ÿ† Overall Model Ranking composite = BLEU 35% + quality 40% + speed 25%
📋 Recent Samples
The accepted/corrected eval decides quality. Shadow drift measures disagreement, not truth.
✅ Accepted Eval: accuracy against verified/corrected transcript pairs
🧪 Shadow Drift: provider output vs. current primary transcript
๐Ÿท๏ธ Live Routing Badges recent voice rows
Loadingโ€ฆ
๐Ÿ” Recent Shadow Samples redacted comparison cards
Loadingโ€ฆ
Target lang: Source lang:
📈 Translation Quality by Model (higher = better)
Which model is best per language?
Select two models and a language pair to compare.
๐Ÿ” Samples โ€” click a row to expand translations
Loadingโ€ฆ
Loadingโ€ฆ
๐Ÿ” Cleanup Pass Breakdown why was the 2nd pass skipped or run?
Loadingโ€ฆ
๐Ÿ“ By Language avg length, bullets, improvement per language
Loadingโ€ฆ
Recent Entries
🎯 Production Transcription Quality: live message quality_score rows
Scores here come from Retena production message heuristics. Use ASR Models → Accepted Eval for provider accuracy against verified/corrected transcript pairs.
๐Ÿ“ How Production Quality Scoring Works

Each production transcription is scored 0–100 using fast heuristics (no LLM). The score reflects how likely the transcript is to be accurate and natural.

✅ Positive Signals
• +10 Word count ≥ 5
• +10 Avg word length 3–10 chars
• +5 Has punctuation
• +5 No looping patterns
• +10 Realistic speech rate (0.3–3.0 sec/word)
❌ Negative Signals
• −20 Word count < 3
• −15 Avg word length < 2 or > 15
• −20 Repeated substrings (hallucination)
• −30 Contains [BLANK_AUDIO] / [INAUDIBLE]
• −15 Impossibly fast (< 0.1 sec/word)
• −10 Impossibly slow (> 10 sec/word)
80–100 Excellent · 60–79 Good · 40–59 Review · 0–39 Poor

Base score: 60 · Clamped 0–100 · Pure heuristics, no LLM cost
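The scoring rules above can be sketched roughly as follows. This is an illustrative reading, not Retena's actual implementation: the function names, the punctuation set, and the `has_loop` detector are assumptions. In particular, the page does not say how "looping patterns" or "repeated substrings" are detected, so this sketch uses one simple back-to-back n-gram check for both.

```python
import re

MARKERS = ("[BLANK_AUDIO]", "[INAUDIBLE]")


def has_loop(words, max_ngram=4, min_repeats=3):
    """Hypothetical detector: True if an n-gram repeats back to back min_repeats times."""
    for n in range(1, max_ngram + 1):
        for i in range(len(words) - n * min_repeats + 1):
            gram = words[i:i + n]
            if all(words[i + k * n:i + (k + 1) * n] == gram
                   for k in range(1, min_repeats)):
                return True
    return False


def quality_score(text, duration_sec=None):
    """Heuristic 0-100 score: base 60, plus/minus the listed signals, then clamped."""
    words = text.split()
    n = len(words)
    score = 60  # base score

    if n >= 5:
        score += 10
    elif n < 3:
        score -= 20

    if n:  # assumption: skip the word-length signal for empty text
        avg_len = sum(len(w) for w in words) / n
        if 3 <= avg_len <= 10:
            score += 10
        elif avg_len < 2 or avg_len > 15:
            score -= 15

    if re.search(r"[.,!?;:]", text):  # assumed punctuation set
        score += 5

    # Assumption: one detector drives both the +5 and the -20 signal.
    score += -20 if has_loop([w.lower() for w in words]) else 5

    if any(marker in text for marker in MARKERS):
        score -= 30

    if duration_sec is not None and n:
        sec_per_word = duration_sec / n
        if 0.3 <= sec_per_word <= 3.0:
            score += 10
        elif sec_per_word < 0.1:
            score -= 15
        elif sec_per_word > 10:
            score -= 10

    return max(0, min(100, score))  # clamp to 0-100


def quality_label(score):
    """Map a score to the bands shown in the legend."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Review"
    return "Poor"


print(quality_score("Hello there, this is a quick test message.", duration_sec=4.0))  # 100
print(quality_label(quality_score("[BLANK_AUDIO]", duration_sec=5.0)))  # Poor
```

With all five positive signals present, the score reaches 60 + 40 = 100, which matches the upper clamp. A lone `[BLANK_AUDIO]` marker lands at 15 under these assumptions.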

📊 Production Quality by Language
📋 Scored Production Messages