Giorgio's Paddle Tuning System

"I geeked hard and deep, so you don’t have to."

ELO Isn’t the Whole Story: Inside Pickleball IQ’s Hybrid Rating

Oct 23, 2025

You don’t need a court-side scandal to realize ratings can get weird. Chess solved it with ELO; pickleball added humans, partners, and vibes.

Why it matters

Ratings shape who you play, how you train, and whether a tournament draw is fun or lopsided. If a system whipsaws after one hot day or buries you for three bad points, you don’t trust it—and you stop improving against the right baselines.

ELO, DUPR, and the pickleball problem

ELO is elegant when one brain fights one brain. Pickleball is doubles, partner effects, wind, and Tuesday-night muscle memory. DUPR adapts ELO-style updates to doubles match results, but the side effects are real:

  • Partners matter. Your number moves because your buddy had caffeine or didn’t.
  • Opponent uncertainty grows fast. New players make confidence bands swing.
  • Incentives get weird. When a single cutoff matters, people play to the line.

What hard cutoffs do to behavior

When there’s a hard cutoff, people game the cutoff. I see it all the time:

  • Choose partners or opponents that protect the number.
  • Avoid logging “risky” matches or sessions that might drop it.
  • Stop once the target is hit; don’t test again until it feels safe.
  • Sandbag just below a division to farm wins or medals.

That’s not evil—it’s human. The fix is to make progress continuous and resilient so the best move is simply… to get better.

I wanted something different for Pickleball IQ: a rating born from skills evidence first, then blended over time like a smart coach who remembers your last ten sessions—but doesn’t forget the grind it took to get there.

The hybrid: continuous score + EMA + gate-aware display

I test skills by “gates” (3.0, 3.5, 4.0, etc.). Gates unlock with a clean pass (≥ 80% on a curated question set). But passing shouldn’t snap your rating to a round number. Real players live between gates. So here’s the system I’m running under the hood.

1) Session score is continuous

  • Below pass (0–79%): map your percentage across the half-step below the gate. If you aimed at 3.5 and shot 60%, you land around 3.25.
  • At/above pass (80–100%): keep climbing a little above the gate (up to ~+0.20) to reward strong passes without pretending you’re a full tier up.

Result: you finish a session and see something like “3.62” instead of a hard snap.
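
Here is a minimal sketch of that mapping, assuming plain linear interpolation on both sides of the pass line. The function name and the straight-line shape are my assumptions for illustration; only the 80% pass mark, the half-step below the gate, and the ~+0.20 cap come from the description above. A strictly linear map puts 60% at a 3.5 gate near 3.38 rather than 3.25, so the curve used in the app is evidently not this exact line.

```python
def session_score(gate: float, pct: float, pass_pct: float = 80.0,
                  half_step: float = 0.5, max_bonus: float = 0.20) -> float:
    """Map one gate attempt to a continuous score.

    Assumption: linear interpolation on each side of the pass line.
    """
    if not 0.0 <= pct <= 100.0:
        raise ValueError("pct must be between 0 and 100")
    if pct < pass_pct:
        # Below pass: spread 0..pass_pct across the half-step under the gate.
        return gate - half_step + half_step * (pct / pass_pct)
    # At/above pass: spread pass_pct..100 across gate..gate + max_bonus.
    return gate + max_bonus * (pct - pass_pct) / (100.0 - pass_pct)
```

Under these assumptions, a 90% pass at the 3.5 gate scores 3.60, a bare 80% pass sits exactly on 3.50, and a perfect run tops out at 3.70.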

Field Note: A 90% pass at 3.5 prints higher than an 80% at 4.0 with tiny sample size. Confidence matters as much as ambition.

2) Blend with history using EMA (tunable, not twitchy)

I use an exponential moving average. New info matters; reps matter more.

  • Base weight around 0.25, damped for small samples and nudged up a little after long gaps.
  • More sessions = calmer updates. First ten sessions move fast; after that, it settles.
  • Optional time decay: if you ghost drills for 45 days, your next good day counts slightly more.

Think of it like upgrading a paddle: the first 10 grams change everything; the next 2 grams are refinement.
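
A rough sketch of the blend, in Python. The base weight of 0.25, the ten-session mark, and the 45-day gap come from the bullets above; the specific factors (scaling by items out of 10, a 0.6 multiplier after ten sessions, a 1.2 multiplier after a long gap) are hypothetical placeholders, not the app's tuned values.

```python
from typing import Optional


def ema_weight(n_items: int, sessions_so_far: int, days_since_last: float,
               base: float = 0.25) -> float:
    """Pick the EMA weight (alpha) for one new session."""
    # Assumption: sessions with fewer than 10 items get a proportionally smaller weight.
    sample_factor = min(1.0, n_items / 10.0)
    # Assumption: after ten sessions, updates calm down by a fixed factor.
    settle_factor = 1.0 if sessions_so_far < 10 else 0.6
    # Assumption: a gap of 45+ days makes the next session count slightly more.
    gap_factor = 1.2 if days_since_last >= 45 else 1.0
    return min(1.0, base * sample_factor * settle_factor * gap_factor)


def ema_update(previous: Optional[float], session: float, alpha: float) -> float:
    """Blend a new session score into the running estimate."""
    if previous is None:
        # No history yet: the first session score is the estimate.
        return session
    return previous + alpha * (session - previous)
```

For example, a player sitting at 3.40 who posts a 3.60 session with alpha 0.25 moves to about 3.45: a quarter of the way toward the new evidence.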

3) Gate-aware clamp (for sanity and fairness)

If you just unlocked 4.0, your visible rating shouldn’t nosedive to 3.6 because of one rough follow-up. The display uses a floor just below your highest passed gate and a sensible ceiling into the next tier. Strong passes raise the floor a touch; repeated fails lower it slowly. The math stays honest, but the UI avoids fake backslides.
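
The display clamp itself can be sketched in a few lines. The floor offset of 0.10 matches the default mentioned under "Data that shaped the defaults"; the ceiling offset of 0.40 is a made-up stand-in for "a sensible ceiling into the next tier," and the function name is hypothetical.

```python
def clamp_display(ema_rating: float, highest_passed_gate: float,
                  floor_offset: float = 0.10,
                  ceiling_offset: float = 0.40) -> float:
    """Return the rating to show in the UI; the stored EMA is left untouched."""
    floor = highest_passed_gate - floor_offset      # just below the unlocked gate
    ceiling = highest_passed_gate + ceiling_offset  # assumption: cap partway into the next tier
    return max(floor, min(ceiling, ema_rating))
```

With these numbers, a player who has unlocked 4.0 and whose EMA slips to 3.60 after one rough session still sees 3.90. Only the displayed value is clamped, so the underlying math keeps tracking reality.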

AI Observation: Players don’t mind slow climbs. They hate yo-yos after wins.

4) Show uncertainty like an adult

Small samples lie. I attach a narrow band when you have reps and a wide band when you don’t. If you went 9/10 on a new gate, I’ll show something like “3.62 ± 0.08.” You can feel good without pretending it’s gospel.
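
One simple way to get a band that narrows with reps is a binomial standard error on the pass rate, rescaled to rating points. This is a sketch of that idea, not the formula the app uses: the add-one smoothing and the 0.75 scale are assumptions, and the scale was picked only so that 9/10 lands near the ±0.08 in the example above.

```python
import math


def band_half_width(correct: int, total: int, scale: float = 0.75) -> float:
    """Half-width of the uncertainty band, in rating points."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    # Add-one smoothing so a perfect 10/10 still shows a nonzero band.
    p = (correct + 1) / (total + 2)
    std_err = math.sqrt(p * (1 - p) / (total + 2))
    # Assumption: a fixed scale converts pass-rate uncertainty to rating points.
    return scale * std_err


print(f"3.62 ± {band_half_width(9, 10):.2f}")
```

The useful property is the shape, not the constants: the same 90% pass rate on 40 items produces a band less than half as wide as it does on 10.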

Figure: PIQ hybrid rating over time, showing session scores (points), the EMA (smooth line), and the clamp floor/ceiling (shaded band).

What you’ll see in the app

  • Results: “This session: 9/10 (90%). PIQ estimate: 3.62 ± 0.08.”
  • Start: a compact chip with your current PIQ and the next unlock CTA. No math dump.
  • Unlocks: still binary. Pass the gate, move on. Rating never blocks eligibility.

Why not pure ELO (or pure gates)?

Pure ELO treats points like perfect witnesses. They’re not. Pure gates ignore momentum. Also not great. The hybrid keeps the skill signal (you vs. a clean set of scenarios) and layers time-weighted memory so progress looks like progress, not coin flips.

Practical upside

  • Trust: you can pass a gate without fearing a visual demotion tomorrow.
  • Coaching: decimals show where you are inside a tier—useful for targeting drills.
  • Tuning: parameters (weights, floors) are simple and auditable. No black box.

Pro Tip: Track your last 5–10 sessions. If your band is shrinking and the mean creeps up 0.03–0.05 per session, you’re on a healthy ramp.
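
If you log your sessions in a spreadsheet or a script, the check in that tip is easy to automate. A small sketch, assuming you keep two parallel lists (oldest first) of PIQ estimates and band half-widths; the function name and return shape are hypothetical.

```python
from typing import List, Tuple


def ramp_summary(estimates: List[float], bands: List[float]) -> Tuple[float, bool]:
    """Return (average change per session, whether the band shrank)."""
    if len(estimates) < 2 or len(estimates) != len(bands):
        raise ValueError("need at least two sessions and matching list lengths")
    # Average step between the first and last session in the window.
    avg_step = (estimates[-1] - estimates[0]) / (len(estimates) - 1)
    # Band counts as shrinking if the latest is narrower than the earliest.
    band_shrinking = bands[-1] < bands[0]
    return avg_step, band_shrinking


step, shrinking = ramp_summary([3.40, 3.44, 3.48, 3.52, 3.56],
                               [0.15, 0.13, 0.11, 0.10, 0.08])
```

Here the average step is about 0.04 per session with a shrinking band, which falls inside the healthy range described above.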

Where pure ELO diverges (corner cases)

| Scenario | Pure ELO reaction | Hybrid reaction | Why it matters |
| --- | --- | --- | --- |
| Barely pass a new gate from a low prior | Small bump; may still sit below the new tier if prior is low and opponent model says “expected” | Clamps display just below the unlocked gate; continuous score shows you inside the tier | You see progress without a fake demotion after a legit pass |
| Strong pass at a gate with low prior | Big jump possible if model thinks it’s an upset; can overshoot | Modest rise; session score tops out ~+0.20 over gate, then EMA blends | Rewards quality without flinging you a full tier higher |
| Fail slightly above current level | Drops on a narrow loss, especially vs. confident opponents | Session score dips toward the lower half-step; EMA softens the hit | Failing up doesn’t punish you like failing down |
| Small-sample pass (5–6 items) | Large variance; one result can swing rating | Visible band is wide; EMA weight is damped until sample grows | The UI says “early data” instead of pretending it’s precise |
| Two fails right after unlocking | Yo-yo risk; two losses can erase gains | Display floor lowers slowly (hysteresis) after repeated fails | Stability reduces fear of testing up |
| Long gap then one hot session | Big jump if K-factor/recency is high | Time weighting bumps α a bit, but EMA + clamp keep it sane | Comebacks feel earned, not lucky |

Data that shaped the defaults

In early tests, a 10-question gate with a base EMA weight of 0.25 and a clamp floor at gate−0.10 felt right. Strong passes (≥ 90%) nudged the floor to gate−0.05. After two consecutive fails at the new gate, the floor eased down by ~0.05 to avoid false certainty. That balance kept players confident without hiding reality.
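
Those defaults translate into a very small rule for the display floor. This sketch uses the numbers from the paragraph above (gate−0.10, gate−0.05 for passes at 90% or better, a ~0.05 ease after two consecutive fails); the function name is hypothetical, and what happens after a third or fourth fail is not specified, so the sketch applies the ease only once.

```python
def display_floor(gate: float, best_pass_pct: float,
                  consecutive_fails: int = 0) -> float:
    """Floor for the displayed rating after unlocking `gate`."""
    # Strong passes (>= 90%) earn a tighter floor than ordinary passes.
    offset = 0.05 if best_pass_pct >= 90.0 else 0.10
    if consecutive_fails >= 2:
        # Assumption: a single 0.05 ease, no further drops on later fails.
        offset += 0.05
    return gate - offset
```

So an 85% pass at 4.0 gives a floor of 3.90, a 95% pass lifts it to 3.95, and two straight fails after the ordinary pass ease it to 3.85.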

Takeaway

Ratings should teach, not tease. Continuous scores show where you stand today; EMA remembers the work; the clamp respects your unlocks. Keep drilling. The number will follow the truth of your reps.