ai_scams · June 6, 2026 · Edition #25

How to write tests that don't break on real data

Code that passes its tests often fails in production because the tests assume ideal data distributions. Here's how to make them robust:

The problem: most tests use small, well-behaved inputs or random numbers generated by np.random, like mean([1, 2, 3, 4, 5]) = 3. But real-world data has outliers: mean([1, 2, 3, 1000, 5]) = 202.2, which breaks any test that expects a result near 3. This is a serious issue in production systems, because the tests pass in development while the code fails on real data.
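To make the arithmetic concrete, here is a minimal sketch using Python's standard `statistics` module (chosen here only to keep the example dependency-free; the variable names are illustrative). It shows how a single outlier moves the mean while the median stays put:

```python
import statistics

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 1000, 5]  # same list with one extreme value

mean_clean = statistics.mean(clean)            # 3
mean_outlier = statistics.mean(with_outlier)   # 1011 / 5 = 202.2
median_outlier = statistics.median(with_outlier)  # 3, unaffected by the outlier

print(mean_clean, mean_outlier, median_outlier)
```

A test that asserts the mean is close to 3 passes on the first list and fails on the second, even though only one value changed.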

The solution: generate test inputs from several distributions (normal, uniform, skewed, and so on) instead of a single well-behaved one. For mean, test with a normal distribution (the good case), a uniform distribution (an edge case), and a skewed distribution (the worst case). This makes tests robust to different data patterns. Example: alongside mean([1, 2, 3, 4, 5]), include an input with an outlier such as mean([1, 2, 3, 1000, 5]).
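One way this could look in practice is sketched below. It is an assumption-laden illustration, not the original article's code: the helper names (`make_samples`, `check_mean_properties`, `test_mean_across_distributions`) are hypothetical, the standard-library `random` module stands in for np.random, and a log-normal distribution is used as the skewed case. Rather than asserting a fixed expected value, the test checks properties that a mean must satisfy whatever the shape of the data:

```python
import math
import random
import statistics


def make_samples(seed=0, n=1000):
    """Return named samples drawn from three differently shaped distributions."""
    rng = random.Random(seed)  # fixed seed keeps the test deterministic
    return {
        "normal": [rng.gauss(50.0, 10.0) for _ in range(n)],
        "uniform": [rng.uniform(0.0, 100.0) for _ in range(n)],
        # Log-normal is right-skewed: mostly small values plus a few huge ones.
        "skewed": [rng.lognormvariate(0.0, 2.0) for _ in range(n)],
    }


def check_mean_properties(values):
    """Assert properties that hold for a mean regardless of distribution."""
    m = statistics.fmean(values)
    # The mean always lies between the smallest and largest value.
    assert min(values) <= m <= max(values)
    # Shifting every value by a constant shifts the mean by that constant.
    shifted = statistics.fmean([v + 10.0 for v in values])
    assert math.isclose(shifted, m + 10.0, rel_tol=1e-9, abs_tol=1e-9)


def test_mean_across_distributions():
    for name, values in make_samples().items():
        check_mean_properties(values)
```

The same loop can be pointed at any function under test; only the property checks need to change. Seeding the generator keeps failures reproducible.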

Why this matters for us: fragile tests are the bane of our existence. They give a green build in development and then the code fails on real data. Robust tests save us from production bugs and customer complaints.

“Tests should fail on the real world, not on the random number generator.”

testing.googleblog.com


#testing #robustness #pandas #data_distributions
