ai_scams | June 6, 2026 | Issue #25

How to write tests that don't break on real data

Tests often fail in production because they assume ideal data distributions. Here's how to make them robust:

The problem: most tests check a function against small, well-behaved inputs or samples drawn from np.random, like mean([1, 2, 3, 4, 5]) = 3. But real-world data has outliers: mean([1, 2, 3, 1000, 5]) = 202.2, nowhere near the typical value of about 3, so an assertion tuned to the clean case breaks. This is a serious issue in production systems, because the tests pass in development but fail on real data.
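The arithmetic above is easy to reproduce. The sketch below uses Python's standard statistics module rather than any particular project's code; the variable names are illustrative only.

```python
import statistics

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 1000, 5]  # one extreme value replaces the 4

# The mean of the clean list is 3; a single outlier drags it to 202.2.
mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)

# The median barely moves, which is why the outlier is easy to miss
# when a test only ever sees well-behaved inputs.
median_outlier = statistics.median(with_outlier)

print(mean_clean, mean_outlier, median_outlier)
```

A test that asserts the mean is "around 3" passes on the first list and fails on the second, even though only one value changed.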

The solution: test against several distributions (normal, uniform, skewed, and so on) instead of a single kind of random input. For mean, test with a normal distribution (good case), a uniform distribution (edge case), and a skewed distribution (worst case). For example, include outlier-heavy inputs such as [1, 2, 3, 1000, 5] alongside the well-behaved ones. This makes tests robust to different data patterns.
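One way this could look in practice is sketched below, under some assumptions that are not in the original: the helper name sample_cases, the sample size, the distribution parameters, and the 5% tolerance are all hypothetical choices, and a lognormal distribution stands in for "skewed". It uses the standard library's random module so it runs without extra dependencies; np.random.default_rng offers equivalent normal, uniform, and lognormal samplers.

```python
import math
import random
import statistics


def sample_cases(rng, n=50_000):
    """Return (name, sample, theoretical_mean) for three distribution shapes."""
    return [
        # Good case: symmetric, light tails.
        ("normal", [rng.gauss(10.0, 2.0) for _ in range(n)], 10.0),
        # Edge case: flat, no central peak.
        ("uniform", [rng.uniform(0.0, 20.0) for _ in range(n)], 10.0),
        # Worst case: right-skewed; lognormal mean is exp(mu + sigma**2 / 2).
        ("skewed", [rng.lognormvariate(0.0, 1.0) for _ in range(n)], math.exp(0.5)),
    ]


def test_mean_across_distributions():
    rng = random.Random(0)  # fixed seed keeps the test reproducible
    for name, sample, expected in sample_cases(rng):
        observed = statistics.fmean(sample)
        # Hypothetical tolerance: within 5% of the theoretical mean.
        assert math.isclose(observed, expected, rel_tol=0.05), name


test_mean_across_distributions()
```

With a test runner such as pytest, the three cases could be parametrized so each distribution reports as its own test. The fixed seed is a deliberate choice here: a failure then points at the code under test rather than at an unlucky draw.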

Why this matters for us: fragile tests are the bane of our existence — they pass in development but fail on real data. Robust tests save us from production bugs and customer complaints.

“Tests should fail on the real world, not on the random number generator.”

testing.googleblog.com

#testing #robustness #pandas #data_distributions