Illustrative composite — a consumer habit-tracking app

The Tuesday Problem

Method: Replace

A push-notification program moving from manual sends to an automated engine that chose which reminder each user received on a weekly cadence.

Notification engagement was capped below its potential — and a rising share of users were muting notifications entirely. The new engine was pinging the whole user base on the same weekday, every week.

The mistaken framing

The team read it as a routing problem: a growing library of reminder types and no systematic way to decide who should get which, and when. The fix was a configurable rules engine — the team defines each reminder's content and eligibility rules, and the engine evaluates eligibility on a weekly cadence and sends the best one. The assumption underneath: if a user is eligible for at least one reminder, send them one.

Structural diagnosis

The always-send assumption was never examined for the resolution at which it held. Aggregate tests showed more reminders produced more logged sessions across the user base — true at the population level. But a reminder program is an individual-level intervention, and a finding that holds across the whole base doesn't hold for every user within it: for some, one more push is the nudge that makes them mute notifications entirely — the opposite of the intended effect. So "always send" was evidence-based at the population level and wrong for a meaningful minority at the individual level. The question that surfaces it: at what resolution does this finding actually apply?

What I called out

Before the build was finished, the resolution mismatch was flagged: aggregate evidence that more reminders helped across the base didn't mean every individual user was better served by always receiving one — and for a system built to treat users differently, the assumption deserved examination at the individual level before being locked into the engine. The team agreed it was a real future consideration; for the first release, the aggregate evidence justified the MVP assumption. Because the flag was raised, the selection logic was deliberately structured in clean, independent steps — first determine eligibility, then select the best reminder — so a future "decide whether to send at all" step could be added without restructuring the engine.

The intervention

When the fix became warranted, it was a targeted addition, not a rebuild: the engine was updated to decide whether a push was warranted before selecting which one. Because the logic was already split into independent steps and the no-send decision had been anticipated, the new step slotted in cleanly — and it unlocked spreading sends across the week and tuning each user's send time to when they were most likely to engage rather than mute.

What it illustrates

A finding that holds across a whole population doesn't automatically hold for each individual within it. When a system is built to treat people differently, population-level evidence is a starting point, not a sufficient answer — and "at what resolution does this apply?" has to be asked before the build, not after. The other half: a system whose concepts are clean and whose steps are independent can absorb a consequence no one predicted with a targeted addition rather than a rebuild.

Because the engine's logic was split into clean, independent steps, the new send-decision slotted in without a rebuild — and later refinements (per-user timing, frequency, channel) attach to the same structure instead of forcing another rewrite

Start a structural conversation Working together full-time