WHAT HAPPENED
On May 19, our Redis cache service ran out of memory. Redis is a critical component of OutSmart responsible for three things: managing user sessions (keeping you logged in), processing background jobs (queued tasks that run behind the scenes), and caching frequently used data (so the application stays fast and responsive).
When Redis ran out of memory and stopped accepting requests, all three of these functions failed simultaneously. Logins failed, background processing stopped, and load on the underlying database spiked, which in turn led to errors and degraded performance across the application.
TIMELINE (all times CEST)
15:45 - Redis memory began climbing abnormally.
15:49 - Redis reached full memory capacity and started rejecting requests. Application errors and login failures began.
16:14 - Redis memory management stabilised and the system recovered automatically. Service was restored at this point. Investigation continued to confirm the root cause.
16:34 - Incident reported on status page. Investigation ongoing.
17:12 - Immediate cause confirmed: Redis memory exhaustion.
17:28 - Immediate cause documented. Active monitoring started to ensure stability.
May 20, 06:30 - Extended overnight monitoring confirmed full stability. Incident resolved.
Total active impact window: approximately 25 minutes (15:49 to 16:14 CEST). Monitoring continued overnight to confirm stability.
IMPACT
All OutSmart services were affected: login, session management, background processing, and general application performance. Users experienced authentication failures and errors. The system recovered automatically once Redis memory management stabilised.
WHAT WE ARE DOING TO PREVENT RECURRENCE
We are conducting a full root cause analysis into why Redis memory filled unexpectedly. We are in active contact with our Managed Service Provider and AWS Enterprise Support, and we are expanding our monitoring and alerting to catch memory pressure earlier, before it causes an outage.
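As a rough illustration of the kind of early-warning check described above, the sketch below parses the text of a Redis `INFO memory` reply and flags when `used_memory` crosses a fraction of `maxmemory`. It is a sketch under assumptions, not our production alerting: the `parse_info` and `memory_pressure` helpers, the 80% threshold, and the sample reply are all hypothetical. Only the `used_memory` and `maxmemory` field names come from Redis itself.

```python
def parse_info(info_text: str) -> dict[str, str]:
    """Parse the key:value lines of a Redis INFO reply, skipping section headers."""
    fields: dict[str, str] = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_pressure(info_text: str, warn_ratio: float = 0.80) -> tuple[float, bool]:
    """Return (used/limit ratio, whether the ratio is at or above warn_ratio)."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields["maxmemory"])
    if limit == 0:
        # maxmemory of 0 means no limit is configured, so a ratio is undefined.
        raise ValueError("maxmemory is 0 (no limit configured)")
    ratio = used / limit
    return ratio, ratio >= warn_ratio


# Hypothetical sample reply; a real one would come from the Redis server.
sample_reply = "# Memory\r\nused_memory:900\r\nmaxmemory:1000\r\n"
ratio, should_alert = memory_pressure(sample_reply)
```

Alerting on a ratio below 100% is what leaves time to react. The right threshold depends on how quickly memory can grow, which in this incident was about four minutes from abnormal growth to full capacity.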