WHAT HAPPENED
On May 21, OutSmart experienced a second outage caused by a different but related Redis failure. Rather than running out of memory (as in Incident #1 on May 19), the Redis cluster itself became unstable and began dropping connections. The effect was the same: user sessions, background job processing, and caching all stopped working simultaneously, causing login failures, degraded performance, and a spike in database load across the application.
We recognise this is the second Redis-related outage in three days and take that seriously.
TIMELINE (all times CEST)
14:55 - Redis cluster began experiencing connectivity failures.
15:31 - Incident detected and reported on status page. Investigation started.
15:33 - Root cause identified: Redis cluster instability causing connection failures.
15:40 - Redis cluster stabilised. Application services began recovering.
15:51 - Recovery confirmed. Infrastructure capacity expansion initiated as a structural improvement.
17:19 - Capacity upgrade completed and verified. Incident formally resolved. Monitoring continued.
Total active impact window: approximately 45 minutes (14:55 to 15:40 CEST). Recovery was confirmed at 15:51 CEST, and the incident was formally resolved at 17:19 CEST once the capacity upgrade had been verified.
IMPACT
Login, session management, and background processing were affected across all OutSmart services. Database load spiked significantly during the impact window and returned to normal once Redis recovered.
WHAT WE ARE DOING TO PREVENT RECURRENCE