Investigating Unexpected Behavior in Our Application

Incident Report for OutSmart

Postmortem

WHAT HAPPENED

On May 19, our Redis cache service ran out of memory. Redis is a critical component of OutSmart responsible for three things: managing user sessions (keeping you logged in), processing background jobs (queued tasks that run behind the scenes), and caching frequently used data (so the application stays fast and responsive).

When Redis ran out of memory and stopped accepting requests, all three of these functions failed simultaneously. Logins failed, background processing stopped, and load on the underlying database spiked, which in turn led to errors and degraded performance across the application.
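
The database load spike follows from how a cache-aside read typically behaves once the cache stops answering. The sketch below is purely illustrative and does not reflect OutSmart's actual code: `FakeCache`, `FakeDatabase`, `CacheUnavailable`, and `get_profile` are hypothetical names, and the cache is simulated in memory rather than backed by Redis.

```python
class CacheUnavailable(Exception):
    """Raised by the simulated cache when it rejects requests."""


class FakeCache:
    """In-memory stand-in for a cache (hypothetical, not a Redis client)."""

    def __init__(self):
        self.data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise CacheUnavailable(key)
        return self.data.get(key)

    def set(self, key, value):
        if not self.available:
            raise CacheUnavailable(key)
        self.data[key] = value


class FakeDatabase:
    """Counts queries so the extra load is visible."""

    def __init__(self):
        self.queries = 0

    def load_profile(self, user_id):
        self.queries += 1
        return {"id": user_id}


def get_profile(cache, db, user_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"profile:{user_id}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        # Cache is rejecting requests: every read now goes to the database.
        return db.load_profile(user_id)
    value = db.load_profile(user_id)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # still serve the value even if it cannot be cached
    return value


cache, db = FakeCache(), FakeDatabase()

for _ in range(100):  # healthy cache: one miss, then 99 hits
    get_profile(cache, db, 7)
healthy_queries = db.queries

cache.available = False  # simulate the cache rejecting requests
for _ in range(100):
    get_profile(cache, db, 7)
outage_queries = db.queries - healthy_queries

print(healthy_queries, outage_queries)  # prints: 1 100
```

In this toy setup the same 100 reads cost one database query while the cache is healthy and 100 queries while it is unavailable, which is the kind of amplification described above.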

TIMELINE (all times CEST)

15:45 - Redis memory began climbing abnormally.
15:49 - Redis reached full memory capacity and started rejecting requests. Application errors and login failures began.
16:14 - Redis memory management stabilised and the system recovered automatically. Service was restored at this point. Investigation continued to confirm the root cause.
16:34 - Incident reported on status page. Investigation ongoing.
17:12 - Root cause confirmed: Redis memory exhaustion.
17:28 - Root cause documented. Active monitoring started to ensure stability.
May 20, 06:30 - Extended overnight monitoring confirmed full stability. Incident resolved.

Total active impact window: approximately 25 minutes (15:49 to 16:14 CEST). Monitoring continued overnight to confirm stability.
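
As a quick check of the figures above, the durations can be recomputed from the timeline entries. This is a minimal sketch: the helper `at` is a hypothetical name, the year is taken from the posting dates on this page, and all times are treated as local CEST wall-clock values on May 19.

```python
from datetime import datetime


def at(hhmm):
    """Build a naive datetime for May 19, 2026 from an HH:MM string."""
    return datetime.strptime(f"2026-05-19 {hhmm}", "%Y-%m-%d %H:%M")


impact_start = at("15:49")   # Redis started rejecting requests
recovered = at("16:14")      # system recovered automatically
status_posted = at("16:34")  # incident reported on the status page

impact_minutes = int((recovered - impact_start).total_seconds() // 60)
report_delay_minutes = int((status_posted - impact_start).total_seconds() // 60)

print(impact_minutes, report_delay_minutes)  # prints: 25 45
```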

IMPACT

All OutSmart services were affected: login, session management, background processing, and general application performance. Users experienced authentication failures and errors. The system recovered automatically once Redis memory management stabilised.

WHAT WE ARE DOING TO PREVENT RECURRENCE

We are conducting a full root cause analysis to determine why Redis memory filled unexpectedly. We are in active contact with our Managed Service Provider and AWS Enterprise Support, and we are expanding our monitoring and alerting to catch memory pressure before it causes an outage.
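
One way such an early-warning check could look is sketched below. It is an assumption-laden illustration, not the monitoring we actually run: the function name `memory_pressure` and the 75% and 90% thresholds are hypothetical. The input is a dictionary carrying the `used_memory` and `maxmemory` fields that Redis reports in the memory section of its `INFO` output; with the redis-py client, for example, such a dictionary can be obtained from `info("memory")`.

```python
def memory_pressure(info, warn_ratio=0.75, critical_ratio=0.90):
    """Classify Redis memory usage from INFO memory fields.

    Thresholds are illustrative assumptions, not recommended values.
    Returns a (level, ratio) tuple; ratio is None when no limit is set.
    """
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", 0))
    if limit <= 0:
        # maxmemory of 0 means no limit is configured in Redis itself.
        return "unknown", None
    ratio = used / limit
    if ratio >= critical_ratio:
        level = "critical"
    elif ratio >= warn_ratio:
        level = "warning"
    else:
        level = "ok"
    return level, ratio


# Hand-written sample readings (bytes); not data from the incident.
samples = [
    {"used_memory": 400_000_000, "maxmemory": 1_000_000_000},
    {"used_memory": 800_000_000, "maxmemory": 1_000_000_000},
    {"used_memory": 950_000_000, "maxmemory": 1_000_000_000},
]

levels = [memory_pressure(sample)[0] for sample in samples]
print(levels)  # prints: ['ok', 'warning', 'critical']
```

Alerting at a warning level well below full capacity leaves time to react before Redis begins rejecting requests.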

Posted May 22, 2026 - 10:06 CEST

Resolved

We are pleased to announce that the issues affecting our application have been fully resolved. After a thorough monitoring phase, we have confirmed the stability and performance of the application. We appreciate your patience and support during this time and apologize for any inconvenience caused. Our team is dedicated to continuously improving our services to prevent future incidents.

Posted May 20, 2026 - 06:30 CEST

Update

We have identified an issue with our integrations and are actively monitoring the situation. Our team is dedicated to resolving this issue promptly.

Posted May 19, 2026 - 17:35 CEST

Monitoring

We have identified an issue with our integrations and are actively monitoring the situation. Our team is dedicated to resolving this issue promptly.

Posted May 19, 2026 - 17:28 CEST

Identified

Our investigation has led to the identification of the cause behind the recent issues within our application. With the root cause now identified, our team is developing a solution to rectify the problem. We are working diligently to implement these fixes and will keep you informed on our progress. Thank you for your continued patience.

Posted May 19, 2026 - 17:12 CEST

Investigating

We are currently investigating reports of unexpected behavior within our application. Our technical team is actively working to identify the cause of these issues to address them as quickly as possible. We are committed to maintaining the highest level of service quality and will provide updates as our investigation progresses. We appreciate your patience and understanding.

Posted May 19, 2026 - 16:34 CEST

This incident affected: Integrations & Open API (Integration Page & Settings, Open API), Application (Web-Application, Login & Authentication, Customer Portal, E-Mail Delivery, SMS Service, Location Services, Routing, File Storage), and Mobile Applications (Android Mobile App, iOS Mobile App, Mobile API).