RCA P2 Incident 2026-02-25
Incident Summary
Product: CERRIX Platform
Date: 2026-02-25
Reference ID: RCA-20260225-P2-ControlsWorkspace
Incident Timeline
2026-02-25 09:30
Performance issues detected in the Controls workspace via internal monitoring
2026-02-25 09:52
First customer report received, intake completed within 10 minutes
2026-02-25 10:37
First communication sent to affected customers
2026-02-25 17:27
Second communication sent with investigation update and remediation plan
2026-02-26 14:34
Third communication sent confirming the fix and planned deployment schedule
2026-02-27 02:00–05:00
Fix successfully deployed to all production environments during scheduled maintenance window
2026-02-27 09:18
Final communication sent confirming resolution and restored stability
Root Cause
Root Cause
The incident was caused by a performance regression introduced in the latest release, which affected the Controls module. A change to an underlying software component altered how certain database queries were generated, resulting in significantly slower query execution for customers with larger datasets. This caused the Controls workspace to time out and fail to load for a limited number of affected environments.
Detection Gaps
Internal monitoring detected the issue shortly after the release was deployed. Detection and initial investigation were swift once the first symptoms appeared.
Reason the Issue Was Not Identified Before Release
The release was tested both functionally and on performance prior to deployment. However, the performance testing environment did not fully reflect the data volumes present in all customer environments, which meant the specific conditions triggering the issue were not reproduced during pre-release validation.
Impact Analysis
Only select customer environments were affected. The Controls workspace was temporarily unavailable for these customers, presenting as slow loading or failure to display data. No data loss or security issues occurred. All other modules and environments remained fully operational.
The deployment of the fix required a planned maintenance window on 27 February between 2:00 and 6:00 AM CET, during which up to 30 minutes of downtime was expected.
Resolution & Recovery Steps
Short-term fixes implemented
The development team identified and resolved the inefficient query responsible for the timeouts. A targeted fix was developed, tested, and validated before being deployed to all affected production environments during a controlled maintenance window.
Confirmation of stability
After deployment, the Controls module was validated against production data to confirm that performance had returned to expected levels. Monitoring tools were used to verify query execution times and error rates. Additionally, confirmation was received from affected customers that the Controls workspace was loading correctly.
Lessons Learned
What worked well
Internal monitoring enabled early detection of the issue. Cross-team collaboration between Development and Product (Support) was effective, and customers were proactively informed throughout the process with regular updates.
What didn't work well
Communication around the deployment timeline could have been more consistent. The initially communicated schedule was adjusted, and this change was not explicitly acknowledged in subsequent customer updates.
Preventive & Corrective Actions
To prevent similar incidents in the future and further strengthen our service reliability, we are taking the following actions:
Investigate adding performance and query validation checks to our automated testing pipeline
2026-06-01
Formalize our incident response process, including communication guidelines and templates
2026-06-01
Ensure the performance testing environment is representative of customer environments in terms of data volume and complexity
2027-01-01
We remain fully committed to continuously improving our processes and minimizing any potential impact on our customers. Thank you for your understanding and trust.
If you have any questions about this Root Cause Analysis, please reach out to us: Getting Support
Last updated