Incident Description

Users could not log in to EMS. They were presented with the login screen, which took them to the IDP selection page as normal. After successful authentication at the IDP, they were redirected back to EMS; however, instead of being logged in, they were logged out.


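The reason for degradation: EMS could not connect to its Redis backend, because the hostname master.production-events-redis.service.ha.geant.net could not be resolved (first logged at 21:55:53 UTC).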


The impact of this service degradation was: users were unable to log in to EMS.


Incident severity: Partial service degradation

Data loss: 

Total duration of incident: 15 hours


Timeline

All times are in UTC

Time        Description

21:55:53    First error in indico.log showing Redis as unavailable:
            ConnectionError: Error -2 connecting to master.production-events-redis.service.ha.geant.net:6379. Name or service not known.

10:42       (next day) First user report of the EMS login problem (Slack #general)

12:06       Service degradation incident email sent to the product owner (Steffie Bosman)

13:01       Service restored email sent to the product owner (Steffie Bosman)



Proposed Solution

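Additional monitoring (Sensu checks) will be added, so that Redis becoming unreachable raises an alert instead of going unnoticed until users report it (almost 13 hours in this incident).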
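The report does not specify which Sensu checks will be added. As one illustration, here is a minimal sketch of a check aimed at this failure mode. It first resolves the Redis hostname taken from the log, then sends a Redis PING, and reports the result through the exit codes Sensu reads from check commands (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names (`check_redis`, `main`) and the 5-second timeout are hypothetical. The sketch also assumes the Redis master answers an unauthenticated PING; an authentication error is reported as WARNING.

```python
"""Hypothetical Sensu check for the EMS Redis backend (a sketch, not the deployed check).

Exit codes follow the Nagios plugin convention used by Sensu:
0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
"""
import socket

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed defaults, taken from the hostname in the indico.log error.
REDIS_HOST = "master.production-events-redis.service.ha.geant.net"
REDIS_PORT = 6379


def check_redis(host: str, port: int, timeout: float = 5.0) -> tuple[int, str]:
    """Return (exit_code, message) for a DNS lookup followed by a Redis PING."""
    # Step 1: name resolution. The incident started with the lookup failing
    # ("Name or service not known"), so that case gets its own message.
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return CRITICAL, f"CRITICAL: cannot resolve {host}: {exc}"

    # Step 2: TCP connect and send an inline PING; a healthy server replies +PONG.
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError as exc:  # connection refused, unreachable, timed out, ...
        return CRITICAL, f"CRITICAL: cannot reach {host}:{port}: {exc}"

    if reply.startswith(b"+PONG"):
        return OK, f"OK: {host}:{port} answered PONG"
    # Any other reply (e.g. -NOAUTH when a password is required) is a warning.
    return WARNING, f"WARNING: unexpected reply from {host}:{port}: {reply!r}"


def main(argv: list[str]) -> int:
    """Entry point: optional arguments are HOST and PORT. Prints one status line."""
    host = argv[0] if argv else REDIS_HOST
    try:
        port = int(argv[1]) if len(argv) > 1 else REDIS_PORT
    except ValueError:
        print(f"UNKNOWN: invalid port {argv[1]!r}")
        return UNKNOWN
    status, message = check_redis(host, port)
    print(message)
    return status
```

The check could be saved as a script whose entry point calls `sys.exit(main(sys.argv[1:]))`, and that script could then be referenced from a Sensu check definition's `command` attribute. DNS resolution and the TCP/PING step are checked separately so that an alert names the specific failure, in this case the lookup error that started the incident.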