Orderbox Status Page

Intermittent issue while accessing SuperSite

Opened on December 8th, 2019 8:30 am GMT, last updated December 10th, 2019 7:41 am GMT

Resolved

Root Cause Analysis:


At around 11:09 AM IST / 05:39 AM GMT, we received a few down alerts from our Monitoring System. Because these alerts were recovering every few seconds, we initially thought it was a network-related issue limited to those particular locations.
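Alerts that recover every few seconds are a sign of "flapping" rather than a single outage. As an illustration only (this is not the actual Monitoring System logic), the sketch below flags a check as flapping when it changes state several times within a short window. The function name `is_flapping`, the 5-minute window and the threshold of 4 transitions are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def is_flapping(events, window=timedelta(minutes=5), min_transitions=4):
    """Return True if the check changed state at least `min_transitions`
    times within any span of length `window`.

    `events` is a list of (timestamp, is_up) tuples sorted by timestamp.
    """
    # Timestamps at which the state differs from the previous sample.
    transitions = [
        ts
        for (ts, up), (_, prev_up) in zip(events[1:], events[:-1])
        if up != prev_up
    ]

    # Sliding window over the transition timestamps.
    start = 0
    for end in range(len(transitions)):
        while transitions[end] - transitions[start] > window:
            start += 1
        if end - start + 1 >= min_transitions:
            return True
    return False


# Hypothetical samples: the check alternates between up and down every 10 seconds.
t0 = datetime(2019, 12, 8, 5, 39)
samples = [(t0 + timedelta(seconds=10 * i), i % 2 == 0) for i in range(6)]
flapping = is_flapping(samples)
```

A check that flaps across every monitored location at once would more likely point to a shared dependency than to a network problem at individual locations.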


We then noticed that all Supersites were intermittently returning one of the following errors:


1. 503 Service Unavailable

2. SQLSTATE[HY000] [2002] Connection refused

3. SQLSTATE[HY000] [2006] MySQL server has gone away

4. SQLSTATE[HY000] [2002] No route to host
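Three of the four errors above are database connection failures, which is what pointed the investigation away from the web tier. A minimal sketch of how such messages could be grouped automatically is shown below. The function name `classify_error`, the category labels and the regular expression are assumptions made for this example; the set of connection-related codes contains only the two codes that appear in the list above.

```python
import re

# Matches messages of the form: SQLSTATE[HY000] [2002] Connection refused
SQLSTATE_PATTERN = re.compile(
    r"SQLSTATE\[(?P<state>\w+)\] \[(?P<code>\d+)\] (?P<message>.+)"
)

# Client error codes seen in this incident, both connection-related.
CONNECTION_ERROR_CODES = {2002, 2006}


def classify_error(line):
    """Map an error line to a coarse category (labels are illustrative)."""
    match = SQLSTATE_PATTERN.search(line)
    if match:
        code = int(match.group("code"))
        if code in CONNECTION_ERROR_CODES:
            return "db_connectivity"
        return "db_other"
    if line.strip().startswith("503"):
        return "http_unavailable"
    return "unknown"


observed = [
    "503 Service Unavailable",
    "SQLSTATE[HY000] [2002] Connection refused",
    "SQLSTATE[HY000] [2006] MySQL server has gone away",
    "SQLSTATE[HY000] [2002] No route to host",
]
categories = [classify_error(line) for line in observed]
```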


Our System Administration Team immediately began troubleshooting and noticed that all our Supersite container health checks were failing and that the application was unable to establish a steady connection with the Database Servers. The team immediately looped in our Database Administrators, and further troubleshooting began.


The Database Administration Team noticed that ProxySQL was restarting every few seconds and, as a result, database connections were being dropped intermittently. On checking further, the team found the following error in the logs:


http://prntscr.com/q88tyu


This error was caused by the following bug:


https://github.com/sysown/proxysql/issues/2131


The Database Administration Team immediately updated ProxySQL from version 2.0.4 to 2.0.6 on one of the proxies and monitored the Server after the update. Database connections were stable and the Server was no longer restarting randomly. Once the fix was confirmed to work, it was applied to the other Proxy as well and everything stabilized. All Supersites began resolving seamlessly once again.
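The fix was rolled out in stages: one proxy first, then the other once the first proved stable. The sketch below illustrates that pattern in general terms and is not the procedure the team actually ran. The `upgrade` and `is_healthy` callables and the host names `proxy-1` and `proxy-2` are hypothetical placeholders.

```python
def staged_rollout(hosts, upgrade, is_healthy):
    """Upgrade hosts one at a time, stopping at the first unhealthy host.

    Returns (upgraded_hosts, failed_host); failed_host is None on success.
    """
    upgraded = []
    for host in hosts:
        upgrade(host)
        if not is_healthy(host):
            # Stop here so the remaining hosts keep running the old version.
            return upgraded, host
        upgraded.append(host)
    return upgraded, None


# Hypothetical usage with stand-in callables.
log = []
done, failed = staged_rollout(
    ["proxy-1", "proxy-2"],
    upgrade=lambda host: log.append(f"upgraded {host}"),
    is_healthy=lambda host: True,
)
```

Stopping at the first unhealthy host keeps at least part of the fleet on the known configuration if the new version misbehaves.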


When we moved to a clustered setup in September 2019, ProxySQL 2.0.6 had only just been released and still had a few bugs, so we did not update our Servers to it at the time. These bugs have since been fixed.


Root Cause: Bug in ProxySQL 2.0.4.


Action Items: Proactively update Server Software to the latest available stable versions.
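One way to support this action item is a periodic check that compares installed versions against the latest stable releases. The sketch below is illustrative only: the function names and the input dictionaries are assumptions, and it handles only plain dotted numeric versions such as 2.0.4.

```python
def parse_version(text):
    """Turn '2.0.4' into (2, 0, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_outdated(installed, latest_stable):
    """Return {name: (installed, latest)} for packages behind the latest stable."""
    return {
        name: (version, latest_stable[name])
        for name, version in installed.items()
        if name in latest_stable
        and parse_version(version) < parse_version(latest_stable[name])
    }


# Versions taken from this incident; the dictionary layout is hypothetical.
outdated = find_outdated({"proxysql": "2.0.4"}, {"proxysql": "2.0.6"})
```

Comparing tuples of integers rather than raw strings avoids ranking 2.0.10 below 2.0.6.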

Posted December 10th, 2019 7:41 am GMT

Monitoring

There was a temporary network-related issue with our Supersite Servers which caused a few Supersites to not resolve. Our System Admin team looked into the issue, and connectivity has been restored.


We are monitoring our servers to ensure there are no further such issues and shall post a detailed Root Cause Analysis soon.

Posted December 8th, 2019 9:36 am GMT

Investigating

We are currently encountering an issue with our SuperSite servers, due to which you may receive intermittent error messages such as "503 Service Unavailable" while accessing the Supersite.


Our System Admin team is already working on this, and efforts are ongoing to resolve the issue as soon as possible. We will update this thread as soon as we have any further information.

Posted December 8th, 2019 8:30 am GMT

Affected Services

  • OrderBox