Service Level Availability (SLA) is the percentage of time during which the platform is in an available state. Other states are degraded and outage.
Each user-facing service has two Service Level Indicators (SLIs): the Uptime and the Error rate. The Uptime score is generally a measure of the service's performance (latency). The Error rate measures the percentage of requests that fail due to an error (usually a 5XX status code).
A service is considered available when:
- The Uptime of the service is above its Service Level Objective (SLO),
- AND the Error rate of the service is below its Service Level Objective (SLO).
An example of an available web service, measured over a 5-minute period:
- At least 90% of requests have a latency within their "satisfactory" threshold
- AND fewer than 0.5% of requests return a 5XX error status response.
In other words, a service needs to simultaneously meet both of its SLO targets in order to be considered available. If either target is not met, the service is considered unavailable.
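The rule above can be sketched in a few lines of Python. This is only an illustration, not the production check: the names `SloTargets` and `is_available` are hypothetical, and the default thresholds are simply the ones from the web service example above (at least 90% satisfactory requests, fewer than 0.5% 5XX responses).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloTargets:
    # Hypothetical defaults, copied from the web service example above.
    min_satisfactory_ratio: float = 0.90  # share of requests within the latency threshold
    max_error_ratio: float = 0.005        # share of requests returning a 5XX status


def is_available(satisfactory_ratio: float,
                 error_ratio: float,
                 targets: SloTargets = SloTargets()) -> bool:
    """Return True only when both SLO targets are met for the measured window."""
    latency_ok = satisfactory_ratio >= targets.min_satisfactory_ratio  # "at least 90%"
    errors_ok = error_ratio < targets.max_error_ratio                  # "fewer than 0.5%"
    return latency_ok and errors_ok


# One 5-minute window: 95% satisfactory requests, 0.1% errors.
print(is_available(0.95, 0.001))  # True
# Latency target met, but 1% of requests failed: the window counts as unavailable.
print(is_available(0.95, 0.01))   # False
```

Both conditions are combined with a logical AND, so missing either target is enough to mark the window as unavailable.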
The availability score for a service is then calculated as the percentage of time that it is available. The availability scores of all services, combined, define the platform Service Level Availability (SLA). The SLA number indicates the availability of Billit.eu over a selected period of time.
For example, if a service becomes unavailable for a 10-minute period, its availability score will be:
- 99.90% for the week (10 070 minutes of availability out of 10 080 minutes in a week)
- 99.98% for the month (43 190 minutes of availability out of 43 200 minutes in the month)
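The arithmetic behind these figures can be reproduced with a short sketch. The function name `availability_score` and the two constants are hypothetical; the 43 200-minute figure corresponds to a 30-day month.

```python
MINUTES_PER_WEEK = 7 * 24 * 60            # 10 080
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60   # 43 200


def availability_score(unavailable_minutes: float, period_minutes: float) -> float:
    """Percentage of the period during which the service was available."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    if not 0 <= unavailable_minutes <= period_minutes:
        raise ValueError("unavailable_minutes must lie within the period")
    return 100.0 * (period_minutes - unavailable_minutes) / period_minutes


# A single 10-minute period of unavailability:
print(f"{availability_score(10, MINUTES_PER_WEEK):.4f}")          # 99.9008
print(f"{availability_score(10, MINUTES_PER_30_DAY_MONTH):.4f}")  # 99.9769
```

The same outage weighs less over a longer period, which is why the monthly score is higher than the weekly one.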
The SLA score can be seen on the SLA dashboard, and the SLA target is set as an Infrastructure key performance indicator for the API (api.billit.be), Login (my.billit.be), and Application (my.billit.be).
Billit guarantees 99.9% uptime; however, our internal target is 100%.
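To make the 99.9% guarantee more tangible, the sketch below converts an availability target into the number of minutes of unavailability it allows. The helper name `allowed_unavailable_minutes` is hypothetical, and the period lengths assume a 7-day week and a 30-day month.

```python
def allowed_unavailable_minutes(target_percent: float, period_minutes: float) -> float:
    """Minutes of unavailability that still satisfy the availability target."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return period_minutes * (100.0 - target_percent) / 100.0


week = 7 * 24 * 60     # 10 080 minutes
month = 30 * 24 * 60   # 43 200 minutes, assuming a 30-day month

print(f"{allowed_unavailable_minutes(99.9, week):.2f}")   # 10.08
print(f"{allowed_unavailable_minutes(99.9, month):.2f}")  # 43.20
print(f"{allowed_unavailable_minutes(100, week):.2f}")    # 0.00
```

Under these assumptions, a 99.9% target leaves roughly 10 minutes of unavailability per week, while the internal 100% target leaves none.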