Uptime is a computer industry term for the time during which a computer or IT system is operational. Uptime can also be a metric that represents the percentage of time that hardware, a computer network, or a device is successfully operational. Uptime is often expressed as a percentage, such as “five 9s,” meaning a system that is operational 99.999 percent of the time. Downtime, the opposite of uptime, is the period of time when a system is not operational.

The terms uptime and downtime are used to define the level of success provided by real-time services. A service level agreement (SLA) or other real-time service contract may include uptime/downtime ratios that show how much time a service is expected to remain operational. IT professionals may use uptime to refer to a total consecutive amount of operational time. For example, a computer system that has been running for three weeks has a “three-week uptime.” High availability uses uptime to define an agreed level of operational performance measured against a 100 percent operational standard.

Uptime is often used as a sign of operating system or network reliability, representing the length of time a system can be left unattended without crashing or needing maintenance. Hardware reliability, availability and serviceability (RAS) are important factors in data center uptime. Redundant servers for backup and failover help maintain data center uptime in case of server failure. Server clustering is another uptime strategy that delivers high availability of IT services and workloads. A server cluster is a group of linked servers that work together to improve system performance, load balancing and service availability. If a server fails, other servers in the cluster can take over the functions and workloads of the failed server. SUSE Linux Enterprise Server can help maximize uptime by providing server clustering, by exploiting hardware RAS features, and by enabling live kernel patching without rebooting.

Right now, I am sitting at home thinking about how the world is being held together by the Internet. So far, the Internet has stood up to our new reality amazingly well. This is despite redistributed traffic loads and explosive growth in interactive, high-bandwidth applications. For a brief time at least, even casual users are recognizing and appreciating the network’s robustness and availability. And some of these failures might result in application impacts which could have been avoided.

If you are an enterprise operator, now is the perfect time to examine your design assumptions against the new reality of your network. What needs upgrading considering the shift in application mix and resulting performance requirements? What weaknesses have become exposed by the shift to telework? One way or another, your end users will adapt to what you are providing.

It is possible to do this as there is a direct relationship between the probability of failure and the end user’s perception of network availability. Such a quantitative basis is essential, as availability with acceptable performance is ultimately how a network is judged. The best-known metric of network availability is “five nines”. What five nines means is that the end user perceives that their application is available 99.999% of the time. This permits only 5.26 minutes of downtime a year. Depending on the application and network topology, this can be a very stringent standard.

Consider Figure 1, which shows serially connected routers, switches, access points, servers, and transited clouds. When these ten elements are connected without any redundancy, each of them must be up and available 99.9999% (or six nines) of the time for the end user to perceive five nines of availability. Since six nines allows only about 32 seconds of downtime a year, even a single reboot a year could prove problematic.

Figure 2: Parallel Transport Availability

Despite the user’s experience being identical, the difference between the two figures is huge. We have reduced the availability requirements of all component parts by three orders of magnitude.
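The downtime and availability figures quoted in this piece follow from simple arithmetic. The sketch below is a minimal illustration only, assuming independent failures and a 365.25-day year. The function names (`nines`, `serial`, `parallel`) are hypothetical helpers, not part of any library, and the redundant pair at the end is an assumed stand-in for a parallel design rather than a reproduction of Figure 2.

```python
import math

# Assumption: an average year of 365.25 days (31,557,600 seconds).
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def nines(n: int) -> float:
    """Availability with n nines, e.g. nines(5) is 99.999%."""
    return 1.0 - 10.0 ** (-n)


def downtime_seconds_per_year(availability: float) -> float:
    """Yearly downtime allowed at the given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def serial(availabilities) -> float:
    """Every element must be up, so the availabilities multiply."""
    return math.prod(availabilities)


def parallel(availabilities) -> float:
    """Service is down only if all redundant elements are down at once."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


five_nines_minutes = downtime_seconds_per_year(nines(5)) / 60  # about 5.26 minutes
six_nines_seconds = downtime_seconds_per_year(nines(6))        # about 31.6 seconds

# Ten six-nines elements in series give roughly five nines end to end.
chain = serial([nines(6)] * 10)

# Two independent three-nines elements in parallel behave like one six-nines element.
pair = parallel([nines(3)] * 2)

print(f"five nines: {five_nines_minutes:.2f} min/year")
print(f"six nines:  {six_nines_seconds:.1f} s/year")
print(f"ten serial six-nines elements: {chain:.7f}")
print(f"two parallel three-nines elements: {pair:.7f}")
```

Under these assumptions, a redundant pair of 99.9% elements matches a single 99.9999% element, so the unavailability each element is allowed rises from one in a million to one in a thousand, which is three orders of magnitude.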