Extreme Availability Where It Really Matters: The Financial Power of Integration

Extreme availability may become a competitive differentiator. It must ensure that applications respond in under a second, whether they are supporting a wireless trade on an exchange floor, moving multi-billion-dollar transactions across global infrastructures, or protecting the integrity of integrated applications through a secure infrastructure that extends to third parties.

The financial industry's requirements for availability, reliability, scalability, and performance are well beyond what most Commercial Off-the-Shelf (COTS) software provides. Because of this, many financial institutions have developed applications in-house. However, when considering the scale, scope, and cost of the changes needed to select, support, and integrate several applications, organizations have discovered the value of leveraging outside middleware to speed future development and decrease development and maintenance costs. With the development of extreme availability using commercially available middleware, organizations no longer have to choose between their availability requirements and the development and maintenance benefits of standard middleware; they can reap the benefits of both.

Extreme availability is the ultimate form of high availability: the goal is no outages at all, because the perceived harm to the business is immeasurably large. How does this differ from high availability? While the difference might initially seem small, achieving extreme availability requires changes throughout the organization. Indeed, the changes are broad enough that it makes sense to treat them as a culture change. One way to think of extreme availability is as an “availability culture” added to a standard high-availability operation.

Availability Culture

In even the most carefully designed fault-tolerant system, there’ll be unexpected events. While these will be rare with careful planning, proper preparation and response are vital if outages are completely unacceptable. Organizations pursuing extreme availability pay close attention to people, culture, fault tolerance, upgrade procedures, testing, security, mature software, documentation, and simplicity.

Businesses that seek extreme availability have a strong culture of reliability that emphasizes taking extra precautions and being ready for surprises. Extreme availability requires fanatical attention to possible single points of failure. In power, for example, the building should have two power feeds, coming in on opposite sides of the building, from separate power substations. Each feed should support an independent electrical distribution network inside the building. The outside power should be backed up by batteries, which are backed up by generators with enough fuel storage to last many days. Good planning might even secure multiple fuel delivery contracts with independent providers who would be available a week or so before the fuel store empties.

For hardware and/or software upgrades, extreme availability organizations use a gradual rollout process that lets them observe the upgrade in actual operation. This gradual rollout is coupled with a way to back out the upgrade if problems occur. This may impose additional system redundancy requirements on the design.
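As a rough illustration of the gradual rollout with back-out described above, here is a minimal sketch. The function `staged_rollout` and its `upgrade`, `rollback`, and `healthy` callbacks are hypothetical names introduced for this example, not part of any particular product; a real deployment would also need an observation period per stage and the spare capacity to run old and new versions side by side.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    upgrade: Callable[[str], None],
    rollback: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> bool:
    """Upgrade hosts one stage at a time; back out every upgraded host
    if any host in the current stage is unhealthy afterwards.

    Returns True if all stages were upgraded, False if the upgrade
    was backed out.
    """
    upgraded: list[str] = []
    for stage in stages:
        for host in stage:
            upgrade(host)
            upgraded.append(host)
        # Observe this stage in actual operation before widening the rollout.
        if not all(healthy(host) for host in stage):
            # Back out in reverse order so the newest changes are undone first.
            for host in reversed(upgraded):
                rollback(host)
            return False
    return True
```

A typical use would start with a single host as the first stage and widen from there, for example `staged_rollout([["a"], ["b", "c"]], upgrade, rollback, healthy)`.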

Redundancy and fault tolerance can protect against random errors, but not against systemic errors such as a programming mistake or an improper hardware configuration. The only remedy for such problems is testing, which extreme availability organizations take seriously. Testing periods are never shortened. Frequently, there are multiple independent testing organizations, and each can stop deployment. The testing organization normally does not report to development. Frequently, testers are rewarded according to the number of defects they find, which is exactly the opposite of how developers are rewarded.

Security precautions address both physical security (site, building, and machine room) and network security. When possible, the system should be isolated from the Internet. If Internet access is a key feature, then layered protection against penetration attacks is needed, as is a mechanism for handling denial-of-service attacks. The application also needs to check data integrity after failure recovery.
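One way an application might check data integrity after failure recovery is to compare each recovered record against a digest stored before the failure. The sketch below is an assumption-laden illustration: the helper names `record_digest` and `find_corrupted`, the JSON-serializable record layout, and the choice of SHA-256 are all hypothetical, and the digests are assumed to be kept somewhere that survives the failure independently of the data.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Return a SHA-256 digest of a record, using a canonical JSON encoding
    so that key order does not change the result."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_corrupted(records: dict, digests: dict) -> list:
    """Return the sorted ids of records that are missing after recovery
    or whose contents no longer match the digest stored before the failure."""
    bad = [
        record_id
        for record_id, expected in digests.items()
        if record_id not in records
        or record_digest(records[record_id]) != expected
    ]
    return sorted(bad)
```

An empty result means every record that had a stored digest came back unchanged; any id in the result would need to be repaired from a replica or backup before the application resumes service.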

Extreme availability organizations typically implement a higher-than-normal degree of isolation. Development networks may be fully independent of production networks, and testing environments are fully independent of both development and production.

Software is markedly different from hardware in the shape of its failure
