ResiliNets Axioms

From ResiliNetsWiki

The ResiliNets Architecture is based on four axioms, summarised as IUER: faults are inevitable; understand normal operations; expect adverse events and conditions; and respond to them. These four axioms motivate the ResiliNets Strategy and are supported by the ResiliNets Principles, which are implemented by the ResiliNets Mechanisms.



A0. Inevitability of Faults

Faults are inevitable. It is not possible to construct perfect systems, nor is it possible to prevent challenges and threats.

It is not possible to construct fault-free systems, for two reasons:

  1. Internal faults arise from within a given system due to imperfect design. While it is theoretically possible to use formal methods to design a provably correct system, doing so will remain impractical for large, complex systems and networks for the foreseeable future. In hardware, software, and network architecture alike, the problem is compounded by unknown, imprecise, and complex design requirements; by mistakes in design, implementation, deployment, operation, and maintenance; and by insufficient testing.
  2. External faults are exercised by challenges from outside the system. It is neither possible nor practical to predict all such challenges, present and future, and to design defences against each of them. Threat and challenge models improve the ability to prevent external faults, but do not eliminate them.


A1. Understand Normal Operations

Understand the normal operating conditions, environment, and application demands. It is only by understanding normal operation that we have any hope of determining when the network is challenged or threatened.

We define normal operation to be the state of the network when no adverse conditions are present. This loosely corresponds to the conditions for which the current Internet and PSTN are designed: the network is not under attack, the vast majority of network infrastructure is operational, and connectivity is relatively strong. For example, the PSTN is designed to handle normal time-of-day traffic fluctuations and even predictable peak loads such as Mother's Day. These predictable application demands are within normal operation. On the other hand, a flash crowd to an obscure Web site represents traffic that is beyond normal operation.

It is essential to understand normal operation to be able to detect when an adverse event or condition occurs.
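As a rough illustration of why a model of normal operation is a prerequisite for detection, the Python sketch below learns a per-hour-of-day traffic baseline and flags loads that fall more than `k` standard deviations from it. The class name, the hourly granularity, and the threshold `k` are assumptions chosen for illustration, not part of ResiliNets. Because it models only time of day, this sketch would also flag predictable calendar peaks such as Mother's Day; a practical baseline would need those as additional features.

```python
from statistics import mean, stdev


class NormalOperationBaseline:
    """Hypothetical per-hour-of-day traffic baseline (illustrative only).

    Learns what normal load looks like for each hour, capturing predictable
    time-of-day fluctuations, and flags observations that deviate by more
    than k sample standard deviations from the learned mean.
    """

    def __init__(self, k: float = 3.0):
        self.k = k
        self.history: dict[int, list[float]] = {h: [] for h in range(24)}

    def learn(self, hour: int, load: float) -> None:
        # Record a load sample observed while the network is operating normally.
        self.history[hour].append(load)

    def is_abnormal(self, hour: int, load: float) -> bool:
        samples = self.history[hour]
        if len(samples) < 2:
            # Too little data to characterise normal operation for this hour.
            return False
        mu, sigma = mean(samples), stdev(samples)
        # Guard against a zero spread when all samples are identical.
        return abs(load - mu) > self.k * max(sigma, 1e-9)
```

For example, after learning the loads 100, 110, 95, 105, and 98 for hour 9, a load of 108 is within normal operation, whereas a flash-crowd-sized load of 900 is flagged as abnormal.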

A2. Expect Adverse Events and Conditions

Expect adverse events and conditions that disrupt normal operations, and be prepared for them through defence and detection. These challenges are inevitable.

We define an adverse event or ongoing condition as one that challenges the normal operation of the network. Examples include:

  • unintentional mis-configuration or operational mistakes
  • large-scale natural disasters (e.g. hurricanes, earthquakes, ice storms, tsunami, floods)
  • malicious attacks from intelligent adversaries, including recreational crackers, industrial espionage, terrorism, and traditional or information warfare
    • against the network hardware, software, or protocol infrastructure
    • denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks
  • environmental challenges
    • weak, asymmetric, and episodic connectivity of wireless channels
    • high mobility of nodes and subnetworks
    • unpredictably long delay paths either due to length (e.g. satellite) or as a result of episodic connectivity
  • unusual but legitimate traffic load such as a flash crowd

We also classify adverse events and conditions by severity as mild, moderate, or severe, and categorise them into two types:

  1. Anticipated adverse events and conditions are those that we can predict, either from past events (such as natural disasters) and attacks (e.g. viruses, worms, DDoS), or because a reasoned threat analysis indicates that they might occur.
  2. Unanticipated adverse events and conditions are those that we can’t predict with any specificity, but for which we can still be prepared in a general sense. For example, there will be new classes of attacks for which we should be prepared.
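The two classification dimensions (severity and whether the event was anticipated) can be captured in a small data structure. The Python sketch below is illustrative only: the enum and class names are hypothetical, and the `THREAT_MODEL` set is an assumed stand-in for the predictions that past events or a reasoned threat analysis would provide. Any event kind outside that set is treated as unanticipated.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    MILD = 1
    MODERATE = 2
    SEVERE = 3


class Anticipation(Enum):
    ANTICIPATED = "anticipated"
    UNANTICIPATED = "unanticipated"


# Hypothetical threat model: challenge classes predicted from past events
# and attacks, or from a reasoned threat analysis.
THREAT_MODEL = {"natural_disaster", "virus", "worm", "ddos"}


@dataclass(frozen=True)
class AdverseEvent:
    kind: str
    severity: Severity

    @property
    def anticipation(self) -> Anticipation:
        # Anticipated only if the threat model predicted this class of event.
        if self.kind in THREAT_MODEL:
            return Anticipation.ANTICIPATED
        return Anticipation.UNANTICIPATED
```

With this sketch, `AdverseEvent("ddos", Severity.SEVERE)` is classified as anticipated, while a new class of attack absent from the threat model is classified as unanticipated, matching the distinction drawn above.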

It is necessary to expect adverse events and conditions in order to design resilient networks; this axiom therefore motivates the first two aspects of the D2R2 + DR ResiliNets Strategy during the real-time phase: defend and detect.


A3. Respond to Adverse Events and Conditions

Respond to adverse events and conditions through remediation that ensures correct operation and graceful degradation, restoration to normal operation, diagnosis of root-cause faults, and refinement of future responses.

While it is necessary to expect adverse events and conditions, it is just as important to then take action.

This occurs in the latter parts of the D2R2 + DR ResiliNets Strategy:

  1. In the real-time phase: remediate and recover
  2. In the background phase: diagnose and refine
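The sequence of D2R2 + DR steps can be sketched as a simple controller, shown below in Python. This is a hypothetical illustration, not a ResiliNets mechanism: the class, the `playbook` mapping from challenge kind to action, and the placeholder action strings are all assumptions. It shows how the background phase feeds back into the real-time phase: an unfamiliar challenge is first met with graceful degradation, and after diagnosis and refinement the next occurrence receives a specific response.

```python
from dataclasses import dataclass, field


@dataclass
class ResilienceController:
    """Hypothetical controller stepping through D2R2 + DR (illustrative only)."""

    # Responses refined from earlier challenges: challenge kind -> action.
    playbook: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def real_time(self, challenge: str) -> str:
        self.log.append("defend")                # standing defences absorb what they can
        self.log.append(f"detect {challenge}")   # deviation from normal operation noticed
        # Remediate: use a refined response if one exists, else degrade gracefully.
        action = self.playbook.get(challenge, "degrade gracefully")
        self.log.append(f"remediate: {action}")
        self.log.append("recover")               # restore normal operation
        return action

    def background(self, challenge: str) -> None:
        self.log.append(f"diagnose {challenge}")  # look for the root-cause fault
        if challenge not in self.playbook:
            # Refine: record a specific response for next time (placeholder action).
            self.playbook[challenge] = f"targeted response to {challenge}"
            self.log.append("refine")
```

For example, the first call `ctl.real_time("new worm")` returns `"degrade gracefully"`; after `ctl.background("new worm")`, the same call returns `"targeted response to new worm"`.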



© 2006–2007 James P.G. Sterbenz and David Hutchison
