ResiliNets Strategy

The ResiliNets Architecture is based on a six-step, two-phase strategy, D2R2+DR: defend, detect, remediate, recover, diagnose, refine. These steps support the four ResiliNets Axioms, IUER: inevitable, understand, expect, and respond. The strategy is in turn supported by the ResiliNets Principles, which are implemented by the ResiliNets Mechanisms.


Phase 1: Real-Time Control Loop – D2R2

The first phase consists of a cycle of four steps that are performed in real time and are directly involved in network operation and service provision. Many of these cycles operate simultaneously, triggered whenever an adverse event or condition is detected.
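
One way to picture a single D2R2 cycle is as a small state machine. The sketch below is only illustrative: the `Step` names follow the four steps described here, but the transition rules and the `challenge_active` flag (true while an adverse event or condition is present) are assumptions made for this example, not part of the ResiliNets specification.

```python
from enum import Enum, auto


class Step(Enum):
    DEFEND = auto()
    DETECT = auto()
    REMEDIATE = auto()
    RECOVER = auto()


def next_step(step: Step, challenge_active: bool) -> Step:
    """Return the next step of one D2R2 cycle.

    challenge_active is a hypothetical flag: True while an adverse event
    or condition is present, False once it has ended or been removed.
    """
    if step is Step.DEFEND:
        # Defences are in place; a challenge triggers detection.
        return Step.DETECT if challenge_active else Step.DEFEND
    if step is Step.DETECT:
        # A confirmed challenge leads to remediation; otherwise keep defending.
        return Step.REMEDIATE if challenge_active else Step.DEFEND
    if step is Step.REMEDIATE:
        # Stay in remediation while the challenge persists.
        return Step.REMEDIATE if challenge_active else Step.RECOVER
    # After recovery the cycle returns to normal, defended operation.
    return Step.DEFEND


def run_cycle(observations):
    """Trace the steps visited for a sequence of challenge observations."""
    step = Step.DEFEND
    trace = [step]
    for active in observations:
        step = next_step(step, active)
        trace.append(step)
    return trace
```

Calling `run_cycle([False, True, True, True, False, False])` walks through defend, detect, remediate, and recover before returning to defend. Since many cycles operate simultaneously, a real system would hold one such state per challenge rather than a single global state.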


S1. Defend

The first step in the resilience strategy is to defend against challenges and threats to normal operation. The goals are to

A threat analysis is necessary to mount a defence/defense.

Examples of defences:

  • erasure coding over spatially redundant diverse paths, which permits data transfer to continue even when one of the paths is disrupted
  • secure signalling protocols with necessary authentication and encryption to resist traffic analysis and prevent the injection of bogus signalling messages
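
The erasure coding example above can be sketched with the simplest possible code: k data shares plus one XOR parity share, each sent over a different path. This is a minimal illustration under stated assumptions, not the scheme used by ResiliNets; the function names `encode` and `decode` are hypothetical, and the original message length is assumed to be known to the receiver out of band.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size shares plus one XOR parity share."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # zero-pad to a multiple of k
    shares = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, shares)
    return shares + [parity]


def decode(received: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the data from k+1 shares, at most one of which is None (lost)."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost path")
    shares = list(received)
    if missing:
        # XOR of all surviving shares reproduces the lost one.
        present = [s for s in shares if s is not None]
        shares[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity share and the zero padding.
    return b"".join(shares[:-1])[:length]
```

With `shares = encode(msg, 3)`, the four shares travel over four diverse paths, and `decode` recovers `msg` even if any single share is replaced by `None`. Tolerating more than one disrupted path would need a stronger code, such as Reed-Solomon.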


S2. Detect

The second step is to detect when an adverse event or condition has occurred. Detection is used to determine when defences

  • need to be strengthened
  • have failed and remediation needs to occur
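
The two outcomes of detection can be illustrated with a simple threshold check. The metric (packet loss rate), the function name `assess_defences`, and both threshold values are assumptions chosen for this sketch; the strategy itself does not prescribe how detection is performed.

```python
def assess_defences(loss_rate: float, warn: float = 0.01, fail: float = 0.10) -> str:
    """Classify an observed loss rate against two hypothetical thresholds.

    Returns "ok" when defences are holding, "strengthen" when they are
    under strain, and "remediate" when they have failed.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if loss_rate >= fail:
        return "remediate"
    if loss_rate >= warn:
        return "strengthen"
    return "ok"
```

For example, `assess_defences(0.05)` returns `"strengthen"`, while `assess_defences(0.5)` returns `"remediate"`.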

S3. Remediate

The third step is to remediate the effects of the adverse event or condition to minimise the impact. The goal is to do the best possible at all levels after an adverse event and during an adverse condition. Corrective action must be taken at all levels to minimise the impact of service failure, including correct operation with graceful degradation of performance.


S4. Recover

The fourth step is to recover to original and normal operations, including control and management of the network.

Once an adverse event has ended or an adverse condition is removed, the network should recover from its remediation state to allow any degraded services to return to normal performance and operation.

Examples:


Phase 2: Background Diagnosis and Refinement – DR

The second phase consists of two background operations that observe and modify the behaviour of the D2R2 cycle: diagnosis of faults and refinement of future behaviour.


S5. Diagnose

While it is not possible to directly detect faults, a system may be able to detect resultant errors within itself, or failures may be detected outside the system. It may be possible to diagnose the fault that was the root cause. This may result in an improved system design, and may affect recovery to a better state.


S6. Refine

The final aspect of the strategy is to refine behaviour for the future based on past D2R2 cycles. The goal is to learn and reflect on how the system has defended, detected, remediated, and recovered so that all of these can be improved to continuously increase the resilience of the network.

Representation

Castle Analogy

Related Work

Several other research efforts have proposed strategies for various aspects of survivability, dependability, and fault tolerance.

ANSA

  • fault confinement (defense)
  • fault detection (error/failure detection)
  • fault diagnosis (diagnosis)
  • reconfiguration (remediation)
  • recovery (remediation)
  • restart (remediation)
  • repair (recovery)
  • reintegration (recovery)

DENISE

CMU SEI

[Ellison-Fisher-Linger-Lipson-Longstaff-Mead-1999]


© 2006–2007 James P.G. Sterbenz and David Hutchison
