JSAC-2007-Paper


This paper targets the call for papers (CFP) of the JSAC Special Issue on Delay and Disruption Tolerant Wireless Communication.


A Multilevel Resilience Architecture for Disruption Tolerant Networks

Authors

James P.G. Sterbenz, David Hutchison, Justin P. Rohrer, and Abdul Jabbar Mohammad

Information and Telecommunications Technology Center,

Department of Electrical Engineering and Computer Science,

University of Kansas,

Lawrence, KS 66045-7621, USA

E-mail: {jpgs,rohrej,jabbar}@ittc.ku.edu


Abstract

Disruption-tolerant wireless networks are required to provide acceptable service in the presence of various challenges. These include the environmental challenges of weakly-connected wireless channels, mobility, and unpredictable delay due to store-and-forward operation in episodically-connected disruption-tolerant networks. Other disruptions that must be tolerated result from external events such as natural disasters, attacks, human error, and unexpected traffic such as flash crowds. We propose a systematic approach to analyzing the challenges and faults that affect the performance of disruption-tolerant networks, and present a strategy to combat such challenges. Based on this strategy, we present a cross-layer multilevel architectural framework called ResiliNets that enables us to analyze, design, develop, and deploy resilient disruption-tolerant networks from the ground up. Furthermore, we derive a set of resilience principles that guide network architecture and design.

We believe that an approach based on a rigorous understanding of resilience at all levels and in all components of the network is needed. Resilience is defined as the ability of the network to provide and maintain an acceptable level of service in the face of various challenges and disruptions to normal operation. The ResiliNets Architecture is based on the understanding that challenges to normal operation are inevitable and that disruption-tolerant operation is an essential part of the emerging mobile wireless communication environment. We support resilient disruption-tolerant operation with a six-step two-phase strategy D^2R^2+DR: *defend*, *detect*, *remediate*, *recover*, *diagnose*, *refine*. Our resilience strategy aims to engineer the network to *defend* itself from adverse events and to *detect* their impact autonomously when defenses are not sufficient. In this case the network *remediates* such that services remain accessible whenever possible and degrade gracefully when necessary. As soon as the adverse condition has ended, the network automatically and rapidly *recovers* and returns to normal operation. To improve network resilience and disruption tolerance, the system must *diagnose* the root faults and learn from past events to *refine* its future defenses and operational behavior. This strategy is in turn supported by a set of principles that guide the ResiliNets Architecture. The resilience of each level provides a foundation for the resilience of the next level in three dimensions: protocol layers; data, control, and management planes; and network engineering, from components through the overall disruption-tolerant network architecture.

To evaluate the effectiveness of the proposed architecture as well as the resilience of existing disruption-tolerant networks, we need a rigorous quantitative evaluation methodology. For this purpose we have developed a service-oriented framework to characterize the resilience, survivability, and tolerance to a number of disruptions at any abstraction level. We quantify the operational state and expected service of a network using functional metrics. Resilience is then formalized as transitions of the network state in a two-dimensional state space and is evaluated in terms of the network states that a given infrastructure supports.

Definitions

Introduction and Motivation

The Internet has become essential to the routine operation of businesses and to the global economy. The military depends on network-centric operations and warfare. Governments depend on networks for their daily operation, service delivery, and response to natural disasters and terrorist attacks. Therefore, we regard resilience (and its constituents, survivability and dependability) as critical to the future of our network infrastructure. Resilience is the ability of the network to provide and maintain an acceptable level of service in the face of various challenges to normal operation. We propose a new strategy for resilience that aims to engineer the network to *defend* itself from these challenges and to *detect* their impact autonomously when defenses are not sufficient. In this case the network *remediates* such that services remain accessible whenever possible and degrade gracefully when necessary. As soon as the adverse condition has ended, the network automatically and rapidly *recovers* from degradation to normal operation. To improve network resilience, the system must *diagnose* the root faults and learn from the response to past challenges to *refine* its future defenses and operational behavior. Multilevel resilience needs to be considered in three dimensions: the protocol layer, the protocol plane, and the network architecture and engineering. The resilience of each level provides a foundation for the resilience of the next level. The protocol layers are functionally composed with cross-layer optimizations. The data, control, and management planes must all implement resilient mechanisms. Lastly, in the network architecture and engineering dimension, fault-tolerant components are used to construct resilient topologies that are the basis for a global resilient internetwork. This paper presents a cross-layer multilevel architectural framework for the design, evaluation, and deployment of resilient networks for mission-critical applications.

Society increasingly relies on computer networks as essential for individuals, businesses, and governments. These networks include the Global Internet, the PSTN (public switched telephone network, wired and mobile), SCADA (supervisory control and data acquisition) networks, and emerging sensor and mobile ad hoc networks. They have developed into large-scale systems of increasing complexity in terms of both physical infrastructure and operational protocols and user applications. Essential services are provided by distributed networked systems in the sectors of energy, finance, banking, education, health care, defense, transport, and communication.

Challenges to communications:

In certain environments, such as defense, the challenges are more pronounced.

Background and Related Disciplines

There are a number of relevant disciplines that serve as the basis of network resilience in general, and the ResiliNets strategy in particular.

Fault Tolerance

Fault tolerance began with the idea of triple-modular redundancy [2] and emerged as a discipline that allowed the construction of reliable systems from unreliable components [3], [4], [5]. Fault tolerance generally assumes random, independent component faults, and is therefore unable to deal adequately with the large-scale failures that arise from a natural disaster or an attack by an intelligent adversary. Fault tolerance is thus a necessary but insufficient condition for network resilience.
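To make the triple-modular redundancy idea concrete, the following is a minimal Python sketch; it is not taken from the cited works. The function names `tmr_vote` and `tmr_reliability` are hypothetical, and the reliability formula assumes three identical modules that fail independently and a perfect voter, which is exactly the independence assumption that large-scale correlated failures violate.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the majority value among three replica outputs.

    Raises ValueError when all three replicas disagree, because no
    majority exists in that case.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among replicas")
    return value


def tmr_reliability(r):
    """Probability that at least two of three modules are working.

    Illustrative assumptions: identical module reliability r,
    statistically independent failures, and a perfect voter.
    P(all three work) + P(exactly two work) = r^3 + 3 r^2 (1 - r).
    """
    return 3 * r**2 - 2 * r**3
```

Under these assumptions a module reliability of 0.9 yields a system reliability of 0.972. An event that disables two replicas at once, such as a regional disaster, falls outside this model.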

Dependability

Dependability is the discipline that quantifies the reliance that can be placed on the service delivered by a system, and consists of two major aspects [6], [7]. Availability is readiness for usage, that is, the probability that a system or service will be operable when needed. Reliability is continuity of service, that is, the probability that a system or service remains operable for a specified period of time. These notions of dependable systems have been codified by IFIP WG 10.4 [8] and ANSI T1S1 [9] and are commonly applied to network dependability. They are also applied to fiber-optic links [10], [11], [12], although there the term survivability is commonly used, which is inconsistent with the use of the term that follows.
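The two dependability measures can be illustrated numerically. The sketch below uses common textbook models that are assumptions here, not definitions from the cited standards: steady-state availability as MTTF / (MTTF + MTTR), and reliability under a constant failure rate as exp(-t / MTTF). The function and parameter names are hypothetical.

```python
import math


def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is operable.

    Assumes alternating up/down periods with mean time to failure
    (MTTF) and mean time to repair (MTTR), both in hours.
    """
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability(t_hours, mttf_hours):
    """Probability of operating without failure for t_hours.

    Assumes a constant failure rate of 1 / MTTF (exponential model).
    """
    return math.exp(-t_hours / mttf_hours)
```

The two measures answer different questions: a system with a short repair time can be highly available yet still unlikely to run for a long mission without any interruption.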

Survivability

Survivability extends these notions to the capability of a system to operate dependably in the presence of threats that include attacks from an intelligent adversary as well as large-scale natural disasters [13], [14], [15], [16].

Disruption Tolerant Networking

Disruption tolerance deals with sustaining communications when connections are periodic, intermittent, or prone to other interference. This includes long or unpredictable delay, which is addressed under the name Delay-Tolerant Networks (DTNs). DTN architectures are a solution to the problems presented by challenged networks, so called because they do not conform to the assumptions under which standard Internet protocols operate: an end-to-end (E2E) path, limited round-trip time (RTT), and a low packet drop rate. DTNs cope with these conditions using a technique called bundling, which operates in a store-and-forward manner. Delay-tolerant networking is a superset of interplanetary networking and a subset of disruption-tolerant networking.
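The store-and-forward behavior can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the bundle protocol itself: the class `StoreAndForwardNode` and its methods are hypothetical, bundles are plain `(destination, payload)` tuples, storage is in memory, and every buffered bundle is handed to whichever peer is contacted, with no routing decision, custody transfer, or expiry.

```python
from collections import deque


class StoreAndForwardNode:
    """Hypothetical node that holds bundles until a contact occurs."""

    def __init__(self, name):
        self.name = name
        self.buffer = deque()  # bundles waiting for a contact opportunity
        self.delivered = []    # payloads addressed to this node

    def receive(self, bundle):
        """Deliver the bundle locally or store it for later forwarding."""
        destination, payload = bundle
        if destination == self.name:
            self.delivered.append(payload)
        else:
            self.buffer.append(bundle)

    def contact(self, peer):
        """Forward all stored bundles to peer; return how many were sent."""
        sent = 0
        while self.buffer:
            peer.receive(self.buffer.popleft())
            sent += 1
        return sent


# Usage: no end-to-end path from a to c ever exists at one instant.
a, b, c = (StoreAndForwardNode(n) for n in "abc")
a.receive(("c", "hello"))  # stored at a: c is not reachable yet
a.contact(b)               # first contact: bundle moves to b
b.contact(c)               # later contact: bundle reaches c
```

The payload reaches `c` even though `a` and `c` are never connected at the same time, which is the case that protocols assuming an end-to-end path cannot handle.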

Resilience

The concept of Resilience necessarily includes all of these research areas.

Challenges

Failures

Metrics

ResiliNets Strategy

Axioms

Strategy

Principles

Summary

References

Division of work for survey paper
