ResiliNets: Resilient and Survivable Networks – Overview
Society increasingly relies on computer networks in general, and the Internet in particular. Consumers rely on networks for access to information and services, personal finance, and for communication with others. The Internet has become indispensable to the routine operation of businesses and to the global economy. The military depends on network centric operations and warfare. Governments depend on networks for their daily operation, service delivery, and response to natural disaster and terrorist attacks.
The consequences of network disruption are therefore increasingly severe, threatening the lives of individuals, the financial health of businesses, and the economic stability and security of nations and the world. As the importance of the Internet grows, so does its attractiveness as a target for attackers: recreational and professional crackers, terrorists, and those engaged in information warfare.
We therefore regard resilience and survivability as critical to the future of our network infrastructure. The ResiliNets initiative aims to understand and progress the state of resilience and survivability in computer networks, including the Global Internet, PSTN, SCADA networks, mobile ad-hoc networks, and sensor networks.
The wiki is designed to facilitate collaboration and provide the content for the ResiliNets portals at The University of Kansas (US) and Lancaster University (UK). Contact James P.G. Sterbenz (jpgs@ittc.ku.edu) or David Hutchison (dh@comp.lancs.ac.uk) for further information.
News, Meetings, and Events
- Next ResiliNets meeting Friday 13 Feb, 2015.
- Call for ResiliNets logos: If anyone has a bright idea for a ResiliNets logo, let us know.
- News Archives
On 2006.04.07 our group presented a number of posters at the ITTC IAB meeting.
The ResiliNets group meets weekly at KU with video links for European and other remote participants:
Friday mornings 09:00–11:00 CST|CDT = 15:00–17:00 GMT|BST* = 16:00–18:00 MEZ|MESZ*
in Nichols 246 (Executive Conference Room) +1 785 864 4557.
The meetings serve three main purposes:
- 09:00–10:00: journal club (reading group) in which a group member leads discussion of a paper
- discussion and brainstorming of new research and proposal ideas
- weekly status and discussion of individual research projects and papers in progress
ResiliNets meetings are open to all interested people; the best way to find out about and get involved in our research is to come to our meetings. We are happy to introduce our work to new participants.
*except when the Americans are out of sync with the rest of the world's summer time, in which case an hour adjustment needs to be made to European times
Disciplines and Related Work
Disciplines Related to Faults and Challenges
- Fault Tolerance is the ability of a system to tolerate faults such that service failures do not result. Fault tolerance generally covers a single random fault, or at most a few, and is thus a subset of survivability, as well as of resilience.
- Survivability is the capability of a system to fulfil its mission, in a timely manner, in the presence of threats such as targeted attacks or large-scale natural disasters resulting in many failures, in addition to the few random failures covered by fault tolerance. Survivability is thus a superset of fault tolerance but a subset of resilience.
- Disruption Tolerance is the ability of a system to tolerate disruptions in connectivity among its components. Disruption tolerance is a superset of tolerance of the environmental challenges: weak and episodic channel connectivity, mobility, delay tolerance, as well as tolerance of power and energy constraints.
- Traffic Tolerance is the ability of a system to tolerate unpredictable offered load without a significant drop in carried load (including congestion collapse), as well as to isolate the effects from cross traffic, other flows, and other nodes. The traffic can either be unexpected but legitimate, such as from a flash crowd, or malicious, such as a DDoS attack.
Trustworthiness Disciplines Related to Quantifiable Properties
- Dependability is the property of a system such that reliance can justifiably be placed on the service it delivers. It generally includes the measures of availability (ability to use a system or service) and reliability (continuous operation of a system or service), as well as integrity, maintainability, and safety.
- Security is the property of a system and measures taken such that it protects itself from unauthorised access or change, subject to policy. Security properties include AAA (auditability, authorisability, authenticity), confidentiality, and nonrepudiability. Security shares with dependability the properties of availability and integrity.
- Performability is the property of a system such that it delivers performance required by the service specification, as described by QoS (quality of service) measures.
Trustworthiness with respect to Challenges
- Robustness is a property that relates the operation of a control system to perturbations of its inputs. In the context of resilience, robustness describes the trustworthiness (quantifiable behaviour) of a system in the face of challenges.
- Resilience disciplines (listed above)
- Past failures and scenarios
- Mechanisms and Algorithms to Support Resilience
The ResiliNets architecture is based on a set of axioms, a resilience and survivability strategy, and a set of supporting principles.
Resilience Axioms: IUER
Inevitability of faults, Understand normal operations, Expect adverse events, Respond to adverse events and conditions
Resilience Strategy: D2R2 + DR
Real-time control loop: D2R2
- Defend against challenges and threats to normal operation
- passive defense
- active defense
- Detect when an adverse event or condition has occurred
- Remediate the effects of the adverse event or condition to minimise the impact
- Recover to original and normal operations
Background loop: DR
- Diagnose the fault that was the root cause
- Refine future behaviour
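The two loops above can be pictured as a small state machine. The following is a minimal sketch for illustration only, not the ResiliNets implementation: the four phase names come from the strategy, while the `Phase` enum, the `ResilienceLoop` class, the boolean `challenge_active` input, and the incident log are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    DEFEND = "defend"
    DETECT = "detect"
    REMEDIATE = "remediate"
    RECOVER = "recover"


class ResilienceLoop:
    """Toy model of the D2R2 real-time loop plus the DR background loop."""

    def __init__(self):
        self.phase = Phase.DEFEND
        self.incidents = []    # record consumed by the background loop
        self.refinements = []  # lessons fed back into future behaviour

    def step(self, challenge_active):
        """Advance the real-time loop by one observation."""
        if self.phase is Phase.DEFEND and challenge_active:
            # Defences were penetrated: the adverse condition is detected.
            self.phase = Phase.DETECT
        elif self.phase is Phase.DETECT:
            # Once detected, start limiting the impact.
            self.phase = Phase.REMEDIATE
        elif self.phase is Phase.REMEDIATE and not challenge_active:
            # The challenge has passed: return to normal operations.
            self.phase = Phase.RECOVER
        elif self.phase is Phase.RECOVER:
            # Log the incident for later diagnosis and resume defending.
            self.incidents.append("incident-%d" % (len(self.incidents) + 1))
            self.phase = Phase.DEFEND
        return self.phase

    def background(self, diagnose):
        """DR loop: diagnose each recorded incident, then keep the lesson."""
        while self.incidents:
            incident = self.incidents.pop(0)
            self.refinements.append(diagnose(incident))
        return self.refinements
```

The real-time loop only reacts to what it can observe at each step, whereas `background` runs later over the recorded incidents, which mirrors the separation between the D2R2 control loop and the slower DR loop.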
Prerequisites: service requirements; normal behaviour; threat and challenge models; metrics; heterogeneity in mechanism, trust, and policy
Tradeoffs: resource tradeoffs; complexity; state management
Enablers: security and self-protection; connectivity; redundancy; diversity; multilevel; context awareness; translucency
Behaviour: self organising and autonomic; adaptability; evolvability
Metrics and Modelling
A rigorous framework to quantify the network resilience on the basis of two orthogonal dimensions of communication networks: the physical network characteristics (operational space) and the service requirements (service space).
Operational space N: represents the physical state of the network
Resilient networks remain in normal operation in the face of challenges
- normal operation according to network design and engineering
- partially degraded but still operable
- severely degraded providing little or no operational capability
Service space P: represents the quality of service for an application over a given network
Resilient services remain acceptable even when network operation degrades
- acceptable service with respect to service specification
- impaired but usable service
- unacceptable service that provides little or no utility
Resilience R: a function of state transition probability in the two-dimensional state space:
- each dimension consists of a multivariate metric descriptor
- network state S is a discrete set of operational metrics and service parameters
- aggregation limits number of states
- each dimension divided into three regions
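The division of each dimension into three regions can be sketched as follows. This is an illustrative assumption rather than the framework itself: the choice of a single metric per dimension (fraction of failed links for the operational space, delay in milliseconds for the service space), the threshold values, and the function names are all hypothetical, and the code only counts observed transitions without defining R.

```python
from collections import Counter

OPERATIONAL = ("normal", "partially degraded", "severely degraded")
SERVICE = ("acceptable", "impaired", "unacceptable")


def region(value, bounds):
    """Return 0, 1 or 2 for a metric where larger values are worse.

    bounds = (a, b): value <= a is region 0, value <= b is region 1, else 2.
    """
    a, b = bounds
    if value <= a:
        return 0
    if value <= b:
        return 1
    return 2


def state(failed_fraction, delay_ms,
          n_bounds=(0.1, 0.5), p_bounds=(100.0, 400.0)):
    """Map one (operational, service) sample onto the 3x3 state grid."""
    return (OPERATIONAL[region(failed_fraction, n_bounds)],
            SERVICE[region(delay_ms, p_bounds)])


def transition_counts(samples):
    """Count observed transitions between consecutive states."""
    states = [state(n, p) for n, p in samples]
    return Counter(zip(states, states[1:]))


# Hypothetical trajectory: a challenge degrades the network, then it recovers.
samples = [(0.0, 40.0), (0.2, 90.0), (0.6, 450.0), (0.2, 150.0)]
counts = transition_counts(samples)
```

In this trajectory the second sample is partially degraded in the operational space yet still acceptable in the service space, which is the behaviour a resilient service is expected to show.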
Topology and Challenge Modelling
Realistic topology generators are essential to the understanding of network design and survivability analysis. Two important issues that are not sufficiently addressed by current topology generators are node-positioning and cost considerations. The utility of the existing models could be vastly improved by incorporating these two features. This project aims at developing a new network topology generator, which enables node positioning and cost constraints on the topologies generated with several well-known graph generation models. Our approach incorporates network design practices in topology generation, thereby enabling a tool that can be used to generate viable alternate topologies during the network design and engineering phase. Further, we consider the representativeness of the generated topologies using several graphical properties such as degree distribution, shortest path distribution, link length distribution, and spectrum of the graph amongst several others.
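As a rough illustration of combining node positioning with a cost constraint, the sketch below places links between fixed node positions using a Waxman-style distance-dependent probability and stops accepting links once a budget is spent. It is not the group's generator: the choice of the Waxman model, the parameter convention (`beta` scales overall link probability, `alpha` controls sensitivity to distance), the linear cost model, and the budget rule are all assumptions made for this example.

```python
import math
import random
from collections import Counter
from itertools import combinations


def generate(positions, alpha=0.4, beta=0.6, fixed_cost=1.0,
             cost_per_unit=0.1, budget=float("inf"), seed=0):
    """Waxman-style links over fixed node positions, within a cost budget."""
    rng = random.Random(seed)
    pairs = list(combinations(sorted(positions), 2))
    dist = {(u, v): math.dist(positions[u], positions[v]) for u, v in pairs}
    longest = max(dist.values()) or 1.0  # avoid dividing by zero
    links, spent = [], 0.0
    for pair in pairs:
        # Nearby nodes are more likely to be linked than distant ones.
        if rng.random() < beta * math.exp(-dist[pair] / (alpha * longest)):
            # Hypothetical cost model: fixed part plus length-dependent part.
            cost = fixed_cost + cost_per_unit * dist[pair]
            if spent + cost <= budget:
                links.append(pair)
                spent += cost
    return links, spent


def degree_distribution(links):
    """Return {degree: number of nodes with that degree} for linked nodes."""
    degree = Counter(node for link in links for node in link)
    return Counter(degree.values())


# Hypothetical node coordinates in arbitrary distance units.
positions = {"a": (0, 0), "b": (3, 4), "c": (6, 0), "d": (3, -4), "e": (10, 0)}
links, spent = generate(positions, budget=6.0, seed=1)
```

`degree_distribution` covers one of the graph properties mentioned above; link length distribution could be read directly from the distances of the returned links. Nodes left without any link do not appear in the distribution.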
An essential aspect of resilient network design is understanding how networks behave under various challenges. To analyse network resilience, we model the challenges that disrupt the normal operation of the network. Analysing the full set of scenarios for n networks and c challenges would ordinarily require c×n input files for the simulation scripts. Our model decouples challenges from topologies, reducing this to c+n input files, so that any challenge model can be applied to any network topology. This decoupling makes challenge scenario analysis more efficient.
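The saving from decoupling can be seen in a small sketch. The data layout here is an assumption made for illustration (a topology as a list of links, a challenge as a dictionary of failed nodes and links), and `apply_challenge` is a hypothetical helper rather than part of the ResiliNets simulation tools.

```python
def apply_challenge(topology, challenge):
    """Return the links of `topology` that survive `challenge`."""
    failed_nodes = set(challenge.get("nodes", ()))
    failed_links = {frozenset(link) for link in challenge.get("links", ())}
    return [
        (u, v) for u, v in topology
        if u not in failed_nodes
        and v not in failed_nodes
        and frozenset((u, v)) not in failed_links
    ]


# n = 2 topology descriptions ...
topologies = {
    "ring": [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],
    "star": [("a", "b"), ("a", "c"), ("a", "d")],
}
# ... and c = 3 challenge descriptions, written independently of each other.
challenges = {
    "none": {},
    "node_a_down": {"nodes": ["a"]},
    "cut_a_b": {"links": [("b", "a")]},
}

# Every challenge can be applied to every topology: c + n inputs, c x n runs.
scenarios = {
    (t, c): apply_challenge(topologies[t], challenges[c])
    for t in topologies
    for c in challenges
}
```

Five input descriptions yield six scenarios here; adding one more challenge would add a single description but cover every existing topology.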
- Resilience and Survivability for Future Networking (ResumeNet) is a collaboration among The University of Kansas (KU), Lancaster University, ETH Zürich, Technische Universität München (TUM), Technische Universiteit Delft, Université de Liège (ULg), Universität Passau, Uppsala Universitet (UU), NEC Laboratories Heidelberg, and France Telecom – Orange Labs. ResumeNet is investigating a framework, mechanisms, and experimental evaluation of resilience and survivability for future networking, and is funded by the EU Future Internet Research & Experimentation (FIRE) of the Seventh Framework Programme (FP7).
Resilient Network Protocols
ResTP: Resilient Composable Multipath Transport Protocol
GeoDivRP: Geodiverse Multipath Routing Protocol
Future Internet Architecture, Design, and Engineering
GpENI: Great Plains Environment for Network Innovation
- The Great Plains Environment for Network Innovation (GpENI) is a regional network between The University of Kansas (KU), Kansas State University (K-State), University of Nebraska – Lincoln (UNL), and University of Missouri – Kansas City within the Great Plains Network, supported with optical switches from Ciena interconnected by Qwest fiber infrastructure, in collaboration with the Kansas Research and Education Network (KanREN) and Missouri Research and Education Network. GpENI is funded in part by the National Science Foundation GENI (Global Environment for Network Innovation) Program as part of Cluster B in Spiral 1.
PoMo: PostModern Internetwork Architecture
- The PostModern Internet Architecture (PoMo) is a collaboration among The University of Kansas (KU), University of Kentucky (UK), University of Maryland (UMd), and Lancaster University to design a new internetworking architecture with explicit support for heterogeneity, policy, and trust boundaries amongst network realms. PoMo is funded in part by the National Science Foundation FIND (Future Internet Design) Program.
ANA: Autonomic Networking Architecture
- The Autonomic Networking Architecture (ANA) is a collaboration among ETH Zürich, Universität Basel, Lancaster University, The University of Kansas (KU), Université de Liège (ULg), National and Kapodistrian University of Athens (NKUA), Universitet i Oslo (UiO), Université Paris VI Pierre et Marie Curie (UPMC), NEC Laboratories Heidelberg, University of Waterloo, Telekom Austria, and Fraunhofer FOKUS. ANA is funded by the EU Information Society Technologies (IST) – Future and Emerging Technologies (FET) of the Sixth Framework Programme (FP6).
Disruption- and Delay-Tolerant Communication and Domain-Specific Network Realms
Highly-Dynamic Airborne Ad Hoc Networking
- Highly-dynamic mobile-wireless networks present unique challenges to end-to-end communication, caused in particular by the time-varying connectivity of high-velocity nodes combined with the unreliability of the wireless communication channel. We are developing a new domain-specific protocol suite for telemetry networks (TmNS) in the aeronautical test environment consisting of: AeroTP TCP-friendly transport protocol, AeroNP IP-compatible network protocol, and AeroRP location-assisted routing protocol. Our research explores the tradeoffs in the location of functionality such as error control and location management for high-velocity multihop airborne sensor networks and presents cross-layer optimizations between the MAC, link, network, and transport layers to enable a domain-specific network architecture, which provides high reliability for telemetry applications. Sensor data is returned multihop from airborne test articles (TA) moving at speeds up to Mach 3.5 to the ground stations (GS) that track them with high-power directional antennae. This means that the contact time between TAs with closing velocities of Mach 7 may be as low as 10 seconds. Relay nodes (RN) improve multihop performance and location predictability. The telemetry network is connected to the Internet via gateways (GW).
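The relationship between closing velocity and contact time can be checked with a short calculation. The sketch below assumes a head-on pass in which two nodes are in contact while they are within a transmission range R of each other, so the relative distance covered in contact is 2R. The 12 km range and the approximate sea-level speed of sound are assumptions chosen for illustration; the text above does not state the range, and the speed of sound is lower at altitude.

```python
SPEED_OF_SOUND_M_S = 340.0  # assumption: approximate sea-level value


def contact_time_s(range_m, closing_mach, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Seconds two nodes stay within range of each other on a head-on pass."""
    closing_speed = closing_mach * speed_of_sound  # metres per second
    return 2.0 * range_m / closing_speed


# Two test articles at Mach 3.5 each close at Mach 7.
# With an assumed 12 km range this gives roughly 10 seconds of contact.
seconds = contact_time_s(range_m=12_000, closing_mach=7)
```

Under these assumptions the result is about 10.1 seconds, consistent with the order of magnitude quoted above; a shorter range or an off-axis pass would reduce it further.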
Weather Disruption-Tolerant Mesh Networking
- There has been increased interest in the deployment of high-bandwidth point-to-point fixed wireless links as an alternative to fiber optic links due to cost or regulatory concerns. Applications include extending broadband Internet access, backhaul for 3G and proposed 4G deployments, and front-haul umbilical facilities for distributed antenna systems (DAS). Millimeter wave (70–90 GHz) wireless link technology is emerging for very short distances, but has the potential to span several miles and deliver data rates of 1–10 Gb/s. Unfortunately, these frequencies suffer significant attenuation due to atmospheric phenomena such as rain. This project is deploying test links, characterising their performance during real weather events such as thunderstorms, and applying novel routing techniques to a mesh network to mask impairments. We are exploring two new weather disruption-tolerant mesh routing protocols: PWARP predictive weather-assisted routing protocol and XL-OSPF cross-layered OSPF. In both cases, radar imagery is used to predict the trajectories of storm systems. PWARP uses this information to reroute in advance of a predicted disruption due to rain attenuation. XL-OSPF uses radar imagery to estimate the current attenuation on a given link to provide instantaneous reactivity based on cross-layering.
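The idea of rerouting ahead of a predicted disruption can be illustrated with a generic shortest-path computation that excludes links expected to be impaired. This is not the PWARP or XL-OSPF algorithm: the use of Dijkstra's algorithm, the `avoid` set standing in for a radar-based prediction, and the example mesh and link costs are all assumptions made for this sketch.

```python
import heapq


def shortest_path(links, src, dst, avoid=frozenset()):
    """Dijkstra over undirected links, skipping any link listed in `avoid`.

    links: {(u, v): cost}; avoid: set of frozenset({u, v}) link identifiers.
    Returns (total_cost, path) or None if dst is unreachable.
    """
    graph = {}
    for (u, v), cost in links.items():
        if frozenset((u, v)) in avoid:
            continue  # link predicted to be disrupted
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None


# Hypothetical four-node mesh with two disjoint paths from a to d.
mesh = {("a", "b"): 1.0, ("b", "d"): 1.0, ("a", "c"): 2.0, ("c", "d"): 2.0}
clear_weather = shortest_path(mesh, "a", "d")
# Assumed prediction: a storm cell will cross the b-d link.
storm_predicted = shortest_path(mesh, "a", "d", avoid={frozenset(("b", "d"))})
```

In clear weather the route goes through b; once the b-d link is predicted to fail, the route moves to the longer path through c before the disruption occurs. A protocol that reacts to current attenuation could instead raise the cost of an impaired link rather than exclude it.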
Latency-Aware Cross-Layered Information Access
Context Based Networking
Resilience to Flash Crowds and DDoS Attacks
Exploitation of embedded programmable network techniques to detect and remediate network anomalies, including flash crowds and DDoS (distributed denial of service) attacks.
Simulation models and protocols developed by the ResiliNets group.
*The University of Kansas – †Lancaster University – ‡Technische Universität München – §ETH Zürich
¶Kansas State University – ‖University of Sydney – ° ISCTE Lisboa – ◊NEC Research Europe
Links, guidelines, and guidance for ResiliNets group members.
Writing Guides and Templates
Restricted pages for the ResiliNets community.
Keywords: resilient resilience survivable survivability dependable dependability reliable reliability available availability disruption delay fault tolerant tolerance DoS DDoS attack challenge error failure communication communications network networks networking
© 2006–2021 Justin P. Rohrer, James P.G. Sterbenz and David Hutchison with significant input from the ResiliNets group.