These definitions provide the foundation for the ResiliNets Architecture. For each term, there is a set of definitions from the literature, with citations. A notation of [ResiliNets] indicates that a particular definition is the one used for ResiliNets, which may either be chosen from the literature or synthesised as part of the ResiliNets Architecture.
Note: this page is currently under construction and extreme flux. Definitions marked [ResiliNets] are still subject to change!
Resilient networks aim to provide acceptable service to applications:
- ability for users and applications to access information when needed, e.g.:
- Web browsing
- distributed database access
- sensor monitoring
- situational awareness
- maintenance of end-to-end communication association, e.g.:
- computer-supported cooperative work
- video conference
- teleconference (including VoIP calls)
- operation of distributed processing and networked storage, e.g.:
- ability for distributed processes to communicate with one another
- ability for processes to read and write networked storage
Note that resilience is a superset of dependability and performability, as the following definitions reflect:
“Resilience is the persistence of service delivery that can justifiably be trusted, when facing changes.”
“Resilience is the persistence of Dependability when facing changes.”
“Resilience is the persistence of performability when facing changes.”
“Broad range of randomly occurring and potentially damaging events such as natural disasters. [...] Accidents [are] externally generated events.”
"The property that ensures that the actions of an entity may be traced uniquely to the entity" [X.800]
Adverse Event or Condition
A challenge that is manifested as an event or ongoing condition that disrupts the normal operation of the network. Adverse events and conditions are classified by severity as mild, moderate, or severe. We categorise adverse conditions into two types:
- Anticipated adverse events and conditions are ones that we can predict, based either on past events (such as natural disasters and attacks, e.g. viruses, worms, and DDoS) or on a reasoned threat analysis of what might occur (e.g. terrorist attack and information warfare).
- Unanticipated adverse events and conditions are those that we can’t predict with any specificity, but for which we can still be prepared in a general sense. For example, there will be new classes of attacks for which we should be prepared.
“Event, defined as an exceptional condition occurring in the operation of hardware or software of a managed network”
[ Steinder-Sethi-2004 ]
“Intentional execution of a threat by an intelligent adversary.”
“Potentially damaging events orchestrated by an intelligent adversary. Attacks include intrusions, probes, and denial of service. Moreover, the threat of an attack may have as severe an impact on a system as an actual occurrence. A system that assumes a defensive position because of the threat of an attack may reduce its functionality and divert additional resources to monitoring the environment and protecting system assets.”
DoS and DDoS Attack
“A denial-of-service attack is characterized by an explicit attempt by attackers to prevent legitimate users of a service from using that service. A distributed denial-of-service attack deploys multiple machines to attain this goal. The service is denied by sending a stream of packets to a victim that either consumes some key resource, thus rendering it unavailable to legitimate clients, or provides the attacker with unlimited access to the victim machine so he can inflict arbitrary damage.”
[ Mirkovic-Reiher-2004 ]
Data origin authentication: “The corroboration that the source of data received is as claimed”
Peer-entity authentication: “The corroboration that a peer entity in an association is the one claimed”
“Security measure designed to establish the validity of a transmission, message, or originator, or a means of verifying an individual's authorization to receive specific categories of information”
“Process of verifying a claim that a system entity or system resource has a certain attribute value“
Authenticity: “Property of being genuine and able to be verified and be trusted”
Authorisation / Authorization
“The granting of rights, which includes the granting of access based on access rights.”
“Access privileges granted to a user, program, or process”
“1a. An approval that is granted to a system entity to access a system resource”
“1b. A process for granting approval to a system entity to access a system resource”
“Probability of the system being found in the operating state at some time t in the future, given that the system started in the operating state at time t=0. Failures lead to down states, but maintenance and repair actions always return the system to an operating state.”
“The proportion of the operating time in which an entity meets its in-service functional and performance requirements in its intended environment.”
“1. The degree to which a system, subsystem, or equipment is operable and in a committable state at the start of a mission, when the mission is called for at an unknown, i.e., a random, time. Note 1: The conditions determining operability and committability must be specified. Note 2: Expressed mathematically, availability is 1 minus the unavailability.”
“2. The ratio of (a) the total time a functional unit is capable of being used during a given interval to (b) the length of the interval. Note 1: An example of availability is 100/168 if the unit is capable of being used for 100 hours in a week. Note 2: Typical availability objectives are specified in decimal fractions, such as 0.9998.”
“3. Timely, reliable access to data and information services for authorized users.”
“The property of a system or a system resource being accessible, or usable or operational upon demand, by an authorized system entity, according to performance specifications for the system; i.e., a system is available if it provides services according to the system design whenever users request them”
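Definition 2 above expresses availability as a ratio of usable time to the length of the interval. A minimal sketch of that ratio, using the 100-hours-per-week example from the definition (the function name and interface are ours, for illustration only):

```python
def availability(uptime_hours: float, interval_hours: float) -> float:
    """Availability per definition 2: the ratio of the total time a
    functional unit is capable of being used to the interval length."""
    if interval_hours <= 0:
        raise ValueError("interval must be positive")
    return uptime_hours / interval_hours

# The example from definition 2: usable for 100 hours in a 168-hour week.
print(round(availability(100, 168), 4))  # 0.5952
```

Note that typical availability objectives (e.g. 0.9998) are far higher than this illustrative value.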
“A type of behavior that [... consists of] sending conflicting information to different parts of the system.”
[Lamport-Shostak-Pease-1982] Note: the use of the term failure in this paper is not consistent with [Avizienis-Laprie-Randell-Landwehr-2004TR]
- unintentional mis-configuration or operational mistakes
- large-scale disasters
- natural disasters (e.g. hurricanes, earthquakes, ice storms, tsunami, floods)
- human-made disasters (e.g. power failures)
- malicious attacks from intelligent adversaries, including recreational crackers, industrial espionage, terrorism, and traditional or information warfare
- against the network hardware, software, or protocol infrastructure
- DoS and DDoS ((distributed) denial of service) attacks
- environmental challenges (mobility, weak channels, unpredictably long delay)
- unusual but legitimate traffic load such as a flash crowd
- dependent failures
- service failure at a lower level
- cascading failures
- interdependent infrastructure
- social, political, economical, and business factors
A challenge that is inherent in the environment of the communication scenario, due to:
- weak, asymmetric, and episodic connectivity of wireless channels
- high mobility of nodes and subnetworks
- unpredictably long delay paths either due to length (e.g. satellite) or as a result of episodic connectivity
“Dependability is that property of a computer system such that reliance can justifiably be placed on the service it delivers. The service delivered by a system is its behavior as it is perceived by its user(s); a user is another system (physical, human) which interacts with the former. Depending on the application(s) intended for the system, different emphasis may be put on different facets of dependability, i.e. dependability may be viewed according to different, but complementary, properties, which enable the attributes of dependability to be defined:
- the readiness for usage leads to availability;
- the continuity of service leads to reliability;
- the non-occurrence of catastrophic consequences on the environment leads to safety;
- the non-occurrence of unauthorized disclosure of information leads to confidentiality;
- the non-occurrence of improper alterations of information leads to integrity;
- the aptitude to undergo repairs and evolutions leads to maintainability.”
“Dependability is an integrating concept that encompasses the following attributes: availability: readiness for correct service; reliability: continuity of correct service; safety: absence of catastrophic consequences on the user(s) and the environment; integrity: absence of improper system alterations; maintainability: ability to undergo modifications, and repairs.”
The ability of a system to tolerate disruptions in connectivity among its components. Disruption tolerance is a superset of tolerance of the environmental challenges (weak and episodic channel connectivity, mobility, and unpredictably long delay), as well as of challenges due to power and energy constraints.
“Any natural catastrophe (including any hurricane, tornado, storm, high water, wind-driven water, tidal wave, tsunami, earthquake, volcanic eruption, landslide, mudslide, snowstorm, or drought) or, regardless of cause, any fire, flood, or explosion in any part of the United States that, in the determination of the President, causes damage of sufficient severity and magnitude to warrant major disaster assistance under the Stafford Act to supplement the efforts and available resources of states, local governments, and disaster relief organizations in alleviating the damage, loss, hardship, or suffering caused thereby. ”
“Stochastic events in either space (i.e., equipment) or time”
- Error detection: the action of identifying that a system state is erroneous.
- Error detection and recovery: Form of error processing where error recovery takes place after error detection.
- Error diagnosis: assessment of the damages caused by a detected error, or by errors propagated before detection
- Error processing: the actions taken in order to eliminate errors from a system. Fault prevention and fault tolerance.
- Error recovery: form of error processing where an error-free state is substituted for an erroneous state.”
“The part of the total state of the system that may lead to its subsequent service failure. It is important to note that many errors do not reach the system’s external state and cause a failure.”
[Avizienis-Laprie-Randell-Landwehr-2004TR] “Error is defined as a discrepancy between a computed, observed, or measured value or condition and a true, specified, or theoretically correct value or condition. Error is a consequence of a fault ... Errors may cause deviation of a delivered service from the specified service, which is visible to the outside world. The term failure is used to denote this type of an error.”
[ Steinder-Sethi-2004 ]
“1. The difference between a computed, estimated, or measured value and the true, specified, or theoretically correct value.
2. A deviation from a correct value caused by a malfunction in a system or a functional unit. Note: An example of an error is the occurrence of a wrong bit caused by an equipment malfunction.”
“An erroneous transition of a system is an internal state transition to which a subsequent failure could be attributed. Specifically, there must exist a possible sequence of interactions which would, in the absence of corrective action from the system, lead to a system failure attributable to the erroneous transition.”
[ Lee-Anderson-1990 ]
“An erroneous state of a system is an internal state which could lead to a failure by a sequence of valid transitions. Specifically, there must exist a possible sequence of interactions which would, in the absence of corrective action taken by the system and in the absence of erroneous transitions, lead from the erroneous state to a system failure.”
[ Lee-Anderson-1990 ]
- Fail-controlled system: systems which are designed and realized in order that they may only fail – or may fail to an acceptable extent – according to restrictive modes of failure.
- Fail-safe system: system whose failures can only be, or are to an acceptable extent, benign failures.
- Fail-halt system: system whose failures can only be, or are to an acceptable extent, halting failures.
- Fail-passive system: system whose failures can only be, or are to an acceptable extent, frozen output failures.
- Fail-silent system: system whose failures can only be, or are to an acceptable extent, silence failures.”
“Potentially damaging events caused by deficiencies in the system or in an external element on which the system depends. Failures may be due to software design errors, hardware degradation, human errors, or corrupted data.”
“The occurrence of an event in which an entity does not meet its in-service functional and performance requirements or expectations.”
“Event that occurs when the delivered service deviates from correct service. A service fails either because it does not comply with the functional specification, or because this specification did not adequately describe the system function. A service failure is a transition from correct service to incorrect service, i.e., to not implementing the system function.”
MTBF (Mean Time Between Failures)
Expected value of the time between failures, including the time to repair. MTBF = MTTF + MTTR
MTTF (Mean Time to Failure)
Expected value of the failure density function.
MTTR (Mean Time to Repair)
Expected value of the repair time density function.
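The three quantities above are related by MTBF = MTTF + MTTR, and for a repairable system the widely used steady-state availability estimate is A = MTTF / (MTTF + MTTR). A brief sketch (the numeric values are illustrative, not from any cited source):

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures: mean time to fail plus mean time to repair."""
    return mttf_hours + mttr_hours

def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Standard steady-state availability: A = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Illustrative values: fails on average every 1000 h, repaired in 2 h.
print(mtbf(1000, 2))                                 # 1002
print(round(steady_state_availability(1000, 2), 5))  # 0.998
```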
“Adjudged or hypothesized cause of an error. Error cause which is intended to be avoided or tolerated. Consequence for a system of the failure of another system which has interacted or is interacting with the considered system.
- Fault avoidance: methods and techniques aimed at producing a fault-free system. Fault prevention and fault removal.
- Fault-based testing: testing aimed at revealing specific classes of faults.
- Fault diagnosis: the action of determining the cause of an error in location and nature.
- Fault-finding testing: testing whose purpose is revealing faults.
- Fault forecasting: estimating the present number, the future incidence, and the consequences of faults.
- Fault masking: the result of applying error compensation systematically, even in the absence of error.
- Fault passivation: the actions taken in order that a fault cannot be activated.
- Fault prevention: preventing fault occurrence or introduction
- Fault removal: reduction of the presence (number, seriousness) of faults.
- Fault tolerance: provision of a service up to fulfilling the system function in spite of faults.
- Fault treatment: the actions taken in order to prevent a fault from being reactivated.”
Faults can be classified according to five main viewpoints: their phenomenological cause, their nature, their phase of creation or of occurrence, their situation with respect to the system boundaries, and their persistence.
- The phenomenological cause leads one to distinguish [Avizienis 78]:
- physical faults, which are due to adverse physical phenomena
- human-made faults, which result from human imperfections
- The nature of faults leads one to distinguish:
- accidental faults, which appear or are created fortuitously
- intentional faults, which are created deliberately, with or without a malicious intention
- The phase of creation with respect to the system’s life leads one to distinguish:
- development faults, which result from imperfections arising either a) during the development of the system (from requirement specification to implementation) or during subsequent modifications, or b) during the establishment of the procedures for operating or maintaining the system
- operational faults, which appear during the system’s exploitation
- The system boundaries lead one to distinguish:
- internal faults, which are those parts of the state of a system which, when invoked by the computation activity, will produce an error
- external faults, which result from interference or from interaction with its physical (electromagnetic perturbations, radiation, temperature, vibration, etc.) or human environment
- The temporal persistence leads one to distinguish:
- permanent faults, whose presence is not related to pointwise conditions whether they be internal (computation activity) or external (environment)
- temporary faults, whose presence is related to such conditions, and are thus present for a limited amount of time”
“The adjudged or hypothesized cause of an error. [...] Faults can be internal or external to system. The prior presence of a vulnerability, i.e., an internal fault that enables an external fault to harm the system, is necessary for an external fault to cause an error, and possibly subsequent failure(s).”
[Avizienis-Laprie-Randell-Landwehr-2004TR] “Faults (also referred to as problems or root causes) constitute a class of network events that can cause other events but are not themselves caused by other events ... Faults may or may not cause one or more errors.”
[ Steinder-Sethi-2004 ]
“1. An accidental condition that causes a functional unit to fail to perform its required function.
2. A defect that causes a reproducible or catastrophic malfunction. Note: A malfunction is considered reproducible if it occurs consistently under the same circumstances.
3. In power systems, an unintentional short-circuit, or partial short-circuit, between energized conductors or between an energized conductor and ground.”
The ability of a system to tolerate faults such that service failures do not result. Fault tolerance generally covers random single or at most a few faults, and is thus a subset of survivability.
“The ability of a functional entity to mask or mitigate the impact of faults on its specified operation.”
“Consequences of the system behavior are well understood and predictable.”
“[In INFOSEC, the] quality of an information system (IS) reflecting the logical correctness and reliability of the operating system; the logical completeness of the hardware and software implementing the protection mechanisms; and the consistency of the data structures and occurrence of the stored data. Note that, in a formal security mode, integrity is interpreted more narrowly to mean protection against unauthorized modification or destruction of information. [INFOSEC-99]”
“Quality of an IS reflecting the logical correctness and reliability of the operating system; the logical completeness of the hardware and software implementing the protection mechanisms; and the consistency of the data structures and occurrence of the stored data. Note that, in a formal security mode, integrity is interpreted more narrowly to mean protection against unauthorized modification or destruction of information.”
Data integrity: “Condition existing when data is unchanged from its source and has not been accidentally or maliciously modified, altered, or destroyed”
System integrity: “Attribute of an IS when it performs its intended function in an unimpaired manner, free from deliberate or inadvertent unauthorized manipulation of the system”
Data integrity:“ Property that data has not been changed, destroyed, or lost in an unauthorized or accidental manner”
“Correctness integrity: Property that the information represented by data is accurate and consistent”
Source integrity: “Property that data is trustworthy (i.e., worthy of reliance or trust), based on the trustworthiness of its sources and the trustworthiness of any procedures used for handling data in the system”
“Dependability with respect to the aptitude to undergo repairs and evolutions. Measure of continuous incorrect service delivery (corrective maintenance only). Measure of the time to restoration from the last experienced failure (corrective maintenance only).”
“The ability of an entity to facilitate its diagnosis and repair.”
“1. A characteristic of design and installation, expressed as the probability that an item will be retained in or restored to a specified condition within a given period of time, when the maintenance is performed in accordance with prescribed procedures and resources.
2. The ease with which maintenance of a functional unit can be performed in accordance with prescribed requirements.”
“A set of very high-level (i.e. abstract) requirements or goals. Missions are not limited to military settings, since any successful organization or project must have a vision of its objectives, whether expressed implicitly or as a formal mission statement. Judgements as to whether or not a mission has been successfully fulfilled are typically made in the context of external conditions that may affect the achievement of that mission.”
Mobility refers to the movement of nodes or groups of nodes in the network relative to one another, such that the topology (physical or logical) or connectivity is affected. [ResiliNets]
“A globally unique, persistent identifier used for recognition, for access to characteristics of the resource or for access to the resource itself”
“[Inability of] denial by one of the entities involved in a communication of having participated in all or part of the communication”
“Assurance the sender of data is provided with proof of delivery and the recipient is provided with proof of the sender's identity, so neither can later deny having processed the data”
“Protection against false denial of involvement in an association (especially a communication association that transfers data)”
The state of the network when there are no adverse conditions present. This loosely corresponds to the conditions for which the current Internet and PSTN are designed, when the network is not under attack, the vast majority of network infrastructure is operational, and connectivity is relatively strong.
"Performance refers to how effectively and efficiently a system delivers a specified service, presuming it is delivered correctly.” [Meyers-1995]
Quality of Service (QoS)
“The absence of errors.
- manufacture (yield): equipment which has a high probability of being operable immediately after it has been manufactured
- operation (accuracy): information given by the equipment has a high probability of being correct
- failure (lifetime): equipment that will remain operable for a long time”
phrase extraction from [Pierce-1965]
“Probability of a device (or system) performing its purpose adequately for the period of time intended under the operating conditions intended.”
[Radio-Electronics-Television Manufacturers Association, 1955] [O'Conner-1991], [Billinton-Allan-1992], [Grover-2004]
“The probability that an entity will complete its intended mission as required over a specified period of time in its intended environment.”
“1. The ability of an item to perform a required function under stated conditions for a specified period of time.
2. The probability that a functional unit will perform its required function for a specified interval under stated conditions.
3. The continuous availability of communication services to the general public, and emergency response activities in particular, during normal operating conditions and under emergency circumstances with minimal disruption.”
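Definition 2 above treats reliability as the probability of surviving a specified interval. Under the common simplifying assumption of a constant failure rate λ (an illustrative assumption on our part, not part of the cited definitions), reliability is R(t) = e^(−λt) with MTTF = 1/λ:

```python
import math

def reliability(t_hours: float, mttf_hours: float) -> float:
    """R(t) = exp(-t / MTTF), assuming an exponential (constant
    failure rate) lifetime model -- an illustrative assumption,
    not part of the cited definitions."""
    return math.exp(-t_hours / mttf_hours)

# Probability of surviving one year (8760 h) given an MTTF of 50,000 h.
print(round(reliability(8760, 50_000), 4))  # 0.8393
```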
“Systems are resourceful if they are able to determine whether they have achieved their goals or, if not, to develop and carry out alternate plans,” i.e. they are redundant in their end results.
“Robustness focuses on the ability of a system to maintain specified features when subject to assemblages of perturbations either internal or external.”
“Robust control refers to the modeling, analysis and design of control systems with “uncertainties”, i.e., control systems for which only inexact models are available.”
“Dependability with respect to the non occurrence of catastrophic failures. Measure of continuous delivery of either correct service or incorrect service after benign failure. Measure of the time to catastrophic failure.”
“Safety is the probability that a system does not fail in a manner that causes catastrophic damage during a specified period of time. Since system safety depends on the effect of a system failure rather than on the cause of the failure, one can easily imagine quantifying system safety in the context of cyber attack. While the safety of data from accidental erasure has certainly been a consideration in safety analysis of information systems, when we add security considerations, we will also need to consider the security of sensitive data in the event of a security breach as a component of safety. For example, the exposure of very sensitive data (e.g., private keys) might enable an attacker to cause catastrophic damage.”
“1a. A system condition that results from the establishment and maintenance of measures to protect the system. 1b. A system condition in which system resources are free from unauthorized access and from unauthorized or accidental change, destruction, or loss. 2. Measures taken to protect a system.”
“A condition that results from the establishment and maintenance of protective measures that enable an enterprise to perform its mission or critical functions despite risks posed by threats to its use of information systems. Protective measures may involve a combination of deterrence, avoidance, prevention, detection, recovery, and correction that should form part of the enterprise’s risk management approach.”
“System behavior as perceived by the system user.”
“The capability of a system to fulfill its mission, in a timely manner, in the presence of attacks, failures, or accidents.”
“The ability of an entity to continue to meet its functional requirements during network events such as cyber-attacks, physical attacks, natural disasters, and traffic overloads.”
“A property of a system, subsystem, equipment, process, or procedure that provides a defined degree of assurance that the named entity will continue to function during and after a natural or man-made disturbance; e.g., nuclear burst. Note: For a given application, survivability must be qualified by specifying the range of conditions over which the entity will survive, the minimum acceptable level or post-disturbance functionality, and the maximum acceptable outage duration.”
“Survivability is the system’s ability to continuously deliver services in compliance with the given requirements in the presence of failures and other undesired events.“
“Networks that can survive an enemy attack or natural disaster.“
“Assurance that a system will perform as expected.”
“Trustworthiness is assurance that a system deserves to be trusted – that it will perform as expected despite environmental disruptions, human and operator error, hostile attacks, and design and implementation errors. Trustworthy systems reinforce the belief that they will continue to produce expected behaviour and will not be susceptible to subversion.”
“Internal fault that enables an external fault to harm the system, [it] is necessary for an external fault to cause an error, and possibly subsequent failure(s).”