ResiliNets Failure Classes

From ResiliNetsWiki

Errors and failures of a service can be derived from its service description. Such a list can never be complete, so it gives representative examples of failures. This is part of the ResiliNets architecture.

The definitions for service, error, fault and failure can be found in the Definition section.

SF1. Service Failures

A service instance fails if it does not provide any service to legitimate clients or if it returns erroneous results. A service may stop responding because the service instance crashed, its internal logic is deadlocked, or its service access point to the communication subsystem is blocked. DoS attacks often cause such behaviour: they exploit programming mistakes to exhaust resources or to drive the service into an erroneous state. Erroneous results are often caused by implementation mistakes.

SF1.1. QoS Errors

A service fails if it cannot provide its service within the QoS parameters guaranteed to the client.

SF1.1.1. Performance Errors

  • Data arrives too late
  • Jitter is too high
  • Throughput is too low
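As a minimal sketch, the three performance errors can be checked against a client's QoS guarantee for one measurement window. The `QosGuarantee` fields, the thresholds, and the jitter estimate (mean absolute difference of consecutive delays, a simplification of the smoothed estimator in RFC 3550) are assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QosGuarantee:
    # Hypothetical bounds guaranteed to the client.
    max_delay_ms: float
    max_jitter_ms: float
    min_throughput_kbps: float


def performance_errors(delays_ms, throughput_kbps, qos):
    """Return the performance errors observed in one (non-empty) measurement window."""
    errors = []
    if max(delays_ms) > qos.max_delay_ms:
        errors.append("data arrives too late")
    # Simplified jitter: mean absolute difference between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    if diffs and mean(diffs) > qos.max_jitter_ms:
        errors.append("jitter is too high")
    if throughput_kbps < qos.min_throughput_kbps:
        errors.append("throughput is too low")
    return errors


qos = QosGuarantee(max_delay_ms=100, max_jitter_ms=10, min_throughput_kbps=500)
print(performance_errors([40, 45, 130, 50], 800, qos))
# ['data arrives too late', 'jitter is too high']
```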

SF1.1.2. Resilience Errors

We have observed multiple resilience failures in the past. The following list is far from complete and will be extended as the work continues:

  • No backup system provided although resilience is required
  • Redundant systems are not location-disjoint: a natural disaster brings both systems down
  • Backup path is not node/link-disjoint from the primary path: failure of a single node/link can bring both paths down
  • Error propagation failure: BGP route flapping caused by join/leave messages sent to all BGP speakers
  • Bad failover strategy: SCTP retransmission to the secondary IP address (backup path) degrades performance
  • Overlap of data segments: TCP overwrites previously received data with newly received data, e.g. from retransmissions
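A backup path that shares an intermediate node or a link with the primary path fails together with it. The following sketch, assuming paths are given as node sequences from source to destination, reports the shared elements; the function names are hypothetical.

```python
def _links(path):
    """Undirected links of a path given as a node sequence."""
    return {frozenset(edge) for edge in zip(path, path[1:])}


def shared_elements(primary, backup):
    """Return (shared intermediate nodes, shared links) of two paths.

    Source and destination are necessarily common to both paths, so only
    intermediate nodes are compared.
    """
    nodes = set(primary[1:-1]) & set(backup[1:-1])
    return nodes, _links(primary) & _links(backup)


def is_disjoint(primary, backup):
    """True if the backup path is node- and link-disjoint from the primary."""
    nodes, links = shared_elements(primary, backup)
    return not nodes and not links


print(is_disjoint(["A", "B", "C", "D"], ["A", "E", "C", "D"]))  # False: shares node C and link C-D
print(is_disjoint(["A", "B", "C", "D"], ["A", "E", "F", "D"]))  # True
```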

SF1.2. Addressing Errors

  • Addressing failure: No such network, host, protocol, SAP
    • Unknown sender due to spoofed source address

SF2. Basic Building Block Failures

All hardware errors will result in a failure, since we do not intend to build resilience mechanisms for them. This follows from our level of abstraction.

SF2.1 Physical Link Failures

| Cause | Duration | Countermeasure | Example (optional) |
|---|---|---|---|
| Wired networks: cable cut | permanent | redeploy cable | |
| Wireless networks: peer moves out of range | temporary | find multi-hop path | |
| Noisy channel corrupts signal | temporary | add error control | electromagnetic interference |
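One common form of error control on a noisy channel is error detection with a cyclic redundancy check. This sketch uses CRC-32 from Python's `zlib` module; the framing (a 4-byte big-endian CRC appended to the payload) is an assumption. Detection alone only lets the receiver drop the corrupted frame; recovering the data still requires retransmission or forward error correction.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(frame_bytes: bytes):
    """Return the payload if the CRC matches, else None (frame is dropped)."""
    payload, crc = frame_bytes[:-4], frame_bytes[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == crc:
        return payload
    return None


sent = frame(b"hello")
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]  # flip one bit, as interference might
print(check(sent), check(corrupted))  # b'hello' None
```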

SF2.2 Node Hardware Failures

| Cause | Duration | Countermeasure | Example (optional) |
|---|---|---|---|
| Node destruction | permanent | redeploy hardware | natural disaster |
| Defective hardware | permanent | redeploy hardware | aging |
| DoS attack | permanent | reboot / change hardware | malformed packets block interface |
| DoS attack | temporary | reboot | power outage |

SF2.3 Operating System Failure

| Cause | Duration | Countermeasure | Example (optional) |
|---|---|---|---|
| Implementation mistake | temporary | software update | exception handling, resource management, ... |

SF3. Communication-Specific Building Blocks

A communication system inherits one or more of the following services as building blocks. For each service, the tables list its challenges, the possible responses, the result of each response, and examples.

SF3.1. Link Transport Service Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Physical link failure | | service failure | |
| Logical link failure | | service failure | no link to peer |
| Logical link failure | use redundant path | normal operation | forwarding service uses a different multi-hop path |
| Node failure | | service failure | peer is down |
| Transport association error | | service failure | connection reset attack |
| Transport association error | re-establishment of association | normal operation | connection reset attack |

SF3.1.1. Secure Link Transport Service Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Anti-replay counter overrun | disable association | service failure | |
| Anti-replay counter overrun | re-keying | normal operation | |
| Data duplication | drop data | normal operation | |
| Association timeout | disable association | service failure | |
| Association timeout | re-keying | normal operation | |
| Authentication failure | disable association | service failure | |
| Truncation attack | | service failure | refine implementation |
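Duplicate detection and anti-replay protection are commonly implemented with a sliding window over sequence numbers, as in IPsec ESP (RFC 4303). The sketch below is a simplified version of that idea: the class name and the 64-packet default are assumptions, and sequence numbers are assumed not to wrap, which is exactly why a real association must be re-keyed before the counter overruns.

```python
class AntiReplayWindow:
    """Sliding-window duplicate/replay check over increasing sequence numbers."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => sequence number (highest - i) already seen

    def accept(self, seq: int) -> bool:
        if seq <= 0:
            return False
        if seq > self.highest:  # ahead of the window: slide it forward
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:  # too old: outside the window, drop
            return False
        if self.bitmap & (1 << offset):  # duplicate or replay, drop
            return False
        self.bitmap |= 1 << offset
        return True


window = AntiReplayWindow(size=4)
print([window.accept(s) for s in [1, 3, 2, 2, 10, 6, 7]])
# [True, True, True, False, True, False, True]
```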

SF3.1.2. E2E Transport Errors

Since E2E transport is only a specialised link transport service, the same failures can occur as for any other link transport service.

SF3.1.3. Reliability Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Packet loss | ARQ mechanisms | degraded service | Stop-and-Wait, Go-back-N, Selective Repeat |
| Packet re-ordering | restore original order | normal operation | |
| Packet duplication | drop packet | normal operation | |
| Data alteration | drop packet and ARQ | degraded service | |
| Data alteration | enable FEC codes | degraded service | correct data after reception |
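The re-ordering and duplication rows can be handled by a receiver buffer that releases data only in sequence. In this minimal sketch (class and method names are hypothetical) the receiver returns a TCP-style cumulative acknowledgement, the next sequence number it expects, which an ARQ sender could use to detect loss; the retransmission logic itself is not shown.

```python
class InOrderReceiver:
    """Buffer out-of-order packets, drop duplicates, deliver in sequence."""

    def __init__(self):
        self.next_seq = 0  # next sequence number to deliver
        self.buffer = {}   # seq -> payload for packets received out of order
        self.delivered = []

    def receive(self, seq, payload):
        """Handle one packet and return the cumulative ACK (next expected seq)."""
        if seq < self.next_seq or seq in self.buffer:
            return self.next_seq  # duplicate: drop it
        self.buffer[seq] = payload
        while self.next_seq in self.buffer:  # release any in-order run
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return self.next_seq


r = InOrderReceiver()
acks = [r.receive(s, p) for s, p in [(0, "a"), (2, "c"), (2, "c"), (1, "b")]]
print(acks, r.delivered)  # [1, 1, 1, 3] ['a', 'b', 'c']
```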

SF3.1.4. Communication Type Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Anycast: processing by multiple hosts | none | normal operation | |
| Anycast: changing receiver | none | service failure | |
| Multicast: incomplete data at one host | ARQ | degraded service | |
| Reliable multicast: ACK storms | concast | normal operation | |
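Concast-style aggregation lets an intermediate node merge acknowledgements from its downstream receivers and forward a single ACK upstream, instead of letting every ACK reach the sender. The sketch assumes cumulative ACKs and forwards the minimum across children, since the sender may only release data that every receiver holds; the class name and this aggregation rule are illustrative assumptions, not a specific protocol.

```python
class AckAggregator:
    """Merge cumulative ACKs from downstream receivers into one upstream ACK."""

    def __init__(self, children):
        self.acked = {child: -1 for child in children}
        self.reported = -1  # last aggregate ACK sent upstream

    def on_ack(self, child, seq):
        """Record a child's ACK; return a new aggregate ACK, or None if suppressed."""
        self.acked[child] = max(self.acked[child], seq)
        aggregate = min(self.acked.values())
        if aggregate > self.reported:
            self.reported = aggregate
            return aggregate
        return None  # nothing new for the sender: no upstream ACK


agg = AckAggregator(["r1", "r2", "r3"])
upstream = [agg.on_ack(c, s) for c, s in [("r1", 5), ("r2", 7), ("r3", 4), ("r1", 9), ("r3", 8)]]
print(upstream)  # [None, None, 4, None, 7]
```

In this run, five downstream ACKs produce only two upstream ACKs.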

SF3.2. Forwarding Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Link failure | none | service failure | |
| Link failure | redundant path or route | normal operation | |
| Node hardware/software failure | none | service failure | next hop or end system is down |
| Node hardware/software failure | redundant node | normal operation | next hop or end system is down |
| Unknown destination | none | service failure | non-self-learning routing |
| Unknown destination | learn route | normal operation | self-learning routing |
| Degraded node service | enable congestion avoidance service; use different path | degraded service | congestion, random packet drops |
| Attacks(?) | | | black hole router, wormhole router |
| Firewalling | | service failure | wrong firewall configuration |
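The redundant-path response amounts to keeping more than one candidate next hop per destination and choosing the first one whose link is up. This sketch assumes a simple exact-match table keyed by a destination name (real routers use longest-prefix matching); the table contents and the `link_up` callback are hypothetical.

```python
def next_hop(table, destination, link_up):
    """Pick the first operational next hop for a destination.

    table maps destination -> next hops in preference order (primary first);
    link_up(hop) reports whether the link to that neighbour works.
    Returns None for an unknown destination or when every candidate is down,
    i.e. the forwarding service fails for that packet.
    """
    for hop in table.get(destination, []):
        if link_up(hop):
            return hop
    return None


table = {"netB": ["R1", "R2"]}
down = {"R1"}
print(next_hop(table, "netB", lambda hop: hop not in down))  # R2
```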

SF3.3. Node Configuration Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Node failure | | service failure | configuration server down |
| Link failure | | service failure | no link, no external configuration |
| Link transport failure | | service failure | |

SF3.4. Security Association Negotiation Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Downgrade attack | abort negotiation | service failure | |
| Authentication failure | abort negotiation | service failure | |
| Incompatible algorithms | abort negotiation | service failure | |
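The negotiation responses can be sketched as choosing the responder's most preferred algorithm among those offered, and aborting otherwise. The algorithm names and the `ACCEPTABLE` policy set are assumptions. Enforcing a local minimum only limits how far a downgrade can go; detecting that an offer was tampered with additionally requires authenticating the negotiation transcript, as TLS does with its Finished messages.

```python
class NegotiationAborted(Exception):
    pass


# Hypothetical local policy: algorithms this node is willing to use.
ACCEPTABLE = {"aes256-gcm", "chacha20-poly1305", "aes128-gcm"}


def negotiate(initiator_offer, responder_prefs):
    """Return the responder's most preferred algorithm that the initiator offered."""
    offered = [alg for alg in initiator_offer if alg in ACCEPTABLE]
    if not offered:
        # Offer below local policy, e.g. strong algorithms stripped by an attacker.
        raise NegotiationAborted("offer contains only unacceptable algorithms")
    for alg in responder_prefs:
        if alg in offered:
            return alg
    raise NegotiationAborted("incompatible algorithms")


print(negotiate(["aes128-gcm", "aes256-gcm"], ["aes256-gcm", "aes128-gcm"]))  # aes256-gcm
try:
    negotiate(["des-cbc"], ["aes256-gcm"])
except NegotiationAborted as exc:
    print("aborted:", exc)  # aborted: offer contains only unacceptable algorithms
```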

SF3.5. Access Control Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Forwarding failure | none | service failure | |
| E2E transport failure | none | service failure | |
| Secure E2E transport failure | suspend certain schemes | degraded service | schemes which do not rely on a secure E2E transport can still be used |
| No common scheme | none | service failure | |
| Replay attack | drop message | normal service operation | replay detection necessary |

SF3.5.1. Network Access Control Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Link failure | | service failure | |
| Node failure | | service failure | network access server down |
| Incompatible schemes | abort negotiation | normal service | incompatible authentication schemes, e.g. no WPA-compatible hardware |

SF3.6. Certificate Online Verification Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Node error | use trusted backup server | normal operation | primary trusted server is down |
| Link transport error | none | service failure | |

SF3.7. Name Resolution Service Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| (Secure) E2E transport failure | none | degraded service | accepting unsecured responses can lead to vulnerability to cache poisoning or other attacks |
| Non-existent name | send error report | normal operation | |
| Server error | use backup server | normal operation | find backup either by configuration or anycast |
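The backup-server response can be sketched as trying configured servers in order. Here `query` is a hypothetical transport function passed in by the caller, and the exception names are assumptions; a non-existent name is an authoritative answer and is reported to the caller rather than retried at another server.

```python
class ServerError(Exception):
    """The queried server did not answer (hypothetical)."""


class NameNotFound(Exception):
    """Authoritative answer: the name does not exist (hypothetical)."""


def resolve(name, servers, query):
    """Return the first answer from the configured servers, tried in order."""
    for server in servers:
        try:
            return query(server, name)  # NameNotFound propagates: error report
        except ServerError:
            continue  # server error: fall back to the next configured server
    raise ServerError(f"no server answered for {name!r}")


ZONE = {"www.example.org": "192.0.2.10"}


def fake_query(server, name):
    if server == "primary":
        raise ServerError("timeout")  # simulate the primary server being down
    if name in ZONE:
        return ZONE[name]
    raise NameNotFound(name)


print(resolve("www.example.org", ["primary", "backup"], fake_query))  # 192.0.2.10
```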

SF3.8 Feedback Service Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Feedback from untrusted source | | | |
| Feedback from hostile source | drop information | normal service | |

SF3.8 Monitoring Service Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|

SF3.9 Congestion Avoidance Service Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|

SF3.10. Routing Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|
| Unnoticed topology change | re-run algorithm | service failure | topology change due to new link, node movement, etc. |
| Unrecognised additional node | re-run algorithm | service failure | |
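Routes computed before a topology change do not reflect it until the routing algorithm is re-run. The sketch uses Dijkstra's shortest-path algorithm as a stand-in for whatever routing algorithm a node runs; the graph representation and link costs are assumptions.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra: return {node: distance} for a graph {node: {neighbour: cost}}."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbour, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist


graph = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1}}
before = shortest_paths(graph, "A")
# Topology change: node D joins with a link to A. The table computed
# before the change has no route to D until the algorithm is re-run.
graph["D"] = {"A": 1}
graph["A"]["D"] = 1
after = shortest_paths(graph, "A")
print("D" in before, after["D"])  # False 1
```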

SF3.10.1. Path Establishment Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|

SF3.11. Transaction Service Errors

  • Partial update of a compartment policy due to connection failures, system resets, etc., leading to different policies on the nodes within a compartment
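One way to avoid a partial policy update is to make the update all-or-nothing, in the style of two-phase commit: every node first stages the new policy, and the change is committed only if all nodes confirmed. The `Node` class and its methods are hypothetical, and the sketch ignores failures during the commit phase itself, which full two-phase commit handles with persistent logging and recovery.

```python
class Node:
    """Hypothetical compartment member holding a policy version."""

    def __init__(self, name, reachable=True):
        self.name, self.reachable = name, reachable
        self.policy, self.staged = "v1", None

    def prepare(self, policy):
        if not self.reachable:
            return False  # e.g. connection failure or system reset
        self.staged = policy
        return True

    def commit(self):
        self.policy, self.staged = self.staged, None

    def abort(self):
        self.staged = None


def update_compartment(nodes, policy):
    """Commit the new policy only if every node staged it; otherwise roll back."""
    prepared = [n for n in nodes if n.prepare(policy)]
    if len(prepared) == len(nodes):
        for n in nodes:
            n.commit()
        return True
    for n in prepared:
        n.abort()
    return False


members = [Node("a"), Node("b"), Node("c", reachable=False)]
print(update_compartment(members, "v2"), [n.policy for n in members])  # False ['v1', 'v1', 'v1']
```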

SF3.12. Anonymity Service Errors

| Challenge | Response | Result | Example (optional) |
|---|---|---|---|

Failure Semantics

We must identify the relationship between failures and faults. A lower-layer failure can be a fault for an upper layer without necessarily causing a failure at that upper layer as well.
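The relationship between failures and faults across layers can be illustrated with a toy model: a failure at one layer is a fault for the layer above, and it becomes a failure there only if that layer has no response that masks it. The layer names and which layers can mask faults are assumptions chosen for illustration.

```python
def propagate(failure_at, layers):
    """Follow a failure upward through a layer stack.

    layers is ordered bottom-up as (name, masks) pairs, where masks says
    whether the layer has a response (e.g. a redundant path) that tolerates
    a fault coming from below. Returns the names of the layers that fail.
    """
    names = [name for name, _ in layers]
    start = names.index(failure_at)
    failed = [failure_at]
    for name, masks in layers[start + 1:]:
        if masks:
            break  # fault masked: no failure at this layer or above
        failed.append(name)
    return failed


stack = [("physical link", False), ("link transport", False),
         ("forwarding", True), ("E2E transport", False)]
print(propagate("physical link", stack))  # ['physical link', 'link transport']
```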
