Talk:ResiliNets Principles
Axioms
JPGS: I'm putting this comment here since we don't have (and probably don't need) a separate axiom discussion. We had UER, but added A0: Faults are inevitable. But this is not ordered the same way as U, E, and R, and thus FEUR seems a bit odd. Thus, I propose "I: Inevitability of Faults" so that IEUR is a bit more consistent.
Comments?
Abdul: While trying not to start another debate on axioms... I have a comment. In the Expect part I see that we have a statement that says ".....challenges are inevitable...". Why not then combine A0 with A2 and just say that faults and challenges are both inevitable? In fact, this might make sense, because faults and challenges are strongly related to each other. So, expect adverse events that may disrupt normal operations by exploiting faults in the system.
If we want to keep these two separate for clarity of expression, then I am OK with IEUR. On second thought, I am inclined to keep them as two separate axioms...just to retain the simplicity. Sorry for playing devil's advocate.
JPGS: That's the whole point of the discussion! I think what Marcus convinced us of is that A0 says that it is impossible to have a perfect (fault-free) design, and given that, we must expect the challenges that impact us (A2). So I'm still convinced that the separation is good, but will rename as I suggested, since there doesn't seem to be any objection, and it still seems the right thing to me after mulling it over for a day.
Principles: Critical Issues
Heterogeneity, Trust, and Policy
JPGS: At one time we had heterogeneity on one of the lists but it didn't make the wiki, and is a glaring omission now that I've again stepped back to look at the whole picture. I think there are two possibilities:
1. Add it, as I have done temporarily as P5A so as not to upset the current numbering unless needed
2. Combine it with P7: heterogeneity includes mechanism, trust, and policy. This has some appeal in that this becomes the realm or compartment principle that says we need to provide resilience even though there are a variety of realms, each with its own mechanism, trust, and policy.
I'm leaning toward doing this, but would like opinions and time to think about it overnight.
Daniel: I do not find the second part of this principle (P5A) to be very clear i.e. the part following the semicolon. Based on how P7 is written, I think that the Heterogeneity principle ought to remain separate.
JPGS: It was a very quick hack.
Justin P. Rohrer 15:54, 20 December 2006 (CST): Heterogeneity is a consideration at every layer of the network, not just trust and policy, so I would vote against combining with P7. It definitely has some relationship to complexity, but I think it is distinct enough from that to remain separate.
JPGS: I don't think I expressed the idea very clearly. I was able to discuss this at length with Marcus and Abdul and we decided to combine them. Hopefully the text is a bit better now, although it still needs work. Does this make more sense?
Abdul: I agree with the new P7 principle. However, it is still missing heterogeneity in resources. Some of the most common cases such as inability of a protocol to perform well when the two nodes in communication are in different networks (such as wired/wireless) are not covered by the current text.
Behavioral Constraints
JPGS: David came up with the idea that we need to have constraints on behavior. This is related to a slightly larger issue that has to do with protocol and algorithm correctness. The point is that in a resilient system, we need to be able to specify and measure correct behavior. This is certainly needed to understand normal operations (Axiom 1). While this idea is related to the metrics axiom, I'm afraid it isn't currently captured. So again we have at least two choices:
1. Add it as a distinct principle
2. Beef up the metrics principle to include it
I'm leaning toward #1. I hope David will weigh in on this one with an opinion.
Justin P. Rohrer 16:03, 20 December 2006 (CST): I think the metrics principle can be made to capture what is needed relative to behavior. As I see it the principle is that we need to be able to quantify the network's behavior and changes in its behavior. It's up to the application or service description to prescribe what "correct" behavior looks like in terms of our metrics. Normal operations again would be defined in terms of metrics, so I don't think that behavioral constraints warrant a separate principle.
JPGS: After discussing this with David, Abdul, and Marcus, we've decided that a separate principle is indeed needed. Justin: see if this makes sense now. I've inserted it as P1' to not disrupt the numbering. We should revisit the best order for these after the holidays.
Security
JPGS: We've talked a few times about the role of conventional security measures (confidentiality, integrity, authentication, nonrepudiation, access control). I think these are *all* covered in P11, Self-Protection. So perhaps it should be renamed "Self-Protection and Security" and have the additional security aspects explicitly mentioned; then we've nicely covered the space (which we owe for ANA). Comments?
Daniel: Security is covered by P11 as well as Trust in P7. One can only trust another realm if one can authenticate that realm. With that said, perhaps it would be good to have the additional security aspects explicitly mentioned and then rename the principle as Self-Protection and Security.
Justin P. Rohrer 16:11, 20 December 2006 (CST): I agree with adding security to the name. While trust relies on security mechanisms, it does not really describe security, so I think it is important to maintain that distinction. I have been using the term security somewhat more broadly to include the concept of self-protection against DoS and other types of abuse that do not specifically fall under confidentiality, integrity, authentication, nonrepudiation, or access control. Is it appropriate to include that concept in P11 as well?
JPGS: this was the easy one; we all (including Abdul, Marcus, and David in a chat) agree on adding security to the title of P11.
Service Requirements
Marcus: Don't we say that resilience is a superset of survivability?
JPGS: yes
Marcus: So how can resilience and survivability be QoS properties?
JPGS: I think from the way your question and list below are structured, you mean that since survivability is a subset, why is it listed as a peer in P1? Although I'm admittedly not very consistent about it, there are times that I want to make sure that people know that survivability is included, so that survivability is captured. Would it be better if it was phrased "resilience (and therefore survivability)"?
Marcus: Maybe we can resolve this by structuring (more to add):
- Resilience
- Survivability
- Fault tolerance
- Survivability
- Security
  - Confidentiality
  - Authentication
  - Integrity
  - Nonrepudiation
- Performance
  - Bandwidth
  - Delay
  - Jitter
JPGS: I like this, and it is consistent with Abdul's work; Abdul was also going to add some security definitions. Perhaps this structured list ought to be on the definitions page with each entry linked to the appropriate definition? I added a couple of entries.
Complexity
Abdul: Does complexity have a linear relationship with overall resilience? Can one increase the resilience of the system infinitely by allowing an infinite increase in complexity? I don't think so.
Then this leads to the point that, given that all the other factors are fixed, does complexity play any role in the resilience at all? What I am getting at is that complexity depends on, or is derived from, other factors in the system; it is not fundamental. It is one of the many measures of the system. In that case, should it be included in the principles?
If we still want to include it, then one way of putting it is: Minimize Complexity: Achieve the minimum complexity that can still provide/maintain the maximum resilience possible with the given infrastructure.
JPGS: In fact this is the entire point: resilience adds complexity, but complexity may *reduce* resilience.
Context Awareness and State Management
Daniel: Do these two principles need to be separate? As a node monitors its components, it would already have access to state information; consequently state can be managed in a decentralized manner.
JPGS: Clearly they are related in the sense that information is monitored for context management and is likely (but not necessarily) stored as state, but I think that the decisions on how to manage state go beyond context awareness and are important enough that the two ought to remain separate.
Justin P. Rohrer 16:25, 20 December 2006 (CST): If I am not mistaken, context awareness captures the state of factors which are not themselves part of the system (system defined very narrowly), whereas state management is the operation within and movement between operational modes of the system itself. For example, a TCP flow uses a register to maintain window size information, and that is part of its state management. If the TCP flow were somehow aware of the window size of OTHER flows with which it shared resources, that would be context awareness.
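The distinction drawn in the TCP example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Flow`, `SharedLink`, `other_windows`, and `adjust_with_context` are hypothetical names, and real TCP does not expose the windows of other flows in this way. A flow's own `window` stands for state management; reading the windows of the other flows on the same link stands for context awareness.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """A toy flow; `window` is the flow's own state (state management)."""
    name: str
    window: int = 10


@dataclass
class SharedLink:
    """Hypothetical shared resource that knows which flows are using it."""
    flows: list = field(default_factory=list)

    def other_windows(self, flow):
        # Context: the window sizes of the OTHER flows on this link.
        return [f.window for f in self.flows if f is not flow]


def adjust_with_context(flow, link, capacity):
    """Shrink the flow's window if the link would otherwise be oversubscribed."""
    used_by_others = sum(link.other_windows(flow))
    # Never grow the window here, and never let it drop below 1.
    flow.window = max(1, min(flow.window, capacity - used_by_others))
    return flow.window
```

In this sketch the flow only changes its own state, but the decision is driven by information that lies outside the flow itself, which is the sense of "context" used in the comment above.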
Metrics
Abdul: With respect to the line "Furthermore, metrics are needed to understand the impact of an adverse event or condition on the network service provided", I think we should add that metrics are also needed to evaluate the effectiveness of resilience mechanisms on such network services.
Redundancy and Diversity
Abdul: I think there is some confusion between the two. If diverse alternatives are simultaneously operational, isn't that the same as being redundant? For example, spatially diverse links that are simultaneously operational (to defend, implying that they carry identical copies of data) are the same as spatial redundancy. In this sense we can say that defense is achieved through redundancy and remediation through diversity.
Marcus: I miss "Implementation Diversity", meaning that I can use the same mechanism from two different manufacturers to have a fail-over if one of the implementations is vulnerable to a specific challenge.
JPGS: I agree and this will be added. This also covers not having a single operating system or switch vendor.
Therefore I suggest adding "Operational Diversity" (not sure about the name) on the same level as "Spatial/Temporal diversity" and putting "Medium/Mechanism/Implementation Diversity" below this.
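One way to read Abdul's distinction (defense through redundancy, remediation through diversity) is sketched below in Python. All names here (`Link`, `send_redundant`, `send_with_failover`) are hypothetical and chosen only for illustration; the sketch assumes a link either accepts data or raises `ConnectionError`.

```python
class Link:
    """Hypothetical link that records what it delivers; `up` marks availability."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.delivered = []

    def send(self, data):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.delivered.append(data)


def send_redundant(links, data):
    """Defense: send an identical copy on every link at the same time."""
    succeeded = []
    for link in links:
        try:
            link.send(data)
            succeeded.append(link.name)
        except ConnectionError:
            pass  # a copy on another link masks this failure
    return succeeded


def send_with_failover(links, data):
    """Remediation: use one link and switch to an alternative only on failure."""
    for link in links:
        try:
            link.send(data)
            return link.name
        except ConnectionError:
            continue
    raise ConnectionError("all links failed")
```

The same set of links serves both functions; what differs is whether the alternatives carry copies simultaneously or are held in reserve. Under this reading, implementation diversity would correspond to the entries in `links` coming from different vendors.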
Adaptability
Abdul: Shouldn't adaptability be a subset of self-organizing and autonomic? Especially since we include self-optimizing, self-managing, and self-repair (characteristics of adaptability) in the autonomic principle.
On the other hand, are we trying to state that a node may adapt on its own (self-optimizing) or that it could be externally asked to change certain parameters in order to adapt (adaptability)?
The same question applies to Evolvability as well.
JPGS: To the latter question, I think we are saying both, but I'm not sure they are fundamentally different. A node adapts based on some input, either sensed or signalled, and I think it would be hard to draw the line. As for the first issue, I think the problem lies more in how we state P10 than in this one.
Trust and Policy
Marcus: Trust and policy are not principles that help us with resilience; rather, they constrain our choice of how to react. This is not reflected by the heading of this principle. This is basically the same problem as the "Complexity" heading, which on its own may be misunderstood. However, I do not have a proposal for how to improve them.
JPGS: In this case it both constrains and helps; I've tried to make this explicit in the text. I agree that the title isn't ideal, but I don't have any good ideas. Perhaps we could do something like "Trust and Policies must be Considered" but I don't like that either.
New Principles
Marcus: After reading papers from the ANSA project (I will add them to the related work section soon), I would like to highlight a few principles they came up with which we might have missed so far:
- "Separation: Systems should be designed so that separation amongst their parts can be achieved; this means that they can be more flexibly configured. However, this can have the effect of introducing more components, thus reducing dependability."
- "Scalability: The dependability mechanisms used in a system must not impose constraints on scaling, and the extent to which it can be interconnected and its applications made to interwork. Scaling is about scaling up and down: mechanisms which are efficient in large systems should be designed so they are efficient in small systems or else should be replaceable by similar mechanisms which are efficient in small systems."
- "Transparency: A property of a system is transparent if application programmers need not be concerned with it. The aim of the ANSA work on dependability is to hide the details of the dependability mechanisms (but not the requirements for dependability) from the application programmer."
- "Concurrency: Concurrency is inevitable in distributed systems. This means that there is potential for conflicting, inconsistent changes to be made to data. Mechanisms are needed to prevent this."
Discussion
- Separation: We already covered the issue of complexity vs. resilience in our principles. But we should make it more explicit that our architecture composes the system from parts.
- Scaling: We could emphasize in the diversity principle that different mechanisms or implementations of a single feature make our system scalable to various scenarios.
- Transparency: This is an interesting one! It would be interesting to investigate whether we can hide resilience from the application programmer or whether we have to involve the application programmer to build a resilient system.
- Concurrency: This is an addition to the heterogeneity principle, opening a new dimension of the principle.
Please comment!!!