North American Network Operators Group

Re: Quantifying risk of waiting vs. upgrading for router vulnerabilities

  • From: Howard C. Berkowitz
  • Date: Mon Feb 21 10:54:22 2005

At 1:05 AM -0700 1/31/05, Pete Kruckenberg wrote:
After another long week of dealing with "upgrade now or die"
vulnerabilities, I'm wondering...

Is there data or analysis that would help me quantify the risks of
waiting (while I plan and evaluate and test) vs. doing immediate
software upgrades?

With many router vulnerabilities, exploits are in the wild within 24
hours. But how often are they used, and how often do they cause actual
network outages? There have been several major router vulnerabilities
during the last 2 years which have provided a reasonable data sample to
analyze. Can that data be used to create a more-accurate risk-analysis
model?

The risk of outage is very high (or certain) if I jump into upgrading
routers, and the quicker I do an upgrade, the more likely I am to have
a serious, extended outage. However, this is the only choice I have
absent information other than "every second gives the miscreants more
time to bring the network down."

If I delay doing the upgrade, using that delay to research and test
candidate versions, carefully deploy the upgrade, etc, I reduce the
risk of outage due to bad upgrades, at the expense of increasing the
risk of exploitation.

I'd love to find the "sweet spot" (if only generally, vaguely or by
rule-of-thumb), the theoretical maximum upgrade delay that will most
reduce the risks of upgrade outages while not dramatically increasing
the risks of exploitation outages.

Ideas? Pointers?

Pete.

Pete,

You touch on a broad area where I think there is data relevant to network operators, though few of them are aware of it: clinical medicine, more narrowly public health, and specifically epidemiology. What you describe is very much like a disease outbreak for which there is, perhaps, only an experimental drug with which to treat it. How does one look at the risk-versus-reward tradeoff?

There are many medical approaches to assessing the value of a drug or treatment; this falls under the discipline of "evidence-based medicine." There are assorted metrics for such things as "cost per year of life extension" and, more recently, "cost per year of quality life extension." These models weigh the cost of the treatment against both the probability of protection or improvement and the probability of adverse effects. Adverse effects can range from a drug that has no benefit and does no direct harm, yet whose use precludes a drug of known but probably lesser efficacy, all the way up to serious toxicity. The "clinician" has to assess the probability that the software or medical "bug fix" will kill both the bug and the patient.
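
To put Pete's "sweet spot" question in those terms, here is a minimal sketch of the tradeoff, in Python. Everything in it is an invented assumption for illustration, not measured data: the cost figures, and the guesses that exploitation risk grows with delay while bad-upgrade risk shrinks with testing time.

    # Illustrative only: a toy expected-loss model for upgrade timing.
    # Every number and curve below is an invented assumption, not data.
    import math

    C_EXPLOIT = 100.0  # assumed cost of an exploitation outage (arbitrary units)
    C_UPGRADE = 40.0   # assumed cost of an outage from a rushed, bad upgrade

    def p_exploit(days):
        # Assumed: the chance of being exploited rises the longer we wait.
        return 1.0 - math.exp(-0.02 * days)

    def p_bad_upgrade(days):
        # Assumed: the chance a botched upgrade causes an outage falls
        # as we spend more days testing candidate versions.
        return 0.6 * math.exp(-0.5 * days)

    def expected_loss(days):
        return p_exploit(days) * C_EXPLOIT + p_bad_upgrade(days) * C_UPGRADE

    # Pete's "sweet spot" is wherever the combined expected loss bottoms out.
    best = min(range(31), key=expected_loss)
    print("upgrade after %d days, expected loss %.1f"
          % (best, expected_loss(best)))

With these particular made-up curves the minimum falls a few days out; the point is only that the sum of a rising and a falling risk curve has an interior minimum one can actually compute, given real data for the curves.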

It may be worthwhile to study the fascinating and time-sensitive problem faced every year of choosing the appropriate mixture of influenza substrains for that year's vaccine. Influenza strains are first classified by which H (hemagglutinin) and N (neuraminidase) factors are present in a given virus; the strains common in humans carry one of three H and one of two N types. Below that level there are substrains within, say, H3N2.

In general, the first of the new year's strains appear in animals in Western China, and may mutate on their way into human form. There is a practical limit on how many strains can be put into the same batch of vaccine, and there is a lead time for vaccine production. Vaccine specialists, even ignoring things like this season's production disaster, have to make an informed guess about what to tell the manufacturers to prepare, which may or may not match the viral strains clinically presenting in flu season.
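
The shape of that selection problem is easy to sketch, if nothing else. Assuming made-up strain names, made-up forecast probabilities, and a made-up capacity limit, a greedy pick of the likeliest strains looks like this:

    # Illustrative only: toy strain selection under a capacity limit.
    # Strain names and forecast probabilities are invented for the example.
    forecast = {
        "A/H3N2-like": 0.40,
        "A/H1N1-like": 0.30,
        "B/lineage-1": 0.15,
        "B/lineage-2": 0.10,
        "A/H3N2-variant": 0.05,
    }
    MAX_STRAINS = 3  # assumed practical limit per vaccine batch

    # Greedy choice: include the strains most likely to circulate.
    chosen = sorted(forecast, key=forecast.get, reverse=True)[:MAX_STRAINS]
    coverage = sum(forecast[s] for s in chosen)
    print(chosen, "expected coverage %.0f%%" % (coverage * 100))

The analogous network decision is which patches to qualify and deploy first, given limited lab and maintenance-window capacity.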

There really are a number of applications of epidemiology to network operational security. In this community, we note the first appearances of malware and have informal alerting among NOCs and incident response teams, but I am unaware of anyone using the formal epidemiological/biostatistical methods of contact and first-occurrence tracing. Applying some fairly simple methods to occurrence vs. time vs. location, for example, can reveal whether there is one source of infection that infects one victim at a time, or whether there is contagion (different from infection) from victim to victim, and so on. Indeed, some of the current work in early warning of biological warfare attacks may have useful parallels to distinguishing random infection from an intelligently controlled botnet DDoS.
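
A crude version of that occurrence-vs-time analysis fits in a few lines. The sketch below, with fabricated victim counts, just asks whether cumulative infections grow linearly (one source, one victim at a time) or exponentially (victim-to-victim contagion); real epidemiological methods are considerably more careful.

    # Illustrative only: a crude occurrence-vs-time test. A single source
    # compromising hosts one at a time gives roughly linear cumulative
    # growth; victim-to-victim contagion gives roughly exponential growth.
    # The observation counts below are fabricated.
    import math

    counts = [1, 2, 4, 8, 15, 31, 62, 120]  # cumulative victims, hours 1..8
    hours = range(1, len(counts) + 1)

    def sse(predict):
        # Sum of squared errors of a prediction function over the data.
        return sum((predict(t) - c) ** 2 for t, c in zip(hours, counts))

    rate = counts[-1] / len(counts)              # linear: constant victims/hour
    growth = math.log(counts[-1]) / len(counts)  # exponential: N(t) = e**(g*t)

    linear_err = sse(lambda t: rate * t)
    expo_err = sse(lambda t: math.exp(growth * t))

    print("contagion suspected" if expo_err < linear_err else "single source")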

Howard