Legacy C/C++ code is a nuclear waste nightmare that will make you WannaCry

June 11, 2017

One of the most important steps we can take to improve the state of computer security is to abandon the use of unsafe programming languages, namely, C and C++. Of all the possible efforts we could undertake, I consider this the biggest bang for the buck: the most effective measure as well as the easiest.

I get a lot of pushback from people who think I am saying that it would be “easy.” That’s not what I am saying; I’m saying that it is the easiest effective measure that we can take. It is a difficult step, but it is much less difficult than all of the other possible effective steps we might take.

Few in the security community agree with me. They think that we should not bother considering safe languages because they do nothing to help us with the mountain of insecure legacy C/C++ code that we all rely on. They claim that a mountain of insecure code with some magic safety dust on top won’t make us more secure.

That’s a specious objection, and here’s why.

WannaCry

A month ago, the “WannaCry” ransomware outbreak made the front pages of the world’s newspapers, due to a combination of good branding, the relative novelty of ransomware, the notoriety of Bitcoin, the use of an NSA exploit/vulnerability (EternalBlue), prominent victims (the National Health Service in the UK), the fact that it operates as a self-spreading worm, and the purported involvement of North Korea, Russia, the Shadow Brokers, etc.

The key detail for us is that the vulnerability that WannaCry exploited was in unsafe legacy C/C++ code, dating back at least to the days of Microsoft Windows XP. At first, this code was thought to make only old, unsupported installations of Windows XP vulnerable, but it was later found to affect newer versions of Windows as well.

What you will notice in all of this is that legacy C/C++ figures prominently but safe languages are not mentioned at all.
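To make “unsafe” concrete: bugs like the one EternalBlue exploited are memory-safety errors, and a common shape is a parser that trusts a length field supplied by the attacker. The sketch below is not the actual Windows bug. It uses a made-up packet format (a 2-byte big-endian length followed by the payload) and a hypothetical function, copy_payload_unchecked, written in Rust with the length check deliberately left out, just as in the buggy C it mirrors. In C the oversized copy silently overwrites whatever follows the buffer. In Rust, slice indexing is bounds-checked, so the same mistake panics instead.

```rust
use std::panic;

// Hypothetical packet format for illustration only:
// 2-byte big-endian length, followed by that many payload bytes.
const BUF_SIZE: usize = 16;

// Mirrors a legacy C routine such as
//     uint8_t buf[16];
//     memcpy(buf, pkt + 2, declared_len);  /* declared_len comes from the attacker */
// It trusts `declared_len` and never compares it with the buffer size.
fn copy_payload_unchecked(pkt: &[u8]) -> [u8; BUF_SIZE] {
    let declared_len = u16::from_be_bytes([pkt[0], pkt[1]]) as usize;
    let mut buf = [0u8; BUF_SIZE];
    // Same missing check as the C code, but Rust bounds-checks every slice:
    // an oversized `declared_len` panics here instead of overwriting memory.
    buf[..declared_len].copy_from_slice(&pkt[2..2 + declared_len]);
    buf
}

fn main() {
    // Well-formed packet: 3-byte payload "abc".
    let good = [0x00, 0x03, b'a', b'b', b'c'];
    let buf = copy_payload_unchecked(&good);
    assert_eq!(&buf[..3], b"abc");

    // Malicious packet: declares and sends 64 bytes for a 16-byte buffer.
    let mut evil: Vec<u8> = vec![0x00, 0x40];
    evil.extend_from_slice(&[0x41; 64]);
    let result = panic::catch_unwind(|| copy_payload_unchecked(&evil));
    // The bug still crashes the program, but it cannot corrupt memory.
    assert!(result.is_err());
    println!("oversized payload stopped by the bounds check");
}
```

A panic is still a bug (an attacker can use it to crash the service), but it is a far better failure than handing the attacker a write past the end of a buffer.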

Insecure legacy C/C++ is a problem for C/C++

If you rely on legacy C/C++ code, you are going to get burned.

If you write new, secure C/C++ code that relies on insecure legacy C/C++ libraries, you are going to get burned.

If you write new, secure programs in C/C++ or in a safe language, and they run on a legacy C/C++ operating system, you are going to get burned.

In other words, legacy C/C++ code is a problem for all of us, whether we are writing new code in safe languages or unsafe languages like C and C++.

So the problem of legacy C/C++ code is mostly irrelevant to the question of whether to write new code in safe languages or unsafe languages.

And if you think that the existence of insecure legacy C/C++ code is a reason to write more insecure C/C++ code, or to use safe languages less, then you are not a very strong thinker.

The nuclear waste problem of legacy C/C++

Legacy code, like nuclear waste, can stick around for generations. We have four ways of handling it:

1. Do nothing. This is the traditional method, and it leads to WannaCry.
2. Sandboxing. Nowadays this means containers, process isolation, virtual machines, hypervisors, etc.
3. Mitigation. This means techniques such as address space layout randomization (ASLR) and control-flow integrity (CFI).
4. Reimplementation. This is the supposedly crazy method that I am advocating.

Mainstream security thinking rejects (1) as incompatible with security.

Mainstream security thinking also rejects (4) as impractical, and instead focuses on (2) and (3).

In reality, what is happening with legacy C/C++ is mainly (1) and a little bit of (4). As an industry, we are mostly ignoring the problem, except when something like WannaCry happens. At that point, the vendor, Microsoft in the case of WannaCry, scrambles to reimplement the buggy software by producing a patch.

Yes, this is actually reimplementation, on a small scale.

Legacy code was probably written by programmers who don’t even work at the company any more. Even if the same programmers are still around, they probably don’t remember much about the buggy code they wrote years ago.

Producing a patch is a lot of effort. You have to debug the code, figure out what conditions cause the bug, and produce a new version of the code that prevents the exploit. Furthermore, you have to make sure that your new code doesn’t break the mountain of code that depends on it. That’s quite a bit of high-stakes software development, done on deadline.
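Here is a minimal sketch of what that small-scale reimplementation can look like, in Rust and under made-up assumptions: a hypothetical packet holding a one-byte record count followed by that many two-byte records. The function sum_records_legacy stands in for the old routine, which trusts the count; sum_records is the patch, which checks the count against the actual packet length. Both names and the format are invented for illustration. The loop in main is a crude differential test: on every well-formed packet the patch must return exactly what the old code did, so nothing downstream breaks, and on truncated packets it must return an error rather than read past the end.

```rust
// Hypothetical packet format for illustration only:
// 1-byte record count, then `count` records of 2 bytes each (big-endian u16).

// Stands in for the legacy routine. It trusts the count byte; in the original C
// this unchecked indexing would read past the end of a short packet (in Rust it
// would panic), so the test below only calls it on well-formed packets.
fn sum_records_legacy(pkt: &[u8]) -> u32 {
    let count = pkt[0] as usize;
    (0..count)
        .map(|i| u16::from_be_bytes([pkt[1 + 2 * i], pkt[2 + 2 * i]]) as u32)
        .sum()
}

#[derive(Debug, PartialEq)]
enum PatchError {
    Empty,
    Truncated,
}

// The patch: identical results on well-formed input, an explicit error otherwise.
// It adds no new limits, because changing behavior on valid packets could break
// callers that depend on the old routine.
fn sum_records(pkt: &[u8]) -> Result<u32, PatchError> {
    let (&count, body) = pkt.split_first().ok_or(PatchError::Empty)?;
    let records = body
        .get(..2 * count as usize)
        .ok_or(PatchError::Truncated)?;
    Ok(records
        .chunks_exact(2)
        .map(|r| u16::from_be_bytes([r[0], r[1]]) as u32)
        .sum::<u32>())
}

fn main() {
    // Crude differential test over small packets of every count and length.
    for count in 0u8..=4 {
        for body_len in 0..=10usize {
            let mut pkt = vec![count];
            pkt.extend((0..body_len).map(|i| (i * 37 % 256) as u8));
            if body_len >= 2 * count as usize {
                // Well-formed: the patch must agree with the legacy code.
                assert_eq!(sum_records(&pkt), Ok(sum_records_legacy(&pkt)));
            } else {
                // Truncated: the patch must refuse instead of over-reading.
                assert_eq!(sum_records(&pkt), Err(PatchError::Truncated));
            }
        }
    }
    assert_eq!(sum_records(&[]), Err(PatchError::Empty));
    println!("patch matches legacy behavior on all well-formed test packets");
}
```

The patch deliberately adds no new restrictions, such as a maximum record count: any change in behavior on well-formed input is exactly the kind of thing that breaks the code that depends on it.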

Reimplementation is also happening even in cases (2) and (3). First of all, (2) and (3) are not perfect (I’d argue that they are much less effective than reimplementing in a safe language). There is still the potential of a WannaCry in sandboxed/mitigated programs. And even if the outbreak is contained, when you find a serious bug that triggers the sandbox or mitigation, you generally still want to fix it.

Note as well that sandboxing and mitigation are not easy to deploy. You can’t just take legacy code and sprinkle on sandboxing and mitigation; that can break the code. And mitigation in particular makes things harder to debug. Just applying (2) and (3) can force you to do some reimplementation.

Make it better not worse

Legacy C/C++ is full of security holes, regardless of whether you write new code in C/C++ or in a safe language. The difference is that if you use C/C++ to write new code or reimplement the legacy code, you are making the mountain bigger. That doesn’t make any sense.

Safe languages for the win! People are rewriting basic libraries in safe languages. People are writing operating systems, hypervisors, and large programs in safe languages. Most code nowadays is written in safe languages. The problem of legacy C/C++ will be with us for a long time but we are making progress and we need to stop taking backwards steps. No new C/C++!
