An unsafe legacy
August 14, 2015  

Now I’d like to continue my crusade for the adoption of safe programming languages by tackling some common objections, starting with the issue of legacy code.

Right now there is a huge installed base of unsafe C/C++ code in the world. Most of the CPU cycles that my laptop consumes go to running C/C++ code. How could we possibly move all of this to code written in safe languages?

The answer is: DON’T DO IT.

It’s an accepted axiom of software development that you should (almost) never rewrite a program from scratch. There are very few successful examples of complete rewrites.

Robert Graham has a useful post on the security and code quality of BIND9. I agree with all of his criticisms (?!!), and in particular, this:

They shouldn’t rewrite it from scratch, but if they did, they should choose a safe language and not use C/C++.

That is, while I don’t think we can ever rewrite everything in safe languages, we should certainly choose a safe language whenever we rewrite something. And we should choose a safe language whenever we write something from scratch. In other words, safe languages should be the default whenever we write new programs.

This does not solve the legacy problem; nothing solves the legacy problem. We are still running COBOL.

I ask simply that we do not add to the problem by producing new, unsafe programs.

New programs are written all the time, and we may have already reached a point at which most of those new programs are written in safe languages. I want to accelerate that.

The idea is to reach the point where the amount of new code written in safe languages is much greater than the amount of new code written in unsafe languages. Compound interest can then do its work. If the amount of new safe code grows by X% each year, the amount of new unsafe code grows by Y% each year, and X > Y, then safe code eventually “wins”: the total amount of safe code overtakes the total amount of unsafe code, however large the legacy head start.
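To make the compound-interest argument concrete, here is a minimal sketch in Python. The function name, its parameters, and every number fed to it are hypothetical, chosen only for illustration and not measurements of real code bases. It assumes that each year's new code is simply added to a running total and that the yearly amounts grow at fixed rates of X% and Y%.

```python
def years_until_safe_wins(safe_total, unsafe_total, new_safe, new_unsafe,
                          x_pct, y_pct, max_years=1000):
    """Return the first year in which the total amount of safe code exceeds
    the total amount of unsafe code, or None if that never happens within
    max_years. All quantities use the same arbitrary unit (say, lines)."""
    for year in range(1, max_years + 1):
        # Add this year's new code to the running totals.
        safe_total += new_safe
        unsafe_total += new_unsafe
        if safe_total > unsafe_total:
            return year
        # Next year's output grows by X% (safe) and Y% (unsafe).
        new_safe *= 1 + x_pct / 100
        new_unsafe *= 1 + y_pct / 100
    return None


if __name__ == "__main__":
    # Hypothetical scenario: unsafe code starts with a 100x head start,
    # both camps currently write the same amount of new code per year,
    # but safe output grows at 20% per year versus 5% for unsafe output.
    years = years_until_safe_wins(
        safe_total=10, unsafe_total=1000,
        new_safe=10, new_unsafe=10,
        x_pct=20, y_pct=5,
    )
    print("Safe code overtakes unsafe code in year:", years)
```

Under these assumptions the head start only delays the crossover; it does not prevent it. With X below Y and no starting advantage for safe code, the function returns None.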

I do think that some rewriting makes sense, and this happens more than you might think. For example, Graham says:

Last year, ISC (the organization that maintains BIND9) finished up their BIND10 project, which was to be a re-write of the code. This was a fiasco, of course. Rewrites of large software project are doomed to failure.

The thing is, BIND9 was itself a complete rewrite. For example, Colin Percival says:

BIND 4 and BIND 8 were famous within the security community for their horrible track records. BIND 9, in an attempt to avoid all of the problems of earlier versions, was a complete ground-up rewrite — and it is still responsible for an astonishing number of security advisories.

(Both posts are worth reading in full, by the way.)

Something like a DNS server is small enough to write from scratch. People write little servers all the time. BIND itself would benefit from being smaller, as Graham says (technically, that would be a redesign rather than a rewrite).

From a security perspective it is worth rewriting font parsers, device drivers, etc.—the little, self-contained software libraries that show up so often in vulnerability reports.

In short, I think that we can only deploy safe languages incrementally. Write new code in safe languages, not unsafe languages. Maintain programs written in unsafe languages as best we can. Make an effort to reimplement the most security critical libraries in safe languages whenever possible.