Overview
I study methods and build tools for making computers and computer programs more secure. Most recently, I have concentrated on the security of web browsers and web applications; these programs are particularly vulnerable due to the inherent insecurities of the open Internet, and to the proliferation of malicious web sites and malicious content republished at reputable sites.Much of my work focuses on building general defenses against classes of attacks, rather than fixing a single vulnerability in a single program. For example, I work on secure programming languages because they can give all programs written in the language protection against a class of threats.
Jump to: Computer security | Programming languages | Distributed computing
Computer security
Web application security
The number one reported class of security vulnerabilities today is script injection. It has affected all major sites (Yahoo, Google, MySpace, Facebook, etc.) and it occurs when a web site mistakenly publishes user-generated content that includes a malicious program (script). This is a pervasive problem on the web, since many web sites exist solely to publish user-generated content (blog postings, comments, home pages, photos, etc.).Working with colleagues at the University of Maryland, I showed that it is almost impossible for a web site, working alone, to prevent script injection. Instead, we proposed that web sites communicate to web browsers policies that tell which scripts are trusted by the site, and rely on the web browser to enforce the policies. Our method, called BEEP, can prevent script injection with 100% accuracy.
resources
BEEP home pageApplication-specific firewalls
The second largest class of security vulnerabilities being reported today are buffer overflows. These are severe vulnerabilities: an attacker can crash or even take over an application by exploiting a buffer overflow.I have built a tool called Yakker that protects applications against buffer overflows caused by malicious inputs. Yakker acts as an application-specific firewall: it examines the network protocol messages that are the inputs to network servers and filters out malformed messages as well as other messages that can trigger overflows.
The main novelty of Yakker is that it is highly automated, producing a simple firewall directly from existing human-readable specifications, namely, the “Request for Comments” documents that define most of the basic protocols of the Internet.
resources
Yakker is distributed as part of the Cyclone project.Trust management
A trust management system implements security policies that are constructed by composing the policies of multiple parties. Carl Gunter and I implemented QCM, the first trust management system based on a declarative query language; this is now the dominant approach to trust management systems. QCM was also the first distributed trust management system: it uses local policy information (e.g., local databases or locally available digital certicates) but can also retrieve remote policy information automatically, as necessary. Carl and I also showed how revocation of the certificates used in a trust management system could be expressed entirely within the same trust management system.I built a second trust management system, SD3, that extended QCM with recursive policies. SD3 was the first trust management system based on Datalog. SD3 featured a self-certifying evaluator, which produces machine-checkable proofs of its output, reducing the size of the trusted computing base significantly. Dan Suciu and I studied the theory of SD3, showing that it has many possible distributed evaluation strategies.
resources
QCM home pageSystem call monitoring
A system call monitor watches the sequence of system calls made by an application, looking for “abnormal” system calls. This can indicate that the application has been compromised, and is being controlled by an attacker. System call monitors are an old idea, but are not used often, because they can be slow, and because errors in the model of “abnormal” behavior can cause an administrative nightmare of false positives.Authenticated system calls address both of these drawbacks. First, they use a novel and very efficient means of enforcement: the notion of “abnormal” is embedded in the application itself, with cryptographic protections, and the application passes this in to the kernel with each system call. This eliminates the need for a separate (and slow) watchdog process. Second, they have no false positives because we define the notion of “abnormal” using a conservative static program analysis.
Our conference paper won a best student paper award.
resources
DNS
The Domain Name System (DNS) has well-known insecurities and there has been a more secure successor (DNSSEC) in the works for >10 years. However, DNSSEC has yet to be adopted, and we are starting to see real attacks on DNS (cache poisoning). Christof Fetzer and I examined some of the economic reasons for the failure of DNSSEC, and we proposed a scheme to “bootstrap” a more secure DNS from an existing public key directory (SSL).In 1998 I proposed that a system like DNS could be implemented entirely within a distributed trust management system like QCM; I reimplemented this in SD3, and this is now a common benchmark for trust management systems.
resources
SEDNS home pageProgramming languages
Cyclone
Safe programming languages have become mainstream: programmers are now using languages like Java, C#, and Javascript in preference to languages like C. The advantages of safety (no buffer overflows) and garbage collection (which enables safety and reduces development time) ensure that safe languages will only extend their dominance in the future.However, C remains essential in cases where efficiency is paramount, or where lack of computational resources, memory, or bandwidth rule out garbage collection and other techniques typically used in safe languages. In particular, core components of safe languages like garbage collectors and runtime systems are usually written in C. This is a dangerous situation, given the dismal security record of C programs with regards to buffer overflows.
The Cyclone project addresses this by retrofitting safety into the C language. Cyclone is a dialect of C that prevents unsafe programs (no buffer overflows) while retaining one of C's defining characteristics: control is in the hands of the programmer. For example, Cyclone allows manual memory management (alloc and free instead of garbage collection), but requires the programmer to add annotations to programs that Cyclone will check to ensure safety. It is possible to write provably-safe garbage collectors entirely in Cyclone.
resources
Cyclone home pageTyped assembly language and run-time code generation
Typed assembly language adds type annotations to machine code (binaries). These annotations can be mechanically checked and ensure properties like memory safety (no buffer overflows). Luke Hornof and I extended this idea to support run-time code generation: our machine code annotations permit us to write provably-safe programs that generate and execute provably-safe machine code at runtime.We later integrated our work more closely into the Popcorn compiler of Morrisett et al., and this collaboration led to our work on the Cyclone programming language.
resources
Type inference
I did a PhD thesis on problems in type inference. I was the first to point out the distinction between the properties of principal types and principal typings, and I showed that principal typings enable language features like “smartest recompilation,” incremental type inference, and accurate type error messages. Furthermore, I showed that the lack of principal typings is at the root of the undecidability of type inference for polymorphic recursion.Separately, I built a type inference algorithm based on polarity that can infer types well beyond those inferred by Hindley-Milner type inference (e.g., it infers types with universal quantification at arbitrary rank). The algorithm makes critical use of principal typings and is also an example of what Hosaya and Pierce call local type inference.
I have also worked with Jens Palsberg on type inference problems in object type systems.
resources
Standard ML of New Jersey
I was a member of the team that built the original Standard ML of New Jersey compiler. Andrew Appel and I built the first native back end for the compiler, which took took the (then) novel approach of a having a very large number of passes, each of which implemented a simple program transformation. We introduce the transformation to closure-passing style. We were also the first to implement a one-pass translation to continuation-passing style that avoids constructing so-called administrative redexes.
resources
SML/NJ home pageDenotational semantics
For my master's thesis I studied the problem of full abstraction in denotional semantics. Lack of full abstraction indicates a mismatch between a semantics and the programming language it intends to model. The basic finding is that Berry's stable domains cannot be a fully abstract model for any language defined by a simple class of term rewriting rules.
resources
Distributed computing
Trust management
I built three trust management systems (SD3 and two versions of QCM) that are essentially distributed query languages applied to a security problem. What distinguishes these distributed databases from previous work is that the distributed structure of the system is dynamically discovered by queries into the database itself, in the same way that hyperlinks are discovered by following links to web pages. The connectivity of the system is therefore not static as in previous work.
DXQ
I am working with Mary Fernandez on another distributed query language, DXQ, that combines ideas from QCM and SD3 with Mary's XML query language implementation, Galax. The focus of this work is on distributed computation rather than the security applications of SD3 and QCM.We are using DXQ to implement peer-to-peer systems for distributed resource management.
resources
DXQ home pageiMobile
resources
DOSE
I built a distributed database system called DOSE with Christof Fetzer. DOSE took a free, off-the-shelf database and deployed it on a cluster to create a cheap dependable distributed database—the RAID idea applied to databases instead of disk drives.
resources
SwitchWare
The SwitchWare project at the University of Pennsylvania built a network in which packets contain small programs that execute at each node they pass through. We used my QCM trust management system to implement security policies in the network.
resources
File synchronization
I implemented a file synchronizer called Unison with Benjamin Pierce. Unison supports bi-directional, cross-platform synchronization.Unison is widely used and is included with most Linux distributions.