MLsec Musings

  • Recently there have been several documents published as guides to security in machine learning. In October 2019, NIST published a draft called “A Taxonomy and Terminology of Adversarial Machine Learning”. Then in November, Microsoft published several interrelated webpages laying out a threat model for AI/ML systems and tying it to MS’s existing Security Development Lifecycle. We took a look at these documents to find out what they are trying to do, what they do well, and what they lack.

    T...

  • Community resources can be a double-edged sword. On the one hand, systems that have faced public scrutiny can benefit from the collective effort to break them; on the other hand, nefarious individuals aren’t interested in publicizing the flaws they identify in open systems, and even large communities of developers have trouble resolving all of the flaws in such systems. Relying on publicly available information can expose your own system to risks, particularly if an attacker is able to identify similaritie...

  • ML systems rely on a number of possibly untrusted, external sources for both their data and their computation. Let’s take on data first. Mechanisms used to collect and process data for training and evaluation make an obvious target. Of course, ML engineers need to get their data somehow, and that necessarily raises the question of trust. How does an ML system know it can trust the data it’s being fed? And, more generally, what can the system do to evaluate the collector’s trustworthiness? B...
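
    A minimal sketch of one such trust check, assuming the (hopefully trustworthy) collector publishes a digest of the dataset out-of-band: verify the training file against that digest before it ever reaches the training pipeline. The file path and digest value here are hypothetical.

      import hashlib

      def sha256_of(path: str) -> str:
          """Compute the SHA-256 digest of a file, reading in chunks."""
          h = hashlib.sha256()
          with open(path, "rb") as f:
              for chunk in iter(lambda: f.read(8192), b""):
                  h.update(chunk)
          return h.hexdigest()

      # Digest published out-of-band by the data collector (hypothetical value)
      EXPECTED = "9f2b..."

      def load_training_data(path: str) -> bytes:
          """Refuse to train on data that fails the integrity check."""
          if sha256_of(path) != EXPECTED:
              raise ValueError(f"{path} failed integrity check; refusing to train")
          with open(path, "rb") as f:
              return f.read()

    Note that a matching hash only establishes that the data weren’t tampered with in transit; it says nothing about whether the collector itself deserves trust, which remains a judgment call.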

  • Security is often about keeping secrets. Users don’t want their personal data leaked. Keys must be kept secret to avoid eavesdropping and tampering. Top-secret algorithms need to be protected from competitors. These kinds of requirements are almost always high on the list, but turn out to be far more difficult to meet than the average user may suspect.

    ML system engineers may want to keep the intricacies of their system secret, including the algorithm and model used, hyperparameter and co...
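
    Attackers often probe a deployed model through its prediction API in an attempt to reconstruct exactly these secrets. One common (and partial) mitigation, as a minimal sketch: serve only coarse, rounded confidence scores rather than raw probabilities, so each query leaks less about the model. The classifier is assumed to follow the scikit-learn predict_proba convention; the rounding granularity is illustrative.

      import numpy as np

      def serve_prediction(model, x, decimals: int = 1):
          """Answer a prediction query while limiting what the response reveals."""
          probs = model.predict_proba(np.asarray(x, dtype=float).reshape(1, -1))[0]
          label = int(np.argmax(probs))
          # Round confidences coarsely; an attacker reconstructing the model
          # from query/response pairs now sees far fewer distinct values.
          confidence = float(np.round(probs[label], decimals))
          return {"label": label, "confidence": confidence}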

  • Privacy is tricky even when ML is not involved. ML makes things even trickier by, in some sense, re-representing sensitive and/or confidential data inside of the machine. This makes the original data “invisible” (at least to some users), but remember that the data are still “in there somewhere.” So, for example, if you train a classifier on sensitive medical data and you don’t consider what will happen when an attacker tries to get those data back out through a set of sophistic...
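
    One standard countermeasure is differential privacy. As a minimal sketch (with illustrative parameters, not a vetted deployment), here is the Laplace mechanism applied to a simple aggregate: releasing a noisy mean instead of an exact one bounds how much any single record can reveal.

      import numpy as np

      def dp_mean(values, lower: float, upper: float, epsilon: float) -> float:
          """Differentially private mean of values clipped to [lower, upper]."""
          clipped = np.clip(values, lower, upper)
          exact = float(np.mean(clipped))
          # Sensitivity of the mean over n values clipped to [lower, upper]:
          sensitivity = (upper - lower) / len(clipped)
          noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
          return exact + noise

      # e.g., a made-up cohort of systolic blood pressure readings
      print(dp_mean([118, 135, 150, 122, 141], lower=80, upper=200, epsilon=0.5))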

  • Keep It Simple, Stupid (KISS for short) is good advice when it comes to security. Complex software (including most ML software) is at much greater risk of being inadequately implemented or poorly designed than simple software is, and that leads to serious security challenges. Keeping software simple is necessary to avoid problems related to efficiency, maintainability, and, of course, security. But software is by its very nature complex.

    Machine Learning seems to defy KISS by its very natu...


  • The figure above shows how we choose to represent a generic ML system. Note that in our generic model, both processes and collections are treated as components. Processes are represented by ovals, whereas artifacts and collections of artifacts are represented as rectangles.

    The risk analysis of the generic ML system above uses a set of nine “components” to help categorize and explain risks found in various logical pieces.  Components can be either processes or collections. Just as under...
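
    As a minimal sketch (our own illustration, not BIML’s code), the process-versus-collection distinction can be captured in a few lines, with risks attachable to either kind of component uniformly. The component names below are illustrative, not the full set of nine.

      from dataclasses import dataclass, field
      from enum import Enum

      class Kind(Enum):
          PROCESS = "process"        # drawn as an oval in the figure
          COLLECTION = "collection"  # drawn as a rectangle in the figure

      @dataclass
      class Component:
          name: str
          kind: Kind
          risks: list = field(default_factory=list)

      pipeline = [
          Component("raw data", Kind.COLLECTION),
          Component("dataset assembly", Kind.PROCESS),
          Component("trained model", Kind.COLLECTION),
      ]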

  • The principle of least privilege states that only the minimum access necessary to perform an operation should be granted, and that access should be granted only for the minimum amount of time necessary.[i]

    When you give out access to parts of a system, there is always some risk that the privileges associated with that access will be abused. For example, let’s say you are to go on vacation and you give a friend the key to your home, just to feed pets, collect mail, and so forth. Although y...
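
    In code, least privilege might look like the following minimal sketch: an access grant that names exactly the operations allowed and expires on its own, much like taking the house key back when the vacation ends. The scope names are hypothetical.

      import time

      class Grant:
          def __init__(self, scopes: set, ttl_seconds: float):
              self.scopes = frozenset(scopes)           # only what was asked for
              self.expires = time.time() + ttl_seconds  # only as long as needed

          def allows(self, operation: str) -> bool:
              return time.time() < self.expires and operation in self.scopes

      pet_sitter = Grant({"feed_pets", "collect_mail"}, ttl_seconds=7 * 24 * 3600)
      assert pet_sitter.allows("feed_pets")
      assert not pet_sitter.allows("read_diary")  # never granted, so never allowed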

  • Even under ideal conditions, complex systems are bound to fail eventually. Failure is an unavoidable state that should always be planned for. From a security perspective, failure itself isn’t the problem so much as the tendency for many systems to exhibit insecure behavior when they fail.

    The best example we know is one that bridges the physical world and the electronic world: credit card authentication. Big credit card companies such as Visa and MasterCard spend lots of money on au...
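
    The safe default when a security check itself breaks is to deny, a pattern usually called failing closed. A minimal sketch (the unreachable authorization service here is simulated):

      def is_authorized(user: str, action: str) -> bool:
          raise ConnectionError("auth service unreachable")  # simulated outage

      def guarded_action(user: str, action: str) -> str:
          try:
              allowed = is_authorized(user, action)
          except Exception:
              allowed = False  # fail CLOSED: treat an error as a denial
          return "performed" if allowed else "denied"

      print(guarded_action("alice", "charge_card"))  # -> denied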

  • The idea behind defense in depth is to manage risk with diverse defensive strategies, so that if one layer of defense turns out to be inadequate, another layer of defense hopefully prevents a full breach.

    Let’s go back to our example of bank security. Why is the typical bank more secure than the typical convenience store? Because there are many redundant security measures protecting the bank, and the more measures there are, the more secure the place is.

    Security cameras alone are a de...
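
    For an ML prediction service, defense in depth might look like the minimal sketch below: several independent, redundant checks, any one of which can stop a bad input even if the others miss it. The individual checks are illustrative stand-ins.

      def schema_ok(x) -> bool:
          return isinstance(x, list) and all(isinstance(v, (int, float)) for v in x)

      def in_training_range(x) -> bool:
          return all(-10.0 <= v <= 10.0 for v in x)  # bounds seen during training

      def not_anomalous(x) -> bool:
          return sum(abs(v) for v in x) < 50.0       # crude outlier screen

      LAYERS = [schema_ok, in_training_range, not_anomalous]

      def accept_input(x) -> bool:
          # Each layer is a camera, a vault door, a guard: redundant on purpose.
          return all(layer(x) for layer in LAYERS)

      print(accept_input([1.0, 2.0, 3.0]))  # True
      print(accept_input([1.0, 999.0]))     # False: caught by the range check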

  • Security people are quick to point out that security is like a chain.  And just as a chain is only as strong as the weakest link, an ML system is only as secure as its weakest component.  Want to anticipate where bad guys will attack your ML system?  Well, think through which part would be easiest to attack.

    ML systems are different from many other artifacts that we engineer because the data in ML are just as important as (or sometimes even more important than) the learning mechanism itself....

  • BIML Security Principles

    gem

    25 July 2019

    Early work in security and privacy of ML has taken an “operations security” tack focused on securing an existing ML system and maintaining its data integrity. For example, Nicolas Papernot uses Saltzer and Schroeder’s famous security principles to provide an operational perspective on ML security [1]. In our view, this work does not go far enough into ML design to satisfy our goals. Following Papernot, we directly address Saltzer and Schroeder’s security principles as adapted in the book Buildi...

  • BIML in the news

    gem

    05 June 2019

    The Parallax covers BIML in an interview.

    READ ALL ABOUT IT

  • BIML art

    gem

    08 May 2019

    The exceptionally tasteful BIML logo was designed by Jackie McGraw. The logo incorporates both a yin/yang concept (huh, wonder where that comes from?) and a glyph that weaves a B, an M, and an L together in a clever way.

    Here is the glyph:

    The BIML glyph

    Here is my personal logo (seen all over, but most famously on the cover of Software Security):

    Gary McGraw’s logo (as seen on the cover of Software Security among other places)

    Here is the combined glyph plus yin/yang which ma...

  • BIML is Born

    gem

    07 May 2019

    Welcome to the BIML blog where we will (informally) write about MLsec, otherwise known as Machine Learning security. BIML is short for the Berryville Institute of Machine Learning. For what it’s worth, we think it is pretty amusing to have a “Berryville Institute” just like Santa Fe has the “Santa Fe Institute.” You go, Berryville!

    BIML was born when I retired from my job of 24 years in January 2019. Many years ago as a graduate student at Indiana University, I did lots of work in...