Another Round of “Adversarial Machine Learning” from NIST

The National Institute of Standards and Technology (aka NIST) recently released a paper enumerating many attacks relevant to AI system developers. With the seemingly unending rise in costs incurred by cybercrime, it’s sensible to think through the means and motives behind these attacks. NIST provides good explanations of the history and context for a variety of AI attacks in a partially-organized laundry list. That’s a good thing. However, in our view, NIST’s taxonomy lacks a useful structure for thinking about and categorizing systemic AI risks. We released a simple (and hopefully more effective) taxonomy of ML attacks in 2019 that divides attacks into two types, extraction and manipulation, and crosses them with the three most common attack surfaces found in all ML systems: the model, the (training) data, and the (runtime) inputs. That move yields a six-category taxonomy.

But wait, there’s more… Attacks represent only a small portion of the security risks present in AI systems. NIST’s attack taxonomy has no room for serious (non-attack-related) concerns such as recursive pollution or use of AI technology for tasks it wasn’t designed to perform. Far too much of NIST’s evaluation of generative AI is dedicated to prompt injection attacks, in which an attacker manipulates the prompt provided to the LLM at runtime to produce undesirable results. LLM developers certainly need to consider the potential for malicious prompts (or malicious input, as computer security people have always called it), but that emphasis downplays a much more important risk: stochastic behavior from LLM foundation models can be wrong and harmful all by itself, without any clever prompting!

At BIML, we are chiefly concerned with building security into ML systems (a fancy way of saying security engineering). By contrast, NIST’s approach encourages “red teaming”: using teams of ethical hackers (or just people off the street and pizza delivery guys) to penetration test LLM systems by chugging through a checklist of known problems. Adopting this outside-in paradigm of build (a broken thing)-break-fix will inevitably overlook huge security chasms inside the system, holes that are ripe for attackers to exploit. Instead of trying to test your way toward security one little prompt at a time (which turns out to be insanely expensive), why not build systems properly in the first place through a comprehensive overview of systemic risks?!

In any case, we would like to see appropriate regulatory action to ensure that proper security engineering takes place (including, say, documenting exactly where those training data came from and what they contain). We don’t think enlisting an army of pizza guys providing prompts is the answer. In the meantime, AI systems are already being made available to the public, and they are already wreaking havoc. Consider, for example, the recent misuse of AI to suppress voter turnout in the New Hampshire presidential primary! This kind of thing should shock the conscience of anyone who believes AI security can be tested in as an afterthought. So we have a call to action for you. It is imperative that AI architects and thought leaders adopt a risk-driven approach to engineering secure systems before releasing them to the public.

Bottom line on the NIST attack list? Mostly harmless.

Use Your Community Resources [Principle 10]

Community resources can be a double-edged sword. On the one hand, systems that have faced public scrutiny benefit from the collective effort to break them. On the other hand, nefarious individuals aren’t interested in publicizing the flaws they identify in open systems, and even large communities of developers have trouble resolving all of the flaws in such systems. Relying on publicly available information can also expose your own system to risk, particularly if an attacker is able to identify similarities between your system and public ones.

Transfer learning is a particularly relevant concern for ML systems. While transfer learning has demonstrated success in applying the learned knowledge of an ML system to other problems, knowledge of the base model can sometimes be used to attack the student model [wang18]. More generally, the use of publicly available models and hyperparameters could expose ML systems to particular attacks. How do engineers know that a model they use wasn’t deliberately made public for this very purpose?
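One practical (if partial) mitigation is to treat base models like any other third-party artifact and verify their provenance before fine-tuning on top of them. The sketch below checks a downloaded weight file against a digest recorded out of band; the file names and digests are placeholders, and a matching hash only proves you got the artifact you intended, not that its publisher deserves your trust.

```python
import hashlib
from pathlib import Path

# Hypothetical manifest of trusted SHA-256 digests for pretrained weight files,
# recorded out of band (not scraped from the same site you download from).
TRUSTED_DIGESTS = {
    "resnet50_base.pt": "<sha256 digest recorded from a trusted source>",
}

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large weight files don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_base_model(path: Path) -> None:
    """Refuse to fine-tune on a base model whose digest we can't vouch for."""
    expected = TRUSTED_DIGESTS.get(path.name)
    if expected is None:
        raise ValueError(f"no trusted digest recorded for {path.name}")
    if sha256_of(path) != expected:
        raise ValueError(f"digest mismatch for {path.name}: refusing to use it")

# verify_base_model(Path("weights/resnet50_base.pt"))  # call before any fine-tuning
```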

Public datasets used to train ML algorithms are another important concern. Engineers need to take care to validate the authenticity and quality of any public datasets they use, especially when those data could have been manipulated by unknown parties. At the core of these concerns is the matter of trust; if the community can be trusted to effectively promote the security of its tools, models, and data, then community resources can be used with caution. Otherwise, it would be better to avoid exposing systems to unnecessary risk. After all, security problems in widely-used open-source projects have been known to persist for years, and in some cases decades, before the community finally took notice.
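Here is a minimal sketch of what that validation might look like in practice: before training, compare a public dataset against whatever its datasheet or card documents. The expected values below are illustrative placeholders; the point is that gross mismatches (record counts, unexpected labels, suspicious skew) should stop the pipeline rather than get silently absorbed.

```python
from collections import Counter

# Documented expectations for the public dataset, taken from its datasheet.
# These particular values are placeholders for illustration.
EXPECTED_RECORDS = 60_000
EXPECTED_CLASSES = set(range(10))
MAX_CLASS_FRACTION = 0.15  # no class should dominate if the data are documented as balanced

def sanity_check(labels: list[int]) -> list[str]:
    """Return a list of red flags; an empty list means the basic checks passed."""
    problems = []
    if len(labels) != EXPECTED_RECORDS:
        problems.append(f"record count {len(labels)} != documented {EXPECTED_RECORDS}")
    counts = Counter(labels)
    unexpected = set(counts) - EXPECTED_CLASSES
    if unexpected:
        problems.append(f"unexpected label values: {sorted(unexpected)}")
    if counts and max(counts.values()) / len(labels) > MAX_CLASS_FRACTION:
        problems.append("label distribution is more skewed than the datasheet claims")
    return problems

# problems = sanity_check(loaded_labels)
# if problems: raise RuntimeError("; ".join(problems))
```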

Read the rest of the principles here.

Be Reluctant to Trust [Principle 9]

ML systems rely on a number of possibly untrusted, external sources for both their data and their computation. Let’s take on data first. Mechanisms used to collect and process data for training and evaluation make an obvious target. Of course, ML engineers need to get their data somehow, and this necessarily invokes the question of trust. How does an ML system know it can trust the data it’s being fed? And, more generally, what can the system do to evaluate the collector’s trustworthiness? Blindly trusting sources of information would expose the system to security risks and must be avoided.

Next, let’s turn to external sources of computation. External tools such as TensorFlow, Kubeflow, and pip can be evaluated based on the security expertise of their engineers, time-proven resilience to attacks, and their own reliance on further external tools, among other metrics. Nonetheless, it would be a mistake to assume that any external tool is infallible. Systems need to extend as little trust as possible, in the spirit of compartmentalization, to minimize the capabilities of threats operating through external tools.
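One small, concrete way to limit that trust is to pin the external tools you depend on and refuse to run against anything else (pip’s hash-checking mode goes further by pinning artifacts rather than just version strings). The sketch below uses Python’s standard importlib.metadata; the package names and versions shown are placeholders, not recommendations.

```python
from importlib.metadata import version, PackageNotFoundError

# Versions your team has actually reviewed; names and numbers here are illustrative.
PINNED = {
    "tensorflow": "2.15.0",
    "numpy": "1.26.4",
}

def check_pins() -> None:
    """Fail fast if the runtime environment drifts from the reviewed versions."""
    for name, wanted in PINNED.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            raise RuntimeError(f"{name} is not installed")
        if installed != wanted:
            raise RuntimeError(f"{name} {installed} does not match pinned {wanted}")

check_pins()  # run at startup, before relying on the pinned packages
```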

It can help to think of the various components of an ML system as extending trust to one another; dataset assembly could trust the data collectors’ organization of the data, or it could build safeguards to ensure normalization. The inference algorithm could trust the model’s obfuscation of training data, or it could avoid responding to queries that are designed to extract sensitive information. Sometimes it’s more practical to trust certain properties of the data, or various components, but in the interests of secure design only a minimum amount of trust should be afforded. Building more security into each component makes attacks much more difficult to successfully orchestrate.
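As a small illustration of not extending that trust, a dataset-assembly component can re-do normalization itself instead of assuming the collector did it. The sketch below is a bare-bones version of that safeguard, assuming numeric feature matrices; a real pipeline would add schema and provenance checks on top.

```python
import numpy as np

def assemble(features: np.ndarray) -> np.ndarray:
    """Dataset assembly that does not trust upstream normalization.

    Rejects obviously malformed input and re-normalizes features itself,
    rather than assuming the data collector already did so.
    """
    if not np.isfinite(features).all():
        raise ValueError("non-finite values in collected features")
    # Re-normalize each feature to zero mean and unit variance regardless of
    # what the collector claims to have done.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return (features - mean) / std
```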

Read the rest of the principles here.

Remember that Hiding Secrets is Hard [Principle 8]

Security is often about keeping secrets. Users don’t want their personal data leaked. Keys must be kept secret to avoid eavesdropping and tampering. Top-secret algorithms need to be protected from competitors. These kinds of requirements are almost always high on the list, but turn out to be far more difficult to meet than the average user may suspect.

ML system engineers may want to keep the intricacies of their system secret, including the algorithm and model used, hyperparameter and configuration values, and other details concerning how the system trains and performs. Maintaining a level of secrecy is a sound strategy for improving the security of the system, but it should not be the only mechanism.

Past research in transfer learning has demonstrated that new ML systems can be trained from existing ones. If an attacker knows that transfer learning was applied (and which base model was used), extracting the proprietary layers trained “on top” of the base model becomes easier. Even when the base model is not known, distillation attacks allow an attacker to copy the possibly proprietary behavior of a model using only the ability to query the ML system externally. As a result, maintaining the secrecy of the system’s design requires more than simply not making the system public knowledge.
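One common (and admittedly partial) countermeasure is to expose as little of the model’s behavior as the application allows, for example returning only a label rather than the full probability vector. The sketch below assumes a scikit-learn style predict_proba interface; withholding the scores raises the cost of distillation-style copying without pretending to eliminate it.

```python
import numpy as np

def predict_label_only(model, x: np.ndarray) -> int:
    """Serve only the arg-max label, never the full probability vector.

    `model` is assumed to expose a predict_proba(x) method (a common
    scikit-learn style interface); adapt to your own framework as needed.
    """
    probs = model.predict_proba(x.reshape(1, -1))[0]
    return int(np.argmax(probs))
```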

A chief concern for ML systems is protecting the confidentiality of training data. Some may attempt to “anonymize” the data used and consider that sufficient. As the government of Australia discovered in 2017, great care must be taken in determining that the data cannot be deanonymized [1]. Neural networks similarly provide a layer of anonymization by transforming confidential information into weights, but even those weights can be vulnerable to advanced information extraction techniques. It’s up to system engineers to identify the risks inherent in their system and design protection mechanisms that minimize security exposure.

Keeping secrets is hard, and it is almost always a source of security risk.

Read the rest of the principles here.


1. Culnane, Chris, Benjamin Rubinstein, Vanessa Teague. “Understanding the Maths is Crucial for Protecting Privacy.” Technical Report from Department of Computing and Information Systems, University of Melbourne. (Published Sept 29, 2016; Accessed Oct 28, 2019.)

Promote Privacy [Principle 7]

Privacy is tricky even when ML is not involved. ML makes things even trickier by, in some sense, re-representing sensitive and/or confidential data inside the machine. This makes the original data “invisible” (at least to some users), but remember that the data are still, in some sense, “in there somewhere.” So, for example, if you train a classifier on sensitive medical data and you don’t consider what will happen when an attacker tries to get those data back out through a set of sophisticated queries, you may not be doing your job.

When it comes to sensitive data, one promising approach in privacy-preserving ML is differential privacy. The idea behind differential privacy is to set up privacy restrictions that, for example, guarantee that an individual patient’s private medical data never have too much influence on a dataset or on a trained ML system. The idea is to, in some sense, “hide in plain sight,” with the goal of ensuring that anything that can be learned about an individual from the released information can also be learned without that individual’s data being included. An algorithm is differentially private if an observer examining the output is not able to determine whether a specific individual’s information was used in the computation. Differential privacy is achieved through the use of random noise that is generated according to a chosen distribution and is used to perturb a true answer. Somewhat counterintuitively, because of its use of noise, differential privacy can also be used to combat overfitting in some ML situations. Differential privacy is a reasonably promising line of research that can in some cases provide for privacy protection.
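To make the idea concrete, here is a minimal sketch of the classic Laplace mechanism applied to a counting query, whose sensitivity is 1 because adding or removing one person changes the true count by at most 1. Real deployments need careful privacy-budget accounting across queries; this is the textbook building block, not a production library.

```python
import numpy as np

def private_count(values: list[bool], epsilon: float) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1, so Laplace noise with scale 1/epsilon
    is enough to satisfy the epsilon-differential-privacy guarantee.
    """
    true_count = sum(values)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: how many patients in the cohort have the condition?
# has_condition = [True, False, True, ...]
# print(private_count(has_condition, epsilon=0.5))
```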

Privacy also applies to the behavior of a trained-up ML system in operation. We’ve discussed the tradeoffs associated with providing (or not providing) confidence scores. Sometimes that’s a great idea, and sometimes it’s not. Figuring out the impact that providing confidence scores will have on system security is another decision that should be explicitly considered and documented.

In short, you will do well to spend some cycles thinking about privacy in your ML system while you’re thinking about security.

Read the rest of the principles here.

Keep it Simple [Principle 6]

Keep It Simple, Stupid (often spelled out KISS) is good advice when it comes to security. Complex software (including most ML software) is at much greater risk of being inadequately implemented or poorly designed than simple software is, and that leads to serious security challenges. Keeping software simple is necessary to avoid problems related to efficiency, maintainability, and, of course, security. But software is by its very nature complex.

Machine Learning seems to defy KISS by its very nature. ML models involve complex mathematics that is often poorly understood by implementers. ML frequently relies on huge amounts of data that can’t possibly be fully understood and vetted by system engineers. As a result, many ML systems are vulnerable to numerous attacks arising from complexity. It is important for implementers of ML systems to recognize the drawbacks of using complex ML algorithms and to build security controls around them. Adding controls to an already complex system may seem to run counter to our simplicity goal, but sometimes security demands more. Striking a balance between achieving defense-in-depth and simplicity, for example, is a tricky task.

KISS should help inform ML algorithm selection. What makes an adequate algorithm varies according to the goals and requirements of the system, yet there are often multiple choices. When such a choice needs to be made, it is important to consider not only the accuracy claims made by designers of the algorithm, but also how well the algorithm itself is understood by engineers and the broader research community. If the engineers developing the ML system don’t deeply understand the underlying algorithm they are using, they are more likely to miss security problems that arise during operations. This doesn’t necessarily mean that the latest and greatest algorithms can’t be used, but rather that engineers need to be cognizant of the amount of time and effort it takes to understand and then build upon complex systems.
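In practice, one way to honor KISS during algorithm selection is to measure how much a complex model actually buys you over a simple, well-understood baseline. The sketch below (using scikit-learn, purely as an illustration) scores such a baseline; if the fancier model barely beats it, the extra complexity, and the attack surface that comes with it, may not be worth carrying.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def baseline_score(X, y) -> float:
    """Score a simple, well-understood baseline before considering anything fancier.

    If a complex model only marginally beats this number, the added complexity
    may not justify the security and maintenance burden it brings.
    """
    baseline = LogisticRegression(max_iter=1000)
    return cross_val_score(baseline, X, y, cv=5).mean()
```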

Read the rest of the principles here.

Compartmentalize [Principle 5]


The figure above shows how we choose to represent a generic ML system. Note that in our generic model, both processes and collections are treated as components. Processes are represented by ovals, whereas artifacts and collections of artifacts are represented as rectangles.

The risk analysis of the generic ML system above uses a set of nine “components” to help categorize and explain risks found in various logical pieces.  Components can be either processes or collections. Just as understanding a system is easier when a system is divided up into pieces, controlling security risk is easier when the pieces themselves are each secured separately.  Another way of thinking about this is to compare old fashioned “monolithic” software design to “micro-services” design.  In general, both understanding and securing a monolith is much harder than securing a set of services (of course things get tricky when services interact in time, but we’ll ignore that for now).  In the end we want to eradicate the monolith and use compartmentalization as our friend.

Let’s imagine one security principle and see how compartmentalization can help us think it through. Part of the challenge of applying the principle of least privilege in practice (see Principle 4, below) has to do with component size and scope. When building blocks are logically separated and structured, applying the principle of least privilege to each component is much more straightforward than it would be otherwise. Smaller components should by and large require less privilege than complete systems. Does this component involve pre-processed training data that will directly impact system learning? Hmm, better secure those data!
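Here is a toy sketch of what per-component least privilege can look like once the compartments are identified: each component gets an explicitly enumerated, narrow scope rather than a shared key to everything. The component and bucket names below are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComponentScope:
    """Narrow, per-component access scope; component and bucket names are illustrative."""
    readable: frozenset[str]
    writable: frozenset[str]

# Each compartment gets only the storage it needs, nothing more.
SCOPES = {
    "data_collection": ComponentScope(readable=frozenset(), writable=frozenset({"raw-data"})),
    "dataset_assembly": ComponentScope(readable=frozenset({"raw-data"}), writable=frozenset({"training-data"})),
    "training": ComponentScope(readable=frozenset({"training-data"}), writable=frozenset({"models"})),
    "inference": ComponentScope(readable=frozenset({"models"}), writable=frozenset()),
}

def check_read(component: str, bucket: str) -> None:
    """Raise if a component tries to read outside its declared scope."""
    if bucket not in SCOPES[component].readable:
        raise PermissionError(f"{component} has no read access to {bucket}")

check_read("training", "training-data")      # allowed
# check_read("inference", "training-data")   # raises PermissionError
```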

The basic idea behind compartmentalization is to minimize the amount of damage that can be done to a system by breaking up the system into a number of units and isolating processes or data that carry security privilege. This same principle explains why submarines are built with many different chambers, each separately sealed. If a breach in the hull causes one chamber to fill with water, the other chambers are not affected. The rest of the ship can keep its integrity, and people can survive by making their way to parts of the submarine that are not flooded.

The challenge with security and compartmentalization comes when it is time to consider the system as a whole. As we’ve seen in our generic ML system here, data flows between components, and sometimes those data are security sensitive. When implementing an ML system, considering component risks is a good start, but don’t forget to think through the risks of the system as a whole. Harkening back to the principle of least privilege, don’t forget to apply the same sort of thinking to the system as a whole after you have completed working on the components.

Read the rest of the principles here.

Follow the Principle of Least Privilege [Principle 4]

The principle of least privilege states that only the minimum access necessary to perform an operation should be granted, and that access should be granted only for the minimum amount of time necessary.[i]

When you give out access to parts of a system, there is always some risk that the privileges associated with that access will be abused. For example, let’s say you are to go on vacation and you give a friend the key to your home, just to feed pets, collect mail, and so forth. Although you may trust the friend, there is always the possibility that there will be a party in your house without your consent, or that something else will happen that you don’t like. Regardless of whether you trust your friend, there’s really no need to put yourself at risk by giving more access than necessary. For example, if you don’t have pets, but only need a friend to pick up the mail on occasion, you should relinquish only the mailbox key. Although your friend may find a good way to abuse that privilege, at least you don’t have to worry about the possibility of additional abuse. If you give out the house key unnecessarily, all that changes.

Similarly, if you do get a house-sitter while you’re on vacation, you aren’t likely to let that person keep your keys when you’re not on vacation. If you do, you’re setting yourself up for additional risk. Whenever a key to your house is out of your control, there’s a risk of that key getting duplicated. If there’s a key outside your control, and you’re not home, then there’s the risk that the key is being used to enter your house. Any length of time that someone has your key and is not being supervised by you constitutes a window of time in which you are vulnerable to an attack. You want to keep such windows of vulnerability as short as possible to minimize your risks.

In an ML system, we most likely want to control access around lifecycle phases. In the training phase, the system may have access to lots of possibly sensitive training data.  Assuming an offline model (where training is not continuous), after the training phase is complete, the system should no longer require access to those data. (As we discussed when we were talking defense in depth, system engineers need to understand that in some sense all of the confidential data are now represented in the trained-up ML system and may be subject to ML-specific attacks.)
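Here is a minimal sketch of scoping training-data access to the training phase: wrap the grant in a context manager so the credential cannot outlive the training run. The grant_read and revoke_read functions below are stubs standing in for whatever your secrets manager or IAM system actually provides.

```python
from contextlib import contextmanager

def grant_read(principal: str, resource: str) -> str:
    """Stub: replace with a call to your secrets manager or IAM system."""
    print(f"granting {principal} read access to {resource}")
    return f"token-for-{principal}"

def revoke_read(token: str) -> None:
    """Stub: replace with a call to your secrets manager or IAM system."""
    print(f"revoking {token}")

@contextmanager
def training_data_access(principal: str):
    """Scope training-data credentials to the training phase only."""
    token = grant_read(principal, resource="training-data")
    try:
        yield token
    finally:
        revoke_read(token)  # the credential does not outlive the training run

# with training_data_access("training-job-42") as token:
#     model = train(load_training_data(token))   # hypothetical calls
# Outside the with-block, the credential no longer exists.
```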

Thinking about access control in an ML system is useful and can be applied through the lens of the principle of least privilege, particularly between lifecycle phases and system components. Users of an ML system are not likely to need access to training data and test data, so don’t give it to them. In fact, users may only require black-box API access to a running system. If that’s the case, then provide only what is necessary in order to preserve security.

Less is more when it comes to the principle of least privilege. Limit data exposure to those components that require it, and then only for as short a time as possible.

Read the rest of the principles here.


[i] Saltzer, Jerome H., and Michael D. Schroeder. “The protection of information in computer systems.” Proceedings of the IEEE 63, no. 9 (1975): 1278–1308.

Fail Securely [Principle 3]

Even under ideal conditions, complex systems are bound to fail eventually. Failure is an unavoidable state that should always be planned for. From a security perspective, failure itself isn’t the problem so much as the tendency for many systems to exhibit insecure behavior when they fail.

The best real-world example we know is one that bridges the real world and the electronic world—credit card authentication. Big credit card companies such as Visa and MasterCard spend lots of money on authentication technologies to prevent credit card fraud. Most notably, whenever you go into a store and make a purchase, the vendor swipes your card through a device that calls up the credit card company. The credit card company checks to see if the card is known to be stolen. More amazingly, the credit card company analyzes the requested purchase in context of your recent purchases and compares the patterns to the overall spending trends. If their engine senses anything suspicious, the transaction is denied. (Sometimes the trend analysis is performed off-line and the owner of a suspect card gets a call later.)

This scheme appears to be remarkably impressive from a security point of view; that is, until you note what happens when something goes wrong. What happens if the line to the credit card company is down? Is the vendor required to say, “I’m sorry, our phone line is down”? No. The credit card company still gives out manual machines that take an imprint of your card, which the vendor can send to the credit card company for reimbursement later. An attacker need only cut the phone line before ducking into a 7-11.

ML systems are particularly complicated (what with all that dependence on data) and are prone to fail in new and spectacular ways. Consider a system that is meant to classify/categorize its input. Reporting an inaccurate classification seems like not such a bad thing to do. But in some cases, simply reporting what an ML system got wrong can lead to a security vulnerability. As it turns out, attackers can exploit misclassification to create adversarial examples,[i] or use a collection of errors en masse to ferret out sensitive and/or confidential information used to train the model. In general, ML systems would do well to avoid transmitting low-confidence classification results to untrusted users in order to defend against these attacks, but of course that seriously constrains the usual engineering approach.

Classification results should only be provided when the system is confident they are correct. In the case of either a failure or a low confidence result, care must be taken that feedback from the model to a malicious user can’t be exploited. Note that many ML models are capable of providing confidence levels along with their output to address some of these risks. That certainly helps when it comes to understanding the classifier itself, but it doesn’t really address information exploit or leakage (both of which are more challenging problems). ML system engineers should carefully consider the sensitivity of their systems’ predictions and take into account the amount of trust they afford the user when deciding what to report.
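Here is a small sketch of that posture at the serving layer: return a label only above a confidence floor and fail closed otherwise. It assumes a scikit-learn style predict_proba interface and an illustrative threshold; choosing the threshold, and deciding what the refusal response looks like, is exactly the kind of decision that should be made explicitly and documented.

```python
import numpy as np

CONFIDENCE_FLOOR = 0.90  # illustrative value; tune to your own risk tolerance

def guarded_predict(model, x: np.ndarray) -> str | None:
    """Return a label only when the model is confident; otherwise fail closed.

    Assumes a scikit-learn style predict_proba interface and a classes_
    attribute; adapt to your framework. Returning None (or a generic refusal)
    instead of a shaky guess denies attackers the low-confidence feedback that
    adversarial example searches and extraction attacks feed on.
    """
    probs = model.predict_proba(x.reshape(1, -1))[0]
    best = int(np.argmax(probs))
    if probs[best] < CONFIDENCE_FLOOR:
        return None  # fail securely: say nothing rather than something exploitable
    return str(model.classes_[best])
```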

If your ML system has to fail, make sure that it fails securely.

Read the rest of the principles here.


[i] Gilmer, Justin, Ryan P. Adams, Ian Goodfellow, David Andersen, and George E. Dahl. “Motivating the Rules of the Game for Adversarial Example Research.” arXiv preprint arXiv:1807.06732 (2018).

Practice Defense in Depth [Principle 2]

The idea behind defense in depth is to manage risk with diverse defensive strategies, so that if one layer of defense turns out to be inadequate, another layer of defense hopefully prevents a full breach.

Let’s go back to our example of bank security. Why is the typical bank more secure than the typical convenience store? Because there are many redundant security measures protecting the bank, and the more measures there are, the more secure the place is.

Security cameras alone are a deterrent for some. But if people don’t care about the cameras, then a security guard is there to defend the bank physically with a gun. Two security guards provide even more protection. But if both security guards get shot by masked bandits, then at least there’s still a wall of bulletproof glass and electronically locked doors to protect the tellers from the robbers. Of course if the robbers happen to kick in the doors, or guess the code for the door, at least they can only get at the teller registers, because the bank has a vault protecting the really valuable stuff. Hopefully, the vault is protected by several locks and cannot be opened without two individuals who are rarely at the bank at the same time. And as for the teller registers, they can be protected by having dye-emitting bills stored at the bottom, for distribution during a robbery.

Of course, having all these security measures does not ensure that the bank is never successfully robbed. Bank robberies do happen, even at banks with this much security. Nonetheless, it’s pretty obvious that the sum total of all these defenses results in a far more effective security system than any one defense alone.

The defense-in-depth principle may seem somewhat contradictory to the “secure-the-weakest-link” principle because we are essentially saying that defenses taken as a whole can be stronger than the weakest link. However, there is no contradiction. The principle “secure the weakest link” applies when components have security functionality that does not overlap. But when it comes to redundant security measures, it is indeed possible that the sum protection offered is far greater than the protection offered by any single component.

ML systems are constructed out of numerous components. And, as we pointed out multiple times above, the data are often the most important thing from a security perspective. This means that bad actors have as many opportunities to exploit an ML system as there are components, and then some. Each and every component comes with a set of risks, and each and every one needs to address those risks head on. But wait, there’s more. Defense in depth teaches that vulnerabilities not addressed (or attacks not covered) by one component should, in principle, be caught by another. In some cases a risk may be controlled “upstream” and in others “downstream.”

Let’s consider an example: a given ML system design may attempt to secure sensitive training data behind some kind of authentication and authorization system, only allowing the model access to the data while it is actually training. While this may well be a reasonable and well-justified practice, it is by no means sufficient to ensure that no sensitive information in the dataset can be leaked through malicious misuse/abuse of the system as a whole. Some ML models are vulnerable to leaking sensitive information via carefully selected queries made to the operating model.[i] In other cases, lots of know-how in “learned” form may be leaked through a transfer attack.[ii] Maintaining a history of queries made by users, and preventing subsequent queries that together could be used to divine sensitive information, can serve as an additional defensive layer that protects against these kinds of attacks.
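A bare-bones sketch of that extra layer: keep a per-user query history and refuse requests beyond a budget. The window and budget below are illustrative, and a determined attacker can spread queries across accounts, which is precisely why this belongs alongside the other defenses rather than in place of them.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600
MAX_QUERIES_PER_WINDOW = 500  # illustrative budget, not a recommendation

_history: dict[str, deque] = defaultdict(deque)

def allow_query(user_id: str, now: float | None = None) -> bool:
    """Track per-user query history and refuse queries beyond a budget.

    This is one additional layer, not a complete defense against model
    inversion or extraction; pair it with the other controls discussed above.
    """
    now = time.time() if now is None else now
    history = _history[user_id]
    while history and now - history[0] > WINDOW_SECONDS:
        history.popleft()  # drop timestamps outside the sliding window
    if len(history) >= MAX_QUERIES_PER_WINDOW:
        return False
    history.append(now)
    return True
```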

Practicing defense in depth naturally involves applying the principle of least privilege to users and operations engineers of an ML system. Identifying and preventing security exploits is much easier when every component limits its access to only the resources it actually requires. In this case, identifying and separating components in a design can help, because components become natural trust boundaries where controls can be put in place and policies enforced.

Defense in depth is especially powerful when each component works in concert with the others.

Read the rest of the principles here.


[i] M. Fredrikson, S. Jha, and T. Ristenpart, “Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures,” Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015, pp. 1322–1333.

[ii] B. Wang, Y. Yao, B. Viswanath, H. Zheng, and B. Y. Zhao, “With Great Training Comes Great Vulnerability: Practical Attacks against Transfer Learning,” 27th USENIX Security Symposium, 2018, pp. 1281–1297.