Another Round of “Adversarial Machine Learning” from NIST

The National Institute of Standards and Technology (aka NIST) recently released a paper enumerating many attacks relevant to AI system developers. With the seemingly unending rise in costs incurred by cybercrime, it’s sensible to think through the means and motives behind these attacks. NIST provides good explanations of the history and context for a variety of AI attacks in a partially organized laundry list. That’s a good thing. However, in our view, NIST’s taxonomy lacks a useful structure for thinking about and categorizing systemic AI risks. We released a simple (and hopefully more effective) taxonomy of ML attacks in 2019 that divides attacks into two types (extraction and manipulation) and further divides these types across the three most common attack surfaces found in all ML systems: the model, the (training) data, and the (runtime) inputs. That move yields a six-category taxonomy.
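To make the arithmetic of that taxonomy concrete, here is a minimal sketch of the two-types-by-three-surfaces grid. The enum and function names (`AttackType`, `AttackSurface`, `taxonomy`) are our own illustrative choices, not identifiers from the 2019 work, and the category labels it prints are just the type and surface names joined together.

```python
from enum import Enum
from itertools import product


class AttackType(Enum):
    # The two attack types named in the post.
    EXTRACTION = "extraction"
    MANIPULATION = "manipulation"


class AttackSurface(Enum):
    # The three attack surfaces named in the post.
    MODEL = "model"
    DATA = "data"      # training data
    INPUT = "input"    # runtime inputs


def taxonomy():
    """Return every (type, surface) pair: 2 types x 3 surfaces = 6 categories."""
    return list(product(AttackType, AttackSurface))


# Print an illustrative label for each of the six categories.
for attack_type, surface in taxonomy():
    print(f"{surface.value} {attack_type.value}")
```

Under this sketch, an attack that tampers with the prompt fed to a model at runtime would land in the `(AttackType.MANIPULATION, AttackSurface.INPUT)` cell.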

But wait, there’s more… Attacks represent only a small portion of the security risks present in AI systems. NIST’s attack taxonomy doesn’t have any room for serious (non-attack-related) concerns such as recursive pollution or improper use of AI technology for tasks it wasn’t designed for. Far too much of NIST’s evaluation of generative AI is dedicated to prompt injection attacks, where an attacker manipulates the prompt provided to the LLM at runtime, producing undesirable results. LLM developers certainly need to consider the potential for malicious prompts (or malicious input, as computer security people have always called it), but that focus downplays a much more important risk: stochastic behavior from LLM foundation models can be wrong and bad all by itself without any clever prompting!

At BIML, we are chiefly concerned with building security in to ML systems, a fancy way of saying security engineering. By contrast, NIST’s approach encourages “red-teaming”, using teams of ethical hackers (or just people off the street and pizza delivery guys) to penetration-test LLM systems by chugging down a checklist of known problems. Adopting this “outside-in” paradigm of build (a broken thing)-break-fix will inevitably overlook huge security chasms inside the system, holes that are ripe for attackers to exploit. Instead of trying to test your way toward security one little prompt at a time (which turns out to be insanely expensive), why not build systems properly in the first place through a comprehensive overview of systemic risks?!

In any case, we would like to see appropriate regulatory action to ensure that proper security engineering takes place (including, say, documenting exactly where those training data came from and what they contain). We don’t think enlisting an army of pizza guys providing prompts is the answer. In the meantime, AI systems are already being made available to the public, and they are already wreaking havoc. Consider, for example, the recent misuse of AI to suppress voter turnout in the New Hampshire presidential primary! This kind of thing should shock the conscience of anyone who believes AI security can be tested in as an afterthought. So we have a call to action for you. It is imperative that AI architects and thought leaders adopt a risk-driven approach to engineering secure systems before releasing them to the public.

Bottom line on the NIST attack list? Mostly harmless.
