The idea that machine learning security is exclusively about “hackers,” “attacks,” or some other kind of “adversary” is misguided. This is the same sort of philosophy that misled software security into a myopic overfocus on penetration testing way back in the mid ’90s. Not that pen testing and red teaming are useless, mind you, but there is way more to security engineering than penetrate and patch. It took us forever (well, a decade or more) to get past the pen test puppy love and start building real tools to find actual security bugs in code.
That’s why the focus on Red Teaming AI coming out of the White House this summer was so distressing. On the one hand…OK, the White House said AI and Security in the same sentence; but on the other hand, hackers gonna hack us outta this problem…not so much.
This red teaming nonsense is worse than just a philosophy problem; it’s a technical issue too. Just take a look at this ridiculous piece of work from Anthropic.
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Red teaming sounds high tech, mysterious and steeped in hacker mystique, but today’s ML systems won’t benefit much from post facto pen testing. We must build security into AI systems from the very beginning (by paying way more attention to the enormous swaths of data used to train them and the risks these data carry). We can’t security test our way out of this corner, especially when it comes to the current generation of LLMs.
It’s tempting to pretend we can sprinkle some magic security dust on these systems after they are built, patch them into submission, or bolt special security apparatus on the side. Unfortunately, the world well knows what happens when we pretend to be hard at work on security while what we’re actually doing is more akin to squeezing our eyes shut and claiming to be invisible. Just ask yourself one simple question: who benefits from a security circus in this case?
AP reporter Frank Bajak covered BIML’s angle in this worldwide story on August 13, 2023.