Echoes of the Morris Wake-up Call of 1988

Do you remember the Morris worm? Because we do. We watched it take the Internet by storm in 1988 when the net was small and mostly .edu sites connected with UUCP (there were only around 60,000 computers on the net in those days). It was a big day in Net history and a watchman’s cry for the rising importance of computer security. Turns out that connected computers are subject to automated network-based attacks. Overnight, computer viruses escaped the sneaker net and grew wings.

Fast forward 38 years. Today there are 6 billion or so people on the Internet, often using multiple devices. And worms have evolved through SQL Slammer, Conficker, Stuxnet, and WannaCry (each of which targeted a small, predetermined set of bugs) to agentic-AI-controlled worms that grind on a target looking for ANY BUG. The viruses that grew wings in 1988 have developed relentless little brains.

This is Papernot at his best, reminding us why Machine Learning Security is crucially important. We’ll have a closer look this week and possibly revisit our annotated bibliography’s TOP 5.

Here is the abstract from the academic paper. We are tempted to call this new worm concept “Morris.”

A computer worm is malware that spreads on a network by replicating itself from one machine to another. Traditional worms, like WannaCry, exploited predetermined vulnerabilities, and their spread can be halted by patching those vulnerabilities. Here we show that artificial intelligence (AI) agents enable a fundamentally new threat: a worm that generates tailored attack strategies to each target it encounters. The worm parasitically uses compromised machines to run open-weight large language models (LLMs) to sustain its reasoning, or extend its reach for further attacks. Deployed on a network of machines spanning Linux, Windows, and IoT (Internet of Things) devices, the worm propagated by exploiting common, real-world corporate network vulnerabilities. Since the worm is powered by stolen compute, the attacker’s marginal cost per new infection is zero. This creates a destabilizing economic asymmetry between attackers and defenders. Moreover, because the worm requires no commercial AI platform, centralized safety controls, such as service refusals or rate limiting, are structurally irrelevant. Our results demonstrate that self-sustaining AI-driven cyber-threats are no longer theoretical. We must prepare for autonomous generative adversaries: malware systems that propagate without human operators and are defined not by fixed exploit code, but by the capacity to reason about targets, adapt to observations, and synthesize attack logic in real time.

Thirty-eight years after 1988, we now have AI-enabled malicious code leveraging the Trinity of Trouble with automated goal-driven intelligence for next to no cost. Expect things to change.

The story was broken in the New York Times by Cade Metz, whose coverage is excellent.

Patrick McDaniel BIML Site Visit

BIML is proud to host Patrick McDaniel, an OG of machine learning security (prominently featured in the BIML TOP 5) and a Dean of Research at Wisconsin, for a visit to the BIML Barn. Patrick arrived in Berryville late on Thursday and was greeted with a Liberal or two on the porch. We stayed up way too late talking about AI and security.

In the morning after breakfast, we spent much of the Friday research discussion going over our soon-to-be-released paper No Security Meter for AI. Patrick has been thinking about measuring ML behavior for a long time, and was an early proponent of a whitebox approach. He had lots of very useful feedback for us.

Does science really get done around the kitchen table? Why yes. Yes it does. (And technical talks really get delivered in the BIML Barn.)

We ventured into greater metropolitan Berryville for lunch and coffee.

And then Patrick delivered a new talk as a BIML in the Barn feature to be released on May 13th. Patrick’s talk really surprised us, and in very important philosophical ways.

After the talk we shared a cocktail on the patio. Maybelline is an honorary BIML dog.

Patrick enjoys a well-deserved Lemon Mint Fizz.

And then it was off to dinner with BIML spouses at Huntōn in Leesburg.

Fantastic visit. These kinds of human interaction are absolutely critical as we construct a reasonable approach to machine learning security.

BIML Featured in Fortune

https://fortune.com/2026/04/23/ai-cybersecurity-standards-mythos-nist-owasp-sans-cosai-dc-meeting-eye-on-ai/?sge456

Gary McGraw, cofounder of the Berryville Institute of Machine Learning, pointed to a core gap: Today’s benchmarks tend to measure how well AI systems can perform security tasks—not how secure the systems themselves are. Companies need to keep that distinction in mind when evaluating their tools and defenses.

McGraw warned as far back as 2019 that securing machine learning systems would be “one of the defining cybersecurity struggles of the next decade.” That moment has now arrived.

“These meetings are a way to remind ourselves of the fundamentals,” he said, “as we try to define what machine learning security actually is.”

BIML Debuts AI Security Measurement Work at NIST

What was to be a fairly standard version of the BIML risk talk was instead transformed into a debut of BIML’s forthcoming paper, No Security Meter for AI (expected mid-May), for an audience of NIST computer scientists.

It’s always fun to debut a talk for an audience that is engaged and knowledgeable.

While we were inside the very industrial Chemistry building for a talk that was 80% Zoom, it rained outside.

Booting MOSAIC: multi-organization security and AI coalition

Well, maybe. (McGraw proposed the name, which is being vetted.) We did all get together in Arlington on 4.21.26 to discuss policy and AI. It was a good meeting set up by OWASP and SANS and run very professionally by Rob van der Veer.

The cool thing? BIML’s work was not only cited, but included.

The meeting setting was gorgeous.

As usual, the hall track was the best part of the entire day…especially when the hall was moved across the street to the bar.

Sounil Yu from Knostic and his son (a security analyst at Salesforce). Sounil discussed BIML’s measurement paper with McGraw.

See this coverage of the meeting: Global AI Security Standard Organizations Gather Under MOSAIC to Reduce Fragmentation, AI security leaders gather in Washington as risks mount—and Mythos raises the stakes

Too Dangerous to Release (Again): Software Security and AI

Have you heard? The Mythos model from Anthropic is so dangerously good at finding software vulnerabilities that its release must be initially limited to companies participating in the Glasswing software security project! {Oh my. Also lions and tigers and bears!}

Does that sound like a marketing ploy to you? Because it does to many of the expert bug finders I know best. In fact, the software exploit community (some of whom make a very good living selling bugs to the very companies that produced them…LOL) is pretty evenly split on this issue. So what is a grownup to think?

Those of us who have been around the block a few times in AI-land remember way back when GPT-2 was too dangerous to release too (because it could generate fake news even faster than a political PR flak). That garnered some press and helped with the launch for sure. Well, it’s happening again…just look at the tech headlines! Go, Anthropic, go!

Fortunately, there is some balanced coverage out there adopting a thoughtful approach (thanks, Cade). Here’s what we think:

  1. We still have a very real software security problem, so ANYTHING that helps people find AND FIX bugs in code is good. Everyone who is serious about software vulnerability has been using Agentic AI to do this better. You should too. Want to get started using AI to find bugs? Hold your nose (because LinkedIn) and check out this link. But please also figure out how to FIX the bugs you find. And don’t expect to be paid for slop.
  2. LLMs really are good at helping find easy vulnerabilities, but expert mode requires human experience and expertise. Will you become Halvar Flake by strapping on Mythos? No, you will not.
  3. Building exploits that really work is much harder than just finding bugs. In fact, I wrote a whole book about this in 2004, 22 years ago, and it is still true. Patching is also harder than finding vulnerabilities. Hopefully AI will help with both of these software security activities.
  4. AI tools are all helpful in different ways. Use them all. Use the ones that are already released. (We hear tell that a well-prompted Opus-4.6 (82%) does nearly as well as Mythos (84%) on CRSBench…which calls into question just what the hell these benchmarks measure, a topic we have been thinking about a bunch.)

As a last thought, we’re going to appeal to the four I’s that excellent human designers are familiar with: Intuition, Insight, and Inspiration (the fourth one is the “self” kind of I). AI is great and we love it. We are really going to need lots more software architects, information architects, designers, actual building architects, and humans who know what they are doing. If you know what you’re doing, you’ll be fine. If you are simply a bullshitter, you’re toast.

Why Whitebox Machine Learning Matters

Imagine that you are trying to practice good security engineering at the system level when one of your essential components is an unpredictable black box that sometimes does the wrong thing. How do you ensure or even measure the trustworthiness of that system? That seems to be the current situation we are in with LLMs and Agentic AI.

One of the levers we are exploring is observability INSIDE the black box. So, in the case of an LLM, that would be trying to figure out what is going on inside the Transformer. Are there circuits in the trained model that correlate with and define certain behaviors? Are there concepts in there? Can we make use of various activation patterns (and weights) or otherwise guide them from inside the network? Are there indicators of bad behavior? Can we see the “guidelines” imposed by alignment training? Are they robust? Etc.

This is what we call (for the moment anyway) “Whitebox Interpositioning” at BIML. It’s like watching your brain (and interposing inside it) while you are acting as part of a system. Maybe we can build an “Intention-ometer” or maybe not. But we are certainly moving toward “WHYness” in a WHAT machine.
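For readers who want something concrete, here is a minimal sketch of the two moves described above: watching activations inside a network, and interposing on them mid-computation. To be loud about the assumptions: this is a hand-built toy with made-up weights, not an LLM, not a Transformer, and not BIML's tooling. The names TinyNet, make_observer, and make_ablation are hypothetical and exist only for illustration.

```python
# Toy illustration only: a two-layer network with hand-picked weights.
# All names here (TinyNet, make_observer, make_ablation) are hypothetical.
from typing import Callable, List, Optional

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Elementwise max(0, x)."""
    return [max(0.0, x) for x in v]


def matvec(m: List[Vector], v: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyNet:
    """input -> hidden (ReLU) -> output, with one tap point on the hidden layer."""

    def __init__(self, w1: List[Vector], w2: List[Vector]) -> None:
        self.w1 = w1
        self.w2 = w2
        # Optional function that sees (and may replace) the hidden activations.
        self.interposer: Optional[Callable[[Vector], Vector]] = None

    def forward(self, x: Vector) -> Vector:
        hidden = relu(matvec(self.w1, x))
        if self.interposer is not None:
            hidden = self.interposer(hidden)
        return matvec(self.w2, hidden)


def make_observer(log: List[Vector]) -> Callable[[Vector], Vector]:
    """Whitebox observation: record activations, pass them through unchanged."""
    def observe(hidden: Vector) -> Vector:
        log.append(list(hidden))
        return hidden
    return observe


def make_ablation(unit: int) -> Callable[[Vector], Vector]:
    """Whitebox intervention: zero out one hidden unit before the output layer."""
    def ablate(hidden: Vector) -> Vector:
        patched = list(hidden)
        patched[unit] = 0.0
        return patched
    return ablate


# Made-up weights: 2 inputs, 3 hidden units, 1 output.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
W2 = [[1.0, 1.0, 2.0]]

if __name__ == "__main__":
    net = TinyNet(W1, W2)
    x = [2.0, 1.0]

    print("blackbox output:", net.forward(x))

    seen: List[Vector] = []
    net.interposer = make_observer(seen)
    net.forward(x)
    print("hidden activations:", seen)

    net.interposer = make_ablation(2)
    print("output with hidden unit 2 ablated:", net.forward(x))
```

The point of the toy is the shape of the idea: the same forward pass can be treated as a black box, observed from inside, or altered from inside. Doing any of this meaningfully on a real Transformer is the hard, open part.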

This all reminds us of what happened in software security when we moved from black box monitoring and sandboxing to whitebox code analysis (static and dynamic both). Thing is, we never really got a handle on architecture, especially when it came to security…

Plenty of work to do on the raw science front…and something we want to create a coalition to approach. Toward that end, BIML recently hosted a whitebox summit with Realm Labs and Starseer. We were joined by Paul Kocher. Expect something to come of this.