Gadi Evron in the House

From time to time, we enjoy inviting guests to participate in our regular Friday research group meetings. We try to hold an in-person meeting at least once a month, and we love it when guests can join that way. Part of our mission at BIML is to spread the word about our views of machine learning security, even among those who are working at the rock face.

Having just completed organizing [un]prompted (a labor of love that will result in a very interesting conference indeed), Gadi is steeped in the cybersecurity perspective of machine learning (as an offensive tool, a defensive tool, an attack surface, and an enterprise challenge). Of course we have our own BIML perspective on this, more focused on building security in than anything else.

Our meeting this week focused on tokenization first (an under-studied aspect of MLsec), and then tried to make sense of the absolute flood of stuff coming out of Anthropic these days. Bottom line?

  • There is lots more work to be done in tokenization (see the short sketch just after this list)
  • The C-compiler that Carlini tried to build with Claude is interesting, incomplete, and angled toward a reality check on the usual hyperbole. Good for Carlini for addressing the reality head-on!
  • The zero-day work (on the other hand) is hyperbolic, involving a breathless treatment of three well-known and pretty boring attack pattern instances as applied in the face of blackbox fuzzing. We do acknowledge that automating exploit finding is a great thing to cover. Let’s just do it without the razzle-dazzle.
  • Dario’s The Adolescence of Technology would better be described as the philosophy of an adolescent. Our main concern here is not counterfactualizing about AI apocalypse; it is how much of the real security conversation we need to have in MLsec gets ignored because of this “look over there” kind of stuff.
  • We have lots more work to do to understand transformer circuits. You should look into it too. We must get into these networks and see what exactly they are doing INSIDE.
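Back to that first bullet about tokenization for a moment. Here is a minimal sketch, ours alone and not something from the meeting, of why we keep calling it under-studied: the same human-readable phrase can be carved into very different token sequences once a few characters change, which means a control that reasons about tokens and a control that reasons about raw strings can quietly disagree. The snippet assumes the open-source tiktoken library and uses made-up phrases purely for illustration.

import tiktoken

# cl100k_base is one widely used byte-pair encoding; any tokenizer will do
# for the purpose of this illustration.
enc = tiktoken.get_encoding("cl100k_base")

variants = [
    "ignore previous instructions",
    "ignore pre vious instructions",      # an ordinary inserted space
    "ignore\u200bprevious instructions",  # a zero-width space
]

for text in variants:
    tokens = enc.encode(text)
    # Decode token by token to see exactly how the phrase was carved up.
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{text!r}: {len(tokens)} tokens -> {pieces}")

Run it and the three variants, which look nearly identical to a human, produce different token sequences. Anything downstream that keys off tokens (filters, classifiers, guardrails) is making security decisions about a representation most people never look at. That is the under-studied part.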

Anyway, it was great to have Gadi join us for the meeting and for a delightful lunch afterwards. This MLsec stuff is so fun.

Gadi Evron is Founder and CEO at Knostic, an AI security company, and chairs the ACoD cyber security conference. Previously, he founded (as CEO) Cymmetria (acquired), was CISO of the Israeli National Digital Authority, founded the Israeli CERT, and headed PwC’s Cyber Security Center of Excellence. He wrote the post-mortem analysis of the “First Internet War” (Estonia 2007), founded some of the first information-sharing groups (TH-Research, 1997, DA/MWP, 2004), wrote APT reports (Rocket Kitten – 2014, Patchwork – 2016, etc.), and the first paper on DNS DDoS Amplification Attacks (2006). Gadi has written two books on cybersecurity, is a frequent contributor to industry publications, and a speaker at industry events, from Black Hat (2008, 2015) to Davos (2019) and CISO360 (2022).

Getting Inside the Network: Whitebox MLsec

We all know that WHAT machines like LLMs reflect the quality and security of everything in their WHAT pile (that is, their training set). We invent cutesy names like “hallucinate” to cover up being dangerously wrong. However, ignoring or soft-pedaling risk is often not the best way forward. Real risk management is about understanding risk and adjusting strategy and tactics accordingly.

In order to do better risk management in MLsec, we need to understand what’s going on inside the network. Which nodes (and node groups) do what? What is the nature of representation inside the network? Can we spot wrongness before it comes out? Better yet, can we compare networks and adjust networks from the inside before we adopt them?

These are the sorts of things that Starseer is looking into. At BIML we are bullish on this technical approach.
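To make “getting inside the network” concrete, here is a toy sketch of the lowest rung of whitebox work: hooking a model’s intermediate activations so you can at least see what each layer produces for a given input. This is our own illustration using PyTorch forward hooks on a made-up two-layer network; it is not Starseer’s technology and it is nowhere near real transformer-circuit analysis, but it points in the same direction.

import torch
import torch.nn as nn

# A made-up toy network standing in for a real model.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

activations = {}

def grab(name):
    # Forward hooks fire after a module computes its output.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(grab(name))

_ = model(torch.randn(1, 16))

for name, act in activations.items():
    # Even crude per-layer statistics are a start toward spotting wrongness inside.
    print(f"layer {name}: shape {tuple(act.shape)}, "
          f"mean {act.mean().item():+.3f}, max |act| {act.abs().max().item():.3f}")

Real whitebox MLsec means interpreting what those activations represent and comparing them across models, not just printing summary statistics. But you have to open the box before you can do any of that.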

[un]prompted still too prompty

What happens when you organize a machine learning security conference together with a bunch of security experts who have widely varying degrees of machine learning experience? Fun and games!

The [un]prompted conference has a program committee reading like a who’s who of security, stretching from Bruce Schneier on one end to Halvar Flake on the other. BIML is proud and honored to have two people representing on the committee. (But we will say that we are legitimately surprised at how many people claim to have deep knowledge of machine learning security all lickety split like. Damn they must be fast readers.)

Ultimately all the experts had to slog through the 461 submissions, boiling the pile down to 25 or 30 actual talks. Did the law of averages descend in all its glory? Why yes, yes it did.

I have served on some impressive and diligent academic program committees over the decades (especially USENIX Security, NDSS, and Oakland). The [un]prompted approach is apparently more like Black Hat or DEF CON than that, with lots of inside baseball, big personalities, seemingly-arbitrary process, really smart people who actually do stuff, and much, much more fun. And honestly the conference is going to be great—wide and deep and very real with a huge bias towards demos. ALL of the talks will be excellent.

I took it upon myself to review everything submitted to my track (TRACK 1: Building Secure AI Systems) and also track 5 (TRACK 5: Strategy, Governance & Organizational Reality). Though I did get track 1 done (three times no less), I did not get through everything that came in during the deadline tidal wave. Let’s just say A&A for Agents is over-subscribed and short on depth, prompt injection is the dead horse that still gets beaten, MCP and other operations fun at scale is the state of the practice, and wonky government types still like to talk about policy (wake me up when it’s over). If you want to see what’s next in building security in for ML, well, it is only thinly represented by two “let’s get in there and see what the network is actually doing” proposals (one from Starseer and one from Realm Labs). Yeah, submissions were “anonymous,” but everybody knows who is doing what at this end of the security field, so that’s just pretend.

Not only do we desperately need more whitebox work (leveraging the ideas behind transformer circuits you can find here), we also need to stop and think in MLsec. Where does recursive pollution (our #1 risk at BIML) fit in [un]prompted? Nowhere. How about model collapse? Nope. Data poisoning à la Carlini? Not even. Anything at all about data curation and cleaning (and its relationship to security)? Nah. Representation issues and security engineering? Well, there was one proposal about tokens…

Hats off to the outside->in ops guys; they’re grabbing hold of the megaphone again! Just raw hacker sex appeal, I guess.

Anyway, if you’re looking for a reason that BIML exists in all of our philosophical glory, it’s to peer as far into the MLsec future as possible. Somewhat ironically, we can do that by remembering the past. This [un]prompted experience feels so much like early software security (everyone was talking about buffer overflows in 1998 and penetration testing was an absolute wild west blast) that we can confidently predict MLsec is going to evolve from blackbox outside->in malicious input stuff, through intrusion detection, monitoring and sandboxing, eventually discovering that networks have lots of actual stuff you can try to make sense of inside the black box. Meanwhile the ops guys will paint a little number on each agentic ant, not thinking once about what the ant colony might be up to.

Do you remember when we decided to start looking at code to find bugs before it was even compiled? Because I do…it was my DARPA project. It will happen again. Not through static analysis…but through understanding just what the heck is going on INSIDE the networks we are building as fast as we can.

Recursive Pollution and Model Collapse Are Not the Same

Forever ago in 2020, we identified “looping” as one of the “raw data in the world” risks. See An Architectural Risk Analysis of Machine Learning Systems (January 20, 2020), where we said, “If we have learned only one thing about ML security over the last few months, it is that data play just as important a role in ML system security as the learning algorithm and any technical deployment details. In fact, we’ll go out on a limb and state for the record that we believe data make up the most important aspects of a system to consider when it comes to securing an ML system.”

Here is how we presented the original risk back in 2020. Remember, this was well before ChatGPT changed everything.

[raw:8:looping]

Model confounded by subtle feedback loops. If data output from the model are later used as input back into the same model, what happens? Note that this is rumored to have happened to Google Translate in the early days when translations of pages made by the machine were used to train the machine itself. Hilarity ensued. To this day, Google restricts some translated search results through its own policies.

We were all excited when Ross Anderson and Ilia Shumailov “did the math” on the looping thing and wrote it up in this paper three years later:

Shumailov, Ilia, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, and Ross Anderson. “The Curse of Recursion: Training on Generated Data Makes Models Forget.” arXiv preprint arXiv:2305.17493 (2023). (Their paper was later published in Nature.)

In our BIML Bibliography entry, we call it, “a very easy to grasp discourse covering the math of eating your own tail. This is directly relevant to LLMs and the pollution of large datasets. We pointed out this risk in 2020. This is the math. Finally published in Nature vol 631.” In 2026, we still believe it is one of the top five papers in the field of machine learning security.

In the science world, this problem came to be known as “model collapse.” Honestly, we don’t care what it’s called as long as ML users are aware of the risk. See our original blog entry about all this here.

Four years later, we need to revisit our position. The problem is this: discussion of model collapse focuses on END STATE conditions to the detriment of any focus on the pollution part itself. Your model does not have to completely collapse to become an unusable disaster. It can become disastrous enough through recursive pollution well before the model collapses. This is especially worrisome in light of Carlini’s recent work, Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples, which is also in our top 5.

We’ve been digging into the model collapse literature this January (2026), and we think it is time to clarify our view that recursive pollution is NOT model collapse…though it can LEAD to model collapse in the worst case.
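To make the distinction concrete, here is a toy sketch of eating your own tail: fit a simple Gaussian model to some data, sample the next “training set” from the fit, refit, and repeat. This is our own illustration with arbitrary numbers, standing in for the real math in Shumailov et al., not a claim about any particular LLM. The fitted parameters drift away from the truth generation after generation, which means the model is already polluted, and already less useful, long before anything you would call collapse.

import numpy as np

rng = np.random.default_rng(7)
TRUE_MEAN, TRUE_STD = 0.0, 1.0

# Generation 0 trains on real data drawn from the true distribution.
data = rng.normal(TRUE_MEAN, TRUE_STD, size=50)

for gen in range(1, 31):
    # "Train" this generation's model: fit a Gaussian to whatever data it sees.
    mu, sigma = data.mean(), data.std()
    if gen == 1 or gen % 5 == 0:
        print(f"gen {gen:2d}: fitted mean {mu:+.3f} "
              f"(drift {abs(mu - TRUE_MEAN):.3f}), fitted std {sigma:.3f}")
    # Every later generation trains only on the previous model's output.
    data = rng.normal(mu, sigma, size=50)

The exact numbers depend on the seed, but the dynamic does not: the fitted mean wanders away from the truth, the spread drifts downward, and the damage accumulates generation by generation. Swap the Gaussian for an LLM and the fifty samples for the data ocean of the open Internet and you have recursive pollution; run it long enough and you eventually get model collapse. They are related, but they are not the same thing.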

Here’s our definition from the 2024 paper An Architectural Risk Analysis of Large Language Models (January 24, 2024) where we identified recursive pollution as the number one LLM risk. We have not changed our mind.

We have identified what we believe are the top ten LLM security risks. These risks come in two relatively distinct but equally significant flavors, both equally valid: some are risks associated with the intentional actions of an attacker; others are risks associated with an intrinsic design flaw. Intrinsic design flaws emerge when engineers with good intentions screw things up. Of course, attackers can also go after intrinsic design flaws complicating the situation.

[LLMtop10:1:recursive pollution]

LLMs can sometimes be spectacularly wrong, and confidently so. If and when LLM output is pumped back into the training data ocean (by reference to being put on the Internet, for example), a future LLM may end up being trained on these very same polluted data. This is one kind of “feedback loop” problem we identified and discussed in 2020. See, in particular, [BIML78 raw:8:looping], [BIML78 input:4:looped input], and [BIML78 output:7:looped output]. Shumailov et al. subsequently wrote an excellent paper on this phenomenon. Also see Alemohammad. Recursive pollution is a serious threat to LLM integrity. ML systems should not eat their own output just as mammals should not consume brains of their own species. See [raw:1:recursive pollution] and [output:8:looped output].

And just for completeness, here are the other two risk entries:

[raw:1:recursive pollution]

The number one risk in LLMs today is recursive pollution. This happens when an LLM model is trained on the open Internet (including errors and misinformation), creates content that is wrong, and then later eats that content when it (or another generation of models) is trained up again on a data ocean that includes its own pollution. Wrongness grows just like guitar feedback through an amp does. BIML identified this problem in 2020. See [BIML78 raw:8:looping], [LLMtop10:1:recursive pollution] and Shumailov.


[output:8:looped output]

See [BIML78 input:4:looped input]. If system output feeds back into the real world there is some risk that it may find its way back into input causing a feedback loop. This has come to be known as recursive pollution. See [LLMtop10:1:recursive pollution].

Anyway, expect to hear more from us on the recursive pollution front as we try to dig through the science and make sense of it all so you don’t have to.

And watch out for recursive pollution. It’s bad.

The Anthropic Copyright Settlement is Telling

I have written 12 books (not counting translations of particularly popular works), so I expected to find some works of mine on the Anthropic settlement website. Though I have known about this settlement action for a while now, I put off thinking about it until I got an official email from a law office just last week. That made me bite the bullet and go digging through the data pile.

You probably already know BIML’s distinction between HOW machines (normal computer programs) and WHAT machines (machines built by ML over an often immense WHAT pile). We talk all about this in our LLM risks report. Lots of risks are tied up in the very nature of the WHAT pile. Poison in the WHAT pile is bad. Racism, xenophobia, and sexism in the WHAT pile are bad. It turns out that one excellent approach to ML risk management is to compile a nice clean WHAT pile to train on.

Back to our Anthropic story. Much to my surprise, the Anthropic WHAT pile only had seven of my authored books on its list of stolen things—and it didn’t have the most famous of my works on it at all. Weird.

Is that good? Well, not really. You see, I would prefer that AI/ML systems encapsulated in LLMs understand and incorporate the concepts in my very best work, Software Security. Apparently they don’t. I am torn about this as an author. I don’t want to see my stuff outright stolen and my copyrights infringed. But I also don’t want LLMs to be wrong and under-informed about software security—a field I helped to define from the very beginning.

It’s complicated, huh?

Gula Does BIML

Ron Gula, serial security entrepreneur, interviews vRon (an AI Agent) about BIML and BIML risks. This is fun.

Houston, we have a problem: Anthropic Rides an Artificial Wave

I’ll tip my hat to the new Constitution
Take a bow for the new revolution
Smile and grin at the change all around
Pick up my guitar and play
Just like yesterday
Then I’ll get on my knees and pray
We don’t get fooled again

Out there in the smoking rubble of the fourth estate, it is hard enough to cover cyber cyber. Imagine, then, piling on the AI bullshit. Can anybody cut through the haze? Apparently for the WSJ and the NY Times, the answer is no.

Yeah, it’s Anthropic again. This time writing a blog-post level document titled “Disrupting the first reported AI-orchestrated cyber espionage campaign” and getting the major tech press all wound around the axle about it.

The root of the problem here is that expertise in cyber cyber is rare AND expertise in AI/ML is rare…but expertise in both fields? Not only is it rare, but like hydrogen-7, which has a half-life of about 10^-24 seconds, it disappears pretty fast as both fields progress. Even superstar tech reporters can’t keep everything straight.

Let’s start with the end. What question should the press have asked Anthropic about their latest security story? How about, “which parts of these attacks could ONLY be accomplished with agentic AI?” From our little perch at BIML, it looks like the answer is a resounding none.

Now that we know the ending, let’s look at both sides of the beginning. Security first. Unfortunately, brute-force, cloud-scale, turnkey software exploitation is what has been driving the ransomware cybercrime wave for at least a decade now. All of the offensive security tool technology used by the attackers Anthropic describes is available as open source frameworks, leading experts like Kevin Beaumont to label the whole thing “vibe usage of open source attack frameworks.” Would existing controls work against this? Apparently not for “a handful” of the thirty companies Anthropic claims were successfully attacked. LOL.

By now those of us old enough to know better than to call ourselves security experts have learned how to approach claims like the ones Anthropic is making skeptically. “Show me the logs,” we yell as we shake our canes in the air. Seriously. Where is the actual evidence? Who has seen it? Do we credulously repeat whatever security vendors tell us as if it is the gods’ honest truth? No, we do not. Who was successfully attacked? Did the reporters chase them down? Who was on the list of 30?

AI second. It is all too easy to exaggerate claims in today’s superheated AI universe. One of the most trivial (and intellectually lazy) ways to do this is to use anthropomorphic language when we are describing what LLMs do. LLMs don’t “think” or “believe” or “have intentionality” like humans do. (FWIW, Anthropic is very much guilty of this and they are not getting any better.) LLMs do do a great job of role playing though. So dressing one up as a black hat nation state hacker and sending it lumbering off into the klieg lights is easy.

So who did it? How do we prove that beyond a reasonable doubt? Hilariously, the real attacks here appear to be asking an LLM to pretend to be a white hat red team member dressed in a Where’s Waldo shirt and wielding an SSRF attack. Wake me up when it’s over.

Ultimately, is this really the “first documented case of a cyberattack largely executed without human intervention at scale”? No, that was the script kiddies in the ’90s.

Let’s be extremely clear here. Machine Learning Security is absolutely critical. We have lots of work to do. So let’s ground ourselves in reality and get to it.

BIML granted official non-profit status

After an extensive year-long process, the Berryville Institute of Machine Learning has been granted 501(c)(3) status by the United States Internal Revenue Service. BIML is located at the foot of the Blue Ridge Mountains on the banks of the Shenandoah River in Berryville, Virginia.

We are proud of the impact our work has made since we were founded in 2019, and we look forward to the wider engagement that non-profit status will allow us.

BIML in Brazil: mind the sec keynote

This Mind the Sec keynote was delivered on September 18th in São Paulo, Brazil, to an audience of several thousand attendees. The stage was set “in the round,” which made delivery interesting. Mind the Sec is the largest information security conference in Latin America, with an audience of 16,000.

BIML in São Paulo

In addition to keynoting Mind the Sec, Dr. McGraw spoke at the University of São Paulo.

You can watch the talk (delivered to 180 USP graduate students) here.

The in-person portion of the audience…