Harnessing Alien Intelligence

We are now harnessing the “harness” metaphor to describe how to work effectively with generative AI. Generative AI applications often appear familiarly “intelligent” at first, but with use they typically reveal themselves not to be, and that has practical implications. Combining stochastic models with deterministic tools and code loops can change this; the combination is typically called a harness. The “harness” concept (and its implementations) can be used to clarify questions about AI capabilities and to create dramatic jumps in performance on a range of challenging computations. Those gains are related to (perhaps even caused by) averting various LLM risks we identified in previous work (paper).

Harnesses

Harnessing alien intelligences is a hallmark achievement of humanity. Harnessing a horse brings together a rider and a horse to enable heavily laden travel over difficult terrain, or feats of speed, agility, and competition unavailable to either alone. (This past weekend, for example, brought the latest running of the Kentucky Derby.) We can also harness multiple horses to a cart, bulls to a plow, or dogs to a sled. Each scenario is distinct: different goals and contextual details shape how the harness is built and operated. Common to all of them, however, is that the harness must fit the animal and the task sufficiently well, and the animal must be domesticated to a degree, which entails sufficient capacity for communication or mutual understanding. Again, the detailed capabilities of the partners and the desired outcomes shape the harnessing. It’s hard not to get lost in this idea and in the clever details of how it is implemented. And that is even though the cross-species examples can fail to evoke the prime example, social existence, which harnesses the intelligences of other humans (alien to each other) through shared concepts, norms, and laws to achieve, among other things, the device I am writing this on and the network this message travels through.

Metaphors and abstraction aside, we are now learning (or paying attention to the fact) that effective use of generative models strongly depends on the scaffolding, the harness, that we build around them. In fact, important questions that we ask about these models, such as how “intelligent” they may be, are misconstrued outside the concept of a harness. This is one among many ideas discussed in an analysis by Farrell et al., “Large AI models are cultural and social technologies” (paper). The paper places recent generative models in the context of the history of information and collaboration technologies.

Harnessing has been here all along, but sometimes it’s hard to see what’s too close (“this is water”). The chat interface is a harness; so are the sampling or decoding algorithms; and the recent agentic scaffolds are harnesses too, now finally also by name. Model weights in and of themselves are a kind of data extract, created from a pile of data sent through the digital distillery of training protocols and algorithms, but alone they do no work, make no decisions, and are not intelligent. Seen as a kind of library, some may be more extensive than others, but all are inert nevertheless.

It took harnesses getting big and complex for us to begin to see the concept. The results have also come to be known under various names. Most famously, we now have a menagerie of coding “agents”, the talking horses that will replace human software developers. Understanding work in the neighborhood of harnesses and agents is an area we are looking at closely. What follows is a brief look at three recent techniques that have stood out.

Recursive Language Models

The first is “Recursive Language Models” (RLM) (paper). The technique is motivated as a response to the challenges of very long context computation, where attention becomes very resource intensive and yet we still experience “context rot”. Context rot means that task performance decays or collapses as the context gets larger. The risk of assuming that a longer context is simply better when using LLMs was among the risks in our original assessment. RLM, described as a new “inference paradigm” (not as a harness), puts the context into a REPL environment and asks the LM to generate code and new prompts to “programmatically examine, decompose, and recursively call itself over snippets”. Intermediate results are stored as variables in the REPL workspace, and the final state goes into a named variable, signaling completion. This is the harness: it engages the model in a particular set of behaviors (through code and prompt generation) in an environment defined by an arbitrarily large input prompt. The result is effective handling of prompts multiple orders of magnitude larger than the model’s context window, with dramatic outperformance on a set of relevant tasks. With this harness, tiny models can perform like “vanilla” frontier models! We often talk about WHAT and HOW machines, and we identified a top “misuse” risk as trying to approach any computational task using auto-regressive text completion. The RLM approach uses code generation and execution in a REPL to move away from this, with great impact. Along the way, RLM also mitigates risks related to representational transparency and black-box behavior by executing inspectable code and storing intermediate results in its environment! It also turns out that these traces can be put to further use.
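To make the mechanism concrete, here is a minimal offline sketch of the REPL loop described above, assuming a toy task: counting “error” lines in a log that is too long for any single model call. Everything in it is hypothetical. `small_model` is a word counter standing in for a small LM, `ROOT_CODE` is a fixed string standing in for the code the root model would generate, and `llm_query`, `final_answer`, and `MAX_CHARS` are illustrative names, not necessarily the paper's interface.

```python
MAX_CHARS = 200  # hypothetical context budget of the small model, in characters


def small_model(snippet: str) -> str:
    """Hypothetical stand-in for a small LM call on a short snippet.
    It just counts the token 'error' so the sketch runs offline."""
    assert len(snippet) <= MAX_CHARS, "snippet exceeds the model's budget"
    return str(snippet.split().count("error"))


# Hypothetical stand-in for code the root model would write after inspecting
# the `context` variable. In an RLM this string is generated, not hand-written.
ROOT_CODE = r"""
lines = context.splitlines()
mid = len(lines) // 2
halves = ["\n".join(lines[:mid]), "\n".join(lines[mid:])]
partials = [int(llm_query(h)) for h in halves]
final_answer = sum(partials)
"""


def llm_query(snippet: str) -> str:
    """Sub-call: short snippets go straight to the small model; longer ones
    trigger another recursive pass. Assumes every single line fits in
    MAX_CHARS, otherwise the halving would never bottom out."""
    if len(snippet) <= MAX_CHARS:
        return small_model(snippet)
    return str(rlm(snippet))


def rlm(context: str) -> int:
    """Run the root code in a fresh REPL-like namespace holding the whole
    prompt as a variable. Intermediate results (lines, halves, partials)
    stay inspectable in `workspace`; `final_answer` signals completion."""
    workspace = {"context": context, "llm_query": llm_query}
    exec(ROOT_CODE, workspace)
    return workspace["final_answer"]


if __name__ == "__main__":
    log = "\n".join(
        f"{i:04d} error disk" if i % 7 == 0 else f"{i:04d} ok" for i in range(500)
    )
    print(rlm(log))  # → 72
```

The design point the sketch tries to show is that the long prompt never enters a model call whole: only snippets under the budget do, and every intermediate value remains in the workspace where it can be inspected afterward.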

AutoHarness

The second example is “AutoHarness: improving LLM agents by automatically synthesizing a code harness” (paper), with “harness” appearing twice in the title. This effort is motivated by the failure of language models, even frontier ones, to consistently generate valid moves in game environments. Again the word “environment” shows up, used here more directly: the formal world of a game. A familiar observation here is the presence of Potemkin understanding (paper): a model may describe, or apparently understand, the rules of a game but cannot reliably generate legal moves. The authors describe a harness as “the glue or plumbing between the model and the task that needs to be solved” and talk about harness use in two ways. The first asks the model to generate validation code for proposed game moves and then interleaves the generated validator with auto-regressive answering; they call this “harness as verifier”. The second asks the model to generate code that produces valid moves directly, which they call “harness as policy”. Both approaches attempt to constrain the behavior of the model to conform to the game rules, and in this way “harness” that behavior. But there is also a third form: the overarching “AutoHarness” strategy itself. The harnessed strategies again greatly outperform even vanilla frontier models. In the case of harness as policy, where we ask the WHAT machine to create a little HOW machine, the advantage shows up both in task performance and, dramatically, in compute cost.
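The two modes can be contrasted on a toy game. This is a minimal sketch, assuming tic-tac-toe and a random stub in place of the LLM; `stub_llm_move`, `is_legal`, `policy`, and the retry budget are hypothetical names and values. In AutoHarness the validator and the policy are themselves generated by the model; here they are hand-written placeholders for that generated code.

```python
import random
from typing import Optional, Tuple

EMPTY = " "


def stub_llm_move(board: list, rng: random.Random) -> int:
    """Hypothetical stand-in for an LLM asked for a move in free text.
    It proposes any cell 0-8, so it sometimes picks an occupied one
    (a toy version of 'describes the rules, breaks them anyway')."""
    return rng.randrange(9)


# In AutoHarness the next two functions would be written by the model itself;
# here they are hand-written placeholders for that generated code.
def is_legal(board: list, move: int) -> bool:
    """Generated validator: a move is legal if it targets an empty cell."""
    return 0 <= move < 9 and board[move] == EMPTY


def policy(board: list) -> Optional[int]:
    """Generated policy: returns a legal move directly (the first empty
    cell), or None if the board is full."""
    return next((i for i, cell in enumerate(board) if cell == EMPTY), None)


def harness_as_verifier(board: list, rng: random.Random,
                        max_tries: int = 100) -> Tuple[int, int]:
    """Interleave sampling with the validator: keep asking the model until a
    proposed move passes, and report how many model calls that took. A fuller
    harness would also feed the rejection reason back into the prompt."""
    for calls in range(1, max_tries + 1):
        move = stub_llm_move(board, rng)
        if is_legal(board, move):
            return move, calls
    raise RuntimeError("no legal move within the call budget")


def harness_as_policy(board: list) -> Tuple[Optional[int], int]:
    """The WHAT machine has written a small HOW machine: zero model calls."""
    return policy(board), 0


if __name__ == "__main__":
    board = ["X", "O", "X",
             EMPTY, "O", EMPTY,
             EMPTY, EMPTY, "X"]
    print(harness_as_verifier(board, random.Random(0)))
    print(harness_as_policy(board))  # → (3, 0)
```

Even in the toy, the cost contrast is visible: the verifier harness spends one or more model calls per move, while the policy harness spends none at play time once the code exists.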

Meta-Harness

Full recognition of the centrality of the harness shows up in “Meta-Harness: End-to-End Optimization of Model Harnesses” (paper). The goal here is to use a strong coding model, called a proposer, to create new task-specific harnesses by examining the code, execution traces, and performance of other harnesses. This is where the traces described earlier are put to good use. Interestingly, the task-specific harness may be executed by a weaker model. The method can dramatically improve upon existing harnesses with limited compute, in ways that transfer across the models that execute the harness and across a range of tasks.
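As a rough illustration of the outer loop only, here is a toy sketch under heavy simplifying assumptions. The “harness” is reduced to a single parameter (how many log lines to hand a weak executor per call), the executor is a word counter with a made-up 50-character budget, and `propose` is a fixed rule that reads the last execution trace, standing in for the strong coding model that, in Meta-Harness, reads and rewrites actual harness code. All names and the scoring objective are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BUDGET = 50  # hypothetical context budget of the weak executor, in characters


def weak_executor(snippet: str) -> Optional[int]:
    """Hypothetical weak model: counts 'error' tokens, or fails (None)
    when the snippet exceeds its budget."""
    if len(snippet) > BUDGET:
        return None
    return snippet.split().count("error")


@dataclass
class Result:
    chunk_lines: int  # the whole "harness" in this toy is one chunk size
    score: float
    trace: List[str] = field(default_factory=list)


def run_harness(chunk_lines: int, tasks) -> Result:
    """Execute one candidate harness on every task, recording a trace."""
    correct, calls, trace = 0, 0, []
    for text, answer in tasks:
        lines, total = text.splitlines(), 0
        for i in range(0, len(lines), chunk_lines):
            snippet = "\n".join(lines[i:i + chunk_lines])
            calls += 1
            out = weak_executor(snippet)
            if out is None:
                trace.append(f"overflow: {len(snippet)} chars > budget {BUDGET}")
                total = None
                break
            total += out
        correct += int(total == answer)
    # Made-up objective: accuracy minus a small per-call cost.
    return Result(chunk_lines, correct / len(tasks) - 0.001 * calls, trace)


def propose(history: List[Result]) -> Optional[int]:
    """Stand-in for the proposer model. It reads the last trace: overflows
    mean halve the chunk size; a clean run means try bigger chunks (fewer
    calls). Returns None once it would repeat an earlier candidate."""
    tried = {r.chunk_lines for r in history}
    last = history[-1]
    failed = any("overflow" in entry for entry in last.trace)
    nxt = last.chunk_lines // 2 if failed else last.chunk_lines + 1
    return None if nxt < 1 or nxt in tried else nxt


def meta_harness(start: int, tasks, max_rounds: int = 10):
    """Outer loop: evaluate, let the proposer read the history, repeat."""
    history = [run_harness(start, tasks)]
    for _ in range(max_rounds):
        nxt = propose(history)
        if nxt is None:
            break
        history.append(run_harness(nxt, tasks))
    return max(history, key=lambda r: r.score), history


def make_task(n: int, every: int) -> Tuple[str, int]:
    """A log of n ten-character lines; every `every`-th line is an error."""
    lines = [f"{i:04d} {'error' if i % every == 0 else 'clean'}" for i in range(n)]
    return "\n".join(lines), sum(1 for i in range(n) if i % every == 0)


if __name__ == "__main__":
    tasks = [make_task(20, 3), make_task(37, 5), make_task(50, 7)]
    best, history = meta_harness(start=16, tasks=tasks)
    print([r.chunk_lines for r in history])  # → [16, 8, 4, 5, 2, 3]
    print(best.chunk_lines)                  # → 4
```

The sketch only illustrates the division of labor: the proposer never solves a task itself; it reads outcomes (scores and traces) and emits the next harness, which the weak executor then runs.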

In each of these cases, harnessed small models can vastly outperform “vanilla” frontier models, meaning frontier models harnessed only through a sampling algorithm. Meta-Harness can improve upon harnesses similar to RLM (hybrids of HOW and WHAT machines) and beat hand-coded approaches. Omar Khattab, one of the authors here, is also an author of the RLM paper.

There is something fundamental happening here, with implications for performance, computational costs, interpretability, and risk management. We are also seeing these patterns in other work, so we will continue to revisit. 

Patrick McDaniel BIML Site Visit

BIML is proud to host Patrick McDaniel, an OG of machine learning security (prominently featured in the BIML TOP 5) and a Dean of Research at Wisconsin, for a visit to the BIML Barn. Patrick arrived in Berryville late on Thursday and was greeted with a Liberal or two on the porch. We stayed up way too late talking about AI and security.

In the morning after breakfast, we spent much of the Friday research discussion going over our soon-to-be-released paper No Security Meter for AI. Patrick has been thinking about measuring ML behavior for a long time, and was an early proponent of a whitebox approach. He had lots of very useful feedback for us.

Does science really get done around the kitchen table? Why yes. Yes it does. (And technical talks really get delivered in the BIML Barn.)

We ventured into greater metropolitan Berryville for lunch and coffee.

And then Patrick delivered a new talk as a BIML in the Barn feature, to be released on May 13th. Patrick’s talk really surprised us, and in very important philosophical ways.

After the talk we shared a cocktail on the patio. Maybelline is an honorary BIML dog.

Patrick enjoys a well-deserved Lemon Mint Fizz.

And then it was off to dinner with BIML spouses at Huntōn in Leesburg.

Fantastic visit. This kind of human interaction is absolutely critical as we construct a reasonable approach to machine learning security.

BIML Featured in Fortune

https://fortune.com/2026/04/23/ai-cybersecurity-standards-mythos-nist-owasp-sans-cosai-dc-meeting-eye-on-ai/?sge456

Gary McGraw, cofounder of the Berryville Institute of Machine Learning, pointed to a core gap: Today’s benchmarks tend to measure how well AI systems can perform security tasks—not how secure the systems themselves are. Companies need to keep that distinction in mind when evaluating their tools and defenses.

McGraw warned as far back as 2019 that securing machine learning systems would be “one of the defining cybersecurity struggles of the next decade.” That moment has now arrived.

“These meetings are a way to remind ourselves of the fundamentals,” he said, “as we try to define what machine learning security actually is.”

BIML Debuts AI Security Measurement Work at NIST

What was to be a fairly standard version of the BIML risk talk was instead transformed into a debut of BIML’s forthcoming paper No Security Meter for AI (expected mid-May) for an audience of NIST computer scientists.

It’s always fun to debut a talk for an audience that is engaged and knowledgeable.

While we were inside the very industrial Chemistry building for a talk that was 80% Zoom, it rained outside.

Booting MOSAIC: multi-organization security and AI coalition

Well, maybe. (McGraw proposed the name, which is being vetted.) We did all get together in Arlington on 4.21.26 to discuss policy and AI. It was a good meeting set up by OWASP and SANS and run very professionally by Rob van der Veer.

The cool thing? BIML’s work was not only cited, but included.

The meeting setting was gorgeous.

As usual, the hall track was the best part of the entire day…especially when the hall was moved across the street to the bar.

Sounil Yu from Knostic and his son (a security analyst at Salesforce). Sounil discussed BIML’s measurement paper with McGraw.

See this coverage of the meeting: Global AI Security Standard Organizations Gather Under MOSAIC to Reduce Fragmentation, AI security leaders gather in Washington as risks mount—and Mythos raises the stakes

Too Dangerous to Release (Again): Software Security and AI

Have you heard? The Mythos model from Anthropic is so dangerously good at finding software vulnerabilities that its release must initially be limited to companies participating in the Glasswing software security project! {Oh my. Also lions and tigers and bears!}

Does that sound like a marketing ploy to you? Because it does to most of the expert bug finders I know best. In fact, the software exploit community (some of whom make a very good living selling bugs to the very companies that produced them…LOL) is pretty evenly split on this issue. So what is a grownup to think?

Those of us who have been around the block a few times in AI-land remember way back when GPT-2 was also too dangerous to release (because it could generate fake news even faster than a political PR flak). That garnered some press and helped with the launch for sure. Well, it’s happening again…just look at the tech headlines! Go, Anthropic, go!

Fortunately, there is some balanced coverage out there adopting a thoughtful approach (thanks, Cade). Here’s what we think:

  1. We still have a very real software security problem, so ANYTHING that helps people find AND FIX bugs in code is good. Everyone who is serious about software vulnerabilities has been using agentic AI to do this better. You should too. Want to get started using AI to find bugs? Hold your nose (because LinkedIn) and check out this link. But please also figure out how to FIX the bugs you find. And don’t expect to be paid for slop.
  2. LLMs really are good at helping find easy vulnerabilities, but expert mode requires human experience and expertise. Will you become Halvar Flake by strapping on Mythos? No, you will not.
  3. Building exploits that really work is much harder than just finding bugs. In fact, I wrote a whole book about this in 2004, 22 years ago, and it is still true. Patching is also harder than finding vulnerabilities. Hopefully AI will help with both of these software security activities.
  4. AI tools are all helpful in different ways. Use them all. Use the ones that are already released. (We hear tell that a well prompted Opus-4.6 (82%) does nearly as well as Mythos (84%) on CRSBench…which calls into question just what the hell these benchmarks measure—a topic we have been thinking about a bunch.)

As a last thought, we’re going to appeal to the four I’s that excellent human designers are familiar with: Intuition, Insight, and Inspiration (the fourth one is the “self” kind of I). AI is great and we love it. We are really going to need lots more software architects, information architects, designers, actual building architects, and humans who know what they are doing. If you know what you’re doing, you’ll be fine. If you are simply a bullshitter, you’re toast.