Silver Bullet Security Podcast 158 – Artem Dinaburg

View on Zencastr

On Episode 158 of the Silver Bullet Security Podcast, BIML’s Gary McGraw hosts Artem Dinaburg. Artem talks about using Agentic AI to find, fix, and exploit software security defects. We talk decompilation, stochasticism and the tension between harness development and LLM nondeterminism, human intuition, and the hard parts of program analysis. We also talk about MLsec (machine learning security), recursive pollution, and tokenization. Do we need more people in software security in the age of AI? Yes we do.

Transcription of episode 158

Click here to view/hide transcript

gem
This is the Silver Bullet Security Podcast with BIML. I’m your host, Gary McGraw, CEO of the Berryville Institute of Machine Learning and author of Software Security. This podcast series is sponsored by BIML, a nonprofit science and technology organization whose research focuses on machine learning security. For more, see barryvilleiml.com/podcast. This is the 158th in a series of interviews with security gurus, and I’m pleased to with me today, Artem Dinaburg. So thanks for joining us, Artem.

ARTEM DINABURG
Hi, Gary.

gem
Artem Dinaburg is the chief scientist at Trail of Bits, where he leads research and security engineering products for projects for government and commercial clients. Artem has worked on a large variety of projects, ranging from low-level software development to vulnerability research, reverse engineering, malware analysis, and program analysis/transformation. His core interests sit at the offensive and defensive intersection, automated vulnerability discovery, exploitation, and patching. These days, much of his attention is on a topic now familiar to the whole industry, how AI best can be used for automated bug finding, reverse engineering, and decompilation. Artem holds a BS in computer science from Penn State and an MS in computer science from Georgia Tech. So let’s start with the reality of software security automation. Over the last year, Trail of Bits completely overhauled its audit methodology, transitioning from widespread internal skepticism about AI to actively deploying an arsenal of dozens of specialized autonomous agents and hundreds of customized skills. On the right engagements, your AI augmented engineers are now uncovering upwards of 200 bugs a week. So from your perspective as chief scientist, what’s the fundamental difference between treating AI as a typing assistant versus weaponizing sort of a proactive autonomous agent loop to dismantle the code base?

ARTEM
Essentially you are delegating work, but you are delegating work to, not to humans, but to, as you said, autonomous agents. And these autonomous agents are very good in certain domains and they can cover a lot more ground than what you could as a human. There are certain categories of vulnerabilities. And that obviously, we’ve seen with the recent releases from Anthropic and so on that AI agents are just extremely good at finding. And if you can unleash this on a code base, you can cover a lot more ground very quickly and focus your human evaluators on assessing higher level properties that need to hold that may require domain expertise that an AI agent simply just does not have because it does not understand the business context where the software is running, the threat model that its creators are concerned about. And, maybe something that crosses different abstraction boundaries that you as a human can think more concretely about.

gem
Yeah, that makes sense. I mean, hopefully it will move the humans towards the design level in some sense, which has always been the hardest aspect of software security in my view.

ARTEM
Yes, I think you’re absolutely correct. As more and more mechanics of code auditing get automated, the bottleneck becomes other security skills, the ones in a basic intro to security course. Usually you emphasize compartmentalization, secure design, isolation, and then most of the things are kind of forgotten about. And everybody’s like, now here’s your first project.

What you struggle on is mechanically writing a lot of code. Well, now we’ve solved that problem and the bottleneck has become again, what is the proper security design for the problem you’re solving? What threats are you concerned about and how are you going to mitigate these issues?

gem
Cool. I love that. That’s a great answer. Much of your research focuses on using LLMs to reverse engineer and decompile binaries. When you ask a statistical model to reconstruct control flow graphs or guess variable types or label stripped functions, are you actually extracting the original architectural intent of the software? Or are we in some sense, generating a highly plausible semantic veneer that might make a human auditor’s job harder? Is there some of that or am I just crazy?

ARTEM
Oh, so I think this is a fascinating question and touches deep into a lot of what I’ve been thinking about, but I don’t necessarily have the data to completely validate my thoughts yet. So I guess to step back a bit, if you said translate it to a control flow graph and so on I think that this works for algorithmic-based reverse engineering and decompilation. This has worked great, and especially for the C family of languages. This has worked wonderfully. But if you notice, a lot of the current top tier automated reversing software or manually reversing software really wants to reverse everything to C. But this has not been the reality of the languages that people are developing in. If you try to reverse engineer Rust or Go or Swift or you one of the modern HLLs, you will get C that looks like it is pretending to be Rust, which I think is currently worse than looking at bad Rust.

gem
Believe me, I came up in the C-Tran days, so I know exactly what you’re talking about.

ARTEM
Yeah. And my hypothesis that we can do we can do much better. Let’s say like if the let’s talk about a language translation problem that we’re more familiar with. Let’s say you’re translating, English to Japanese or Japanese to English. You do not make a sentence diagram, reconstruct every sentence, every word verb, translate it to Japanese and rebuild your sentence. This is kind of how more traditional decompilation tools work. Yes, sometimes it’ll work. Occasionally, you’ll be able to get the message across. Maybe it’ll work for certain classes of phrases. But modern models, they have an enormous amount of Japanese-English pairings. And the machine model works its statistical magic, and out you get a translation that’s really good and idiomatic.

And why can we not do the same thing for effectively… Let’s say you were translating ARM64 to Go. You have the same thing, I know, because you don’t really need… you do not want to think about translating one function at a time or parts of a function and trying to recover the idiomatic Go language from that. This is impossible. This is an extremely difficult problem, which I’m not sure it’s even solved because that original Go is gone and you’re never getting it back, like it’s went through several layers of intermediate languages and outcomes, some binary, which has nothing to do with necessarily the language semantics that created it But, you could take this neural translation approach and actually get some kind of normal, Go or Rust or Swift, or what have you, back.

And I think this is very promising, especially because you have a compiler, so you can generate immense amounts of training data, and you have an oracle. You take the thing, it decompiles, and you build it, and you compare at the binary, and you have some level of binary similarity you can measure and hill climb. And so I think this is a very, very tractable problem for AI, ML and agentic based systems in general. And I think the reason it hasn’t been solved yet is because simply it needs a whole bunch of work to put into solve it. And then eventually, but eventually it’ll get solved. And I think it will be, much better than what humans currently do on HLLs. And it’ll help kind of ease the gap between source auditing and binary auditing, because you will be able to have some kind of quantifiable measurement with how close you are to the source code.

gem
You know, I think that’s interesting, but I also think there’s a real possibility we will one day realize we don’t really need that source code anymore, but we’ll see what comes first.

ARTEM
Also very possible.

gem
That’s too speculative to put anywhere but a podcast.

So, in traditional program analysis, we rely on sort of rigid compiler rules, deterministic oracles to prove that a bug exists. If the core engine of modern automation is a stochastic guessing machine, the agentic harness is supposed to, in some sense, be our rescue mechanism, the deterministic wrapper that validates and executes and grounds those probabilistic guesses. How do you design a harness that’s tight enough to enforce correctness on a non-deterministic model without completely suffocating the creative leap, or maybe more accurately, the stochastic generalization leap we want the AI to make in the first place?

ARTEM
I think that’s a fascinating question. And this is genuinely a difficult problem. If you do current AI-generated bug finding, and we see this in our audits and we’ve seen this from like complaints from maintainers of various open-star software and so on, like, you get a vast amount of false positives. It wants to find critical bugs everywhere. And you need some kind of gate that, as you said, grounds us in the real world.

Typically, there is no better proof than actually executing a live deployment of whatever software you are running against and seeing the issue come up in real time. Sometimes this is easy when you can prove that you’ve exfiltrated some kind a flag in a CTF style setup or there is a segmentation fault and you have an obvious memory card. But sometimes this is actually very difficult. Because maybe the issue is some kind of race condition that only triggers occasionally and just didn’t trigger it. Maybe the flaw is actually very difficult to understand.

gem
Exactly.

ARTEM
And I mean, currently right now, at this point, the solution is, throw it back to some humans who think really, really hard about it and try to determine whether this is indeed a problem. And I know…

gem
Well, at least we’ll have jobs.

ARTEM
Yeah, and well currently, that’s one of the things that is where we are, our minds are still… produce better results than AI. But I think there is no fundamental reason why you could not think about this at some deeper level or create some kind of tooling to help you think about this. We have lots of, there’s been lots and lots of work done modeling distributed systems and normal computation, and a lot of the reason why we don’t, nobody does verified programming to begin with is because it is extremely tedious. The development pace is really slow. And I know think there’s a very good argument to be made. Had we went down this path in the very beginning of programming, if let’s say, BASIC came with mandatory validation in the interpreter. Maybe if everybody look if if everybody learned to do this, maybe we would have been in a much different place off the architecture-wise and everything would have been a lot better.

But that’s not the way things ended up. And in in the reality that we have, you’re going to lose development velocity to somebody who just does not have these constraints. But with AI assistance, you don’t necessarily have to lose a velocity with these constraints. And in fact, forcing your AI to work in a much more verified development mode helps prevent slop. It helps ground what it is actually trying to do.

So this is a fantastic use for those vast amounts of verified programming. I think there’s lots of stuff from like Microsoft Research about this that all of a sudden has very renewed relevance. And now that the mechanics of writing code have been automated.

gem
Fantastic. So we’re all going to become formal methodists, which I’ve been trying to avoid my entire career. So hopefully I’ll retire properly before that.

ARTEM
Well, I mean, you ideally, you wouldn’t, sure, you wouldn’t you would need an understanding of how it works. Like, clearly, you will need an understanding of but you will not have to, check the arity of each function to make sure that, try to understand, you obscure 500 line long error messages about why your code doesn’t validate. Ideally, the AI does that.

gem
All right, let’s turn to the security of machine learning itself. When traditional security engineers look at machine learning, they usually try to map it into familiar mental models, like treating it as a standard application component with an unstructured API, or maybe as an esoteric pile of statistics. As we transition from hunting classic software bugs and flaws to auditing complex ML systems themselves, what’s the fundamental architectural shift that engineers miss? And why are we sort of failing to isolate data from control flow in our standard designs today?

ARTEM
Yes, again, another very fascinating question. So there’s there’s several layers to unpack here. The data control thing that you mentioned, I feel that every decade has its, we put our control in the data, starting with the phone system to cross-site scripting and so on.

Now we have another decade where we are going to do prompt injection, where we put control inside the data. I think there’s been some research to try to mitigate this to some degree where you have certain control tokens that you can reject at input time and the model says, once I see this control token, I will stop processing this. And you have some kind of basic pre-filter that does not allow control tokens in your input at all. But then again, you have this arms race of how can you sneak this control token back in, maybe you can work around this.

So I feel that we will be stuck with this for a while without and this is necessarily a good, clear solution.

The second aspect, how do we treat these systems? I think this is actually very fascinating in that with the first, not the first, some of the first places we’re going to see a lot of AI use by default is inside existing workflows. Almost by definition, because you have lots of existing systems, you are going to plug these things into an existing workflow and, with the obvious benefits of you are automating something that is very complicated, that previously could not be automated, and there is a bonafide business need to do this.

gem
Right.

ARTEM
there needs to be a very thorough reevaluation of the security model you’re doing this because you are going to have a thing that can take action and it is going to take untrusted input and it is going to be able to effectively, if right now you have to model it as, it can do anything that an attacker tells it to do, and you’ve modular whatever external controls you have and the latest example of this is the whole Meta Instagram support bot, where you could tell it to reset the password for somebody else’s account, and it would gladly do it because it isn’t its job to reset passwords. It is, I’m sure, saving a substantial amount of support costs for what are effectively free accounts. Business-wise, it makes sense. You wanted to have this functionality, but did not carefully model the fact that anybody can ask for somebody else’s password to be reset. And I think we’re going to see a lot more of this in various contexts because the business case for putting these agents into legacy systems is very, very high. Like you definitely want to be able to do this.

It’s not just that somebody wants to save some money by cutting employee costs. People want this because it’s legitimately very helpful. They help you accomplish the thing you want to accomplish 24 hours a day, seven days a week. There’s a bunch of studies that obviously these all come from companies that sell this, but I tend to believe they’re correct where people prefer AI-based support staff if they’re not told it’s AI based.

gem
Right, right. Because it’s more helpful. I mean, we’ve always had that tradeoff in security engineering between functionality and security. And in some sense, you know, the pressure on the functionality side is just a little bit higher now. So that balance is going to teeter totter, going to shift.

I want to get into something slightly more technical. so when When people talk about the data corruption feedback loop, they sometimes use the terms “recursive pollution,” that’s what we like at BIML, and sometimes they use “model collapse” in academia interchangeably as if the only thing that matters is the dramatic sort of in-state catastrophe. We’ve been looking closely at the distinction and we believe that a model doesn’t have to completely implode to become a liability. It just just needs to absorb enough of its own distorted output to degrade over time, often quietly. From an architectural defense perspective, how do we build an assurance framework to detect early stage recursive pollution before a system drifts into full blown collapse?

ARTEM
I think at some point, you have to have grounding in bona fide facts that exist outside of the model that you have validated somewhere else. This is even obvious when, not obvious, but this is like and something that exhibits itself whenever you are trying to do work doing let’s see no source analysis on one hand, but honestly, like any kind of AI use, if you do not have empirical facts to ground it in, it will hallucinate, it’ll make things up and as the model gets worse, it’ll hallucinate even more because it is continuing to operate.

gem
Well, and it it loves its own hallucinations, of course, because they came from it.

ARTEM
Yes, does. And so some kind of grounding is essential. And we see this in software security work, whereas this is kind of a symptom of, it’ll find high severity issues everywhere. And unless you have some kind of grounding in actual execution, where feasible or some kind of oracle that operates outside the model, you will find high security issues everywhere that don’t actually exist. Or maybe the code isn’t, like and a lot of times this is something we see, the code isn’t even wrong but it is actually impossible to reach the path because it requires two conflicting hardware versions to be operating at the same time, which never happens in the real world, but it is impossible. This is outside of the model’s ability. I think in the computer domain, this is at least models usually quite good because they have been trained and there’s a lot of reinforcement learning that happens on how actual computers work. And they’re very good at this. If you shift domains to other things, they actually get considerably worse because their intelligence is jagged.

One thing I encountered recently in my personal life is, if you ask the Mythos or even the Fable model how to get a 2003 Ford Crown Victoria out of park when the brake lights don’t come on, it will make something up that is completely not real. And then if you tell it, hey, this actually is not in the car, it will take you down a path where it has you disassembling the steering column. Whereas the real answer is on dozens of forum posts and in the official PDF manual from Ford that you can go and download and use. So there is a grounded answer, it just will refuse to use it and it will take you down a completely different path.

gem
I’m sorry to hear that this is a real. *laughs*

Most AI people treat tokenization as a mundane pre-processing step, completely overlooking its deeper architectural and philosophical impact. By slicing inputs into arbitrary statistical chunks rather than meaningful semantic units, tokenization fundamentally decouples the model’s emergent internal representation from the reality it’s supposed to be processing. This is kind of getting towards grounding a different way. From a white box security perspective, how severely does this translation layer distort our ability to reason about system behavior? And can we ever truly secure an architecture whose very ontology is built on these crystalline non-human representations?

ARTEM
That’s a very good question because tokenization has been, the root of evil for a lot of different problems. I think this is… a very, kind of a difficult problem because, I know there’s been lots of research about, for instance, doing some kind of byte-level understanding, usually aimed at doing, image processing or processing different kinds of binary structures. I feel that my, I’m going to give an answer, but if I always, I feel that my answer may be inadequate because there’s a lot of frontier research here that I’m not very familiar with. And I feel like even if I had read it, I would not produce a great answer. But, in the model training that I have attempted to do, I have run into the problem where, you have a token, the token stays under-trained because it’s not a pure enough in your training data. we saw this, with the back in, GPT-3, three and a half with, solid gold, Magikarp and so on. it’s this little, uh, trigger very, very strange results. I ran into doing some local model training for some specific task where I accidentally left some tokens under-train and I realized my performance was horrible. One of the lessons, build evaluations first and do training, then check to make sure you’re better than baseline. The reason was had under-trained tokens.

gem
Gosh, but that’s given development almost…

ARTEM
Yeah the reason was I had undertrained tokens. And I do not know what tokens and frontier models are under-trained, how much this is affecting hallucinations, how easy it is to trigger them, or what potentially the impacts are of leaving certain states unrepresentable because they cross token boundaries. or you you have this pair of tokens and you need a pair of tokens to represent something or maybe more than a pair. And, this particular pair has been not appearing next to each other very often. So it will would actually be very hard for the model to represent the thing you want.

I know I think there’s going to be certainly an emergent class of, I don’t know if going say attacks, but, things that cause undesired behavior in production AI agents that are possibly going to be based on, looking at this and figuring out what kind of, what can we insert that is going to make this difficult for, difficult to process.

gem
Yeah, I mean, it’s an interesting area of research for sure. And I think that it’s pretty much, you know, generally ignored by most people working in the field so far. We need to fix that.

ARTEM
Well, I think, yes. So one, I think people making models understand that tokenization is a very, very difficult subject and have dedicated a lot of research to it. But also you can’t really, the byte-level schemes have not produced the same kind of performance that actual tokenization does. and, you need to stay at the frontier and deliver performance, in terms of model capabilities to your end users.

In terms of security aspects, yes. I think there’s many, many things, and this is definitely one of them that we have not really gone into and that reinforces the need to, again, have a proper architecture, especially if you’re integrating one of these systems into an existing enterprise deployment of something. How are you going to put a perimeter around it and limit its actions and audit and validate its actions if you can’t limit them. So at least you can go in the past and undo them based on you what you wanted to do and what you allow it to do. And going back to what where we started, those very beginning security engineering principles of auditing, logging, isolation, privileges, now they have enormously renewed importance because the actions happen at machine speed instead of human speed. and you need to make sure that certain actions are simply impossible.

gem
I will point back to some work we did in 2020 at BIML where we took Saltzer and Schroeder’s principles and we adjusted them for machine learning. Check that out in all of your copious spare time. I want to kind of stick ourselves directly on the present, so into the immediate present. At at the Qualcomm Product Security Summit, you argued that we actually need more human software security people right now, not fewer, strictly because generative AI is allowing developers or whoever to churn out coded in unprecedented velocity, which I absolutely agree with. I used to joke that Gartner analysts could only understand one variable, bug cardinality, like whether that was going up or down while completely ignoring the two variables that actually matter, bugs per square inch and miles of new code. If the AI firehose is just massively amplifying the miles of code we have to deal with, how do human security teams scale to find the architectural flaws, the ones up the food chain, when we’re drowning in an ocean of AI-generated syntax, some of which we can automate away with other AI?

ARTEM
Yes, thank you. I guess I don’t want to say counterintuitive, since at least I can tell you agree with me, Gary. But it’s certainly counterintuitive to some people who say that security is over, why are you working in security? This has clearly got automated away. You don’t have anything…

gem
No way.

ARTEM
… you should stop, do something else. And I’m like, no. The amount of code being generated means that there is going to be more bugs than we have ever seen, even if we are very lucky to have that AI, it is good at bug finding because we’re going to need it to find all the bugs in AI-generated code.

What we have to focus on is, and this is going easier said than done, but the software development has to adjust to a much faster pace of code velocity. And this means gating as many tasks as we can behind some kind of no machine checked or machine audited analysis and only pushing to humans what there is uncertainty about or what may require some kind of higher, as you said, higher degree of abstraction. As to how to scale humans to be better at this higher degree of abstraction, I think, unfortunately, we have not found a way to accelerate human learning. So, that we have we have not found a way to do. But it’s essentially just going to mean that changing prioritization and the kind of things that people who are entering security now, the kind of things they learn, are you’re going to have to shift the focus. As you said previously in your security project, now you did a little bit of learning about policy, and then you had lots of learning about mechanically analyzing code and writing it to implement it. Well, we’re going to have to do a lot more of the learning about policy and how to assess this and how to interpret output from potentially automated systems and how to know when they are wrong and so on. We’re going to have to change how we train.

A lot of this ties into the fact, what are we going to do about junior people entering security? One, we’re going to need a lot more of them. Two, yeah I mean, they’re just simply not going to learn the same things that we learned. This is not necessarily a bad thing, fields evolve, but their knowledge is going to be valuable for where we are headed. And where we are headed has a much, much higher development velocity than we had before, requiring different strategies to keep up with the newer volume of code.

And I think we are seeing this, there’s been an uneven adoption of these principles. I think a lot of, like if you look at some of the outputs from a lot of the frontier labs, like Anthropic and OpenAI, essentially they’re saying, we use Codex or we use Claude Code to write 80, 90% of Codex and Claude Code. And we’ve increased our shipping velocity by 8-10 times in the past month. And, other places also do it. We have a lot of AI adoption internally and we can audit more lines of code and we can find more bugs per project faster. And eventually, everybody is going to have to adjust to higher velocity or be outcompeted by people who do. And this is going to require a shift change, both in software development and from perspective of people who do security, how you structure product and code security and what kind of things do you do. And yes, you’re definitely going to need more people who who can do this because there’s going to be so much more code. I’m sure you can remember the first time you wrote a computer program and the joy you get out of having the computer do something

gem
You’re not going to believe it: 1981.

ARTEM
Yeah, and like I’ve talked to a lot of our non-development staff, and they have been onboarded with Claude Code and Codex, and they get the same joy now because they can finally tell the computer what to do and the computer does it. And it is like you can like hear how awesome it is that they can like they had these things that they wanted the computer to do and now they can tell the computer to do it and the computer will do it for them and it is like it is an amazing feeling except now you don’t you don’t have to learn this wizard language and interpret these spells that the computer tells back at you can just like tell it to do stuff.

gem
I kind of want to I kind of push on that for the last question. So, you know, let’s close with where this is all heading, which is obviously agentic which we’ve been talking about the whole time. So, we’re rapidly building autonomous agents designed to write code, find vulnerabilities, make system level defense decisions all on their own. But, as we rush to build the mathematical harnesses and formalisms to keep those things from going off the rails, we shouldn’t lose sight of what actually makes a defender effective. Machines are great at scale, but humans bring what I call the three Is to the table: insight, intuition, and ingenuity. When we look past the hype of fully automated systems, how do we design high assurance architectures that don’t just constrain the AI, but actually amplifies those uniquely human traits?

ARTEM
So I think this is actually one where defense is going to get a lot of value since traditionally the problem with defense has been volume. There has been an immense problem with volume. Like you have sure you can log everything and lots of sensors, you can audit things, but you will get a large volume of data coming in. You will get lots of alerts from your entire network. And being able to, this is, it very quickly gets human cognitive overload. Even if you have an intense amount of filtering, if you have a large enough system, you get cognitive overload from what are false positives. And now we have the ability to where we can offload this cognitive load to somebody else, we can offload this cognitive load to a swarm of agents. And I mean obviously we can say, is this going to be turtles all the way down, but we can finally make sense… like, there is a thing that, sorry, I’m mumbling all over myself, but there is at least a vision for how we can take all of this vast amount of data and process it either in real time or at some delay offline and make sense of it and bubble up only the things that have been past several gates of review that a human needs to use their intuition to see, is this actually allowed per the business policy? And now you can…

gem
Right, but it also it also emphasizes the fact that that human intuition is really important. We can’t just get rid of that.

ARTEM
Yes, I know I think finally, like what you will be able to do is identify higher level policy violations from lots of lower level actions, which before took forensic analysts maybe months or years digging through things after an incident. Now you will be able to achieve this much faster because you actually have something that can look at all these sensors and correlate this data and actually intelligently make some kind of decision on it in either real time or close time, and bubble up something to a human to say, we have detected motion of data from here to there to there and finally exfiltrating out of VPN endpoint. Is this a permitted workflow? And somebody can say, yes, totally. Gary’s traveling to Tokyo to this week and this is absolutely expected. Or they can be like, no, no, no, no, this is a terrible proble. Press the red button.

gem
Well, this has been an absolutely fascinating conversation. Fantastic. Thanks so much. I can’t believe we’ve been talking for 35 minutes. It feels twenty-seven seconds

ARTEM
Yes.

gem
This has been the Silver Bullet Security Podcast with BIML. Silver Bullet is sponsored by the Berryville Institute of Machine Learning, a nonprofit science and technology organization whose research focuses on machine learning security. You can find a permanent archive of all of our episodes dating back to 2006 at garymcgraw/technology/silverbulletpodcast. Show links, notes, and an online discussion can be found on the Silver Bullet webpage at barryvilleiml.com/podcast. This is Gary McGraw.

Silver Bullet Security Podcast 158 – Artem Dinaburg

Transcription of episode 158

0 Comments

Leave a Reply