BIML Coins a Term: Data Feudalism

Decipher covers the White House AI Executive Order, with the last word to BIML. Read the article from October 31, 2023 here.

Much of what the executive order is trying to accomplish is what the software and security communities have been working on for decades, with limited success.

“We already tried this in security and it didn’t work. It feels like we already learned this lesson. It’s too late. The only way to understand these systems is to understand the data from which they’re built. We’re behind the eight ball on this,” said Gary McGraw, CEO of the Berryville Institute of Machine Learning, who has been studying software security for more than 25 years and is now focused on AI and machine learning security.

“The big data sets are already being walled off and new systems can’t be trained on them. Google, Meta, Apple, those companies have them and they’re not sharing. The worst future is that we have data feudalism.”

Another challenge in the effort to build safer and less biased models is the quality of the data on which those systems are being trained. Inaccurate, biased, or incomplete data going in will lead to poor results coming out.

“We’re building this recursive data pollution problem and we don’t know how to address it. Anything trained on a huge pile of data is going to reflect the data that it ate,” McGraw said. “These models are going out and grabbing all of these bad inputs that in a lot of cases were outputs from the models themselves.”
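The feedback loop McGraw describes compounds over time. Here is a back-of-the-envelope sketch (our illustration, not from the article, with an assumed per-generation error rate) of why pollution accumulates once model outputs re-enter the training data:

```python
# Toy model of recursive data pollution: each training generation
# inherits whatever errors are already in its data and adds a few
# of its own, so the polluted fraction only ratchets upward.
ERROR_RATE = 0.05  # assumed fraction of fresh errors each model introduces

bad = 0.0  # fraction of polluted examples in the training set
for generation in range(1, 11):
    # Clean examples can be newly corrupted; bad examples stay bad.
    bad = bad + ERROR_RATE * (1.0 - bad)
    print(f"generation {generation}: {bad:.1%} of training data polluted")
```

Even with a modest 5% error rate per generation, roughly 40% of the training data is polluted after ten generations, and the fixed point of the recurrence is 100%.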

“It’s good that people are thinking about this problem. I just wish the answer from the government wasn’t red teaming. You can’t test your way out of this problem.”

BIML Presents at NBIM 10.18.23

NBIM is the world’s largest sovereign wealth fund

BIML was invited to Oslo to present its views on Machine Learning Security in two presentations at NBIM in October.

The first was delivered to 250+ technologists on staff (plus 25 or so invited guests from all around Norway). During the talk, BIML revealed its “Top Ten LLM Risks” data for the first time (pre-publication).

BIML presented two talks at NBIM

The second session was a fireside chat for 19 senior executives.

BIML on the AP Wire: why red teaming is feeble

The idea that machine learning security is exclusively about “hackers,” “attacks,” or some other kind of “adversary” is misguided. This is the same sort of philosophy that misled software security into a myopic overfocus on penetration testing way back in the mid ’90s. Not that pen testing and red teaming are useless, mind you, but there is way more to security engineering than penetrate and patch. It took us forever (well, a decade or more) to get past the pen test puppy love and start building real tools to find actual security bugs in code.

That’s why the focus on Red Teaming AI coming out of the White House this summer was so distressing. On the one hand…OK, the White House said AI and Security in the same sentence; but on the other hand, hackers gonna hack us outta this problem…not so much.

This red teaming nonsense is worse than just a philosophy problem; it’s a technical issue too. Just take a look at this ridiculous piece of work from Anthropic.

Red Teaming Language Models to Reduce Harms:
Methods, Scaling Behaviors, and Lessons Learned

Red teaming sounds high tech, mysterious and steeped in hacker mystique, but today’s ML systems won’t benefit much from post facto pen testing. We must build security into AI systems from the very beginning (by paying way more attention to the enormous swaths of data used to train them and the risks these data carry). We can’t security test our way out of this corner, especially when it comes to the current generation of LLMs.

It’s tempting to pretend we can sprinkle some magic security dust on these systems after they are built, patch them into submission, or bolt special security apparatus on the side. Unfortunately, the world knows well what happens when we pretend to be hard at work on security while what we’re actually doing is more akin to squeezing our eyes shut and claiming to be invisible. Just ask yourself one simple question: who benefits from a security circus in this case?

AP reporter Frank Bajak covered BIML’s angle in this worldwide story August 13, 2023.

New BIML Member

We are extremely pleased to announce that Katie McMahon has joined BIML as a permanent researcher.

Katie McMahon

Katie McMahon is a global entrepreneur and technology executive who has been at the leading edge of sound recognition and natural language understanding technologies for the past 20 years. As VP at Shazam, she brought the iconic music recognition app to market, where it went on to reach 2 billion installs and 70 billion queries before being acquired by Apple. She then spent over a decade at SoundHound (NASDAQ: SOUN) bringing NLU technology and Voice AI products from lab to market. She has worked for Snap and most recently served as President & COO of Native Voice. She advises several early-stage tech companies, including Neosensory, Valence Vibrations, and NatureQuant, and is the lead inventor on several patents involving methods of Automatic Speech Recognition, Natural Language Understanding, and Augmented Reality. She earned a BA in Political & Social Thought from The University of Virginia and has completed coursework at Stanford, M.I.T. Sloan, and the London School of Economics and Political Science, and most recently earned the Corporate Board Readiness badge certificate from the Leavey School of Business at Santa Clara University in Silicon Valley. Katie is most interested in understanding how rapidly evolving AI and the wider tech landscape stand to impact business, society, and humanity at large.

BIML Participates in Calypso AI’s AccelerateAI2023

As the world rapidly advances technologically, it is vital to understand the implications and opportunities presented by Large Language Models (LLMs) in the realm of national security and beyond. This discussion will bring together leading experts from various disciplines to share insights on the risks, ethical considerations, and potential benefits of using LLMs for intelligence, cybersecurity, and other applications.

Panel on ML and Architectural Risk Analysis (aka Threat Modeling)

Irius Risk, a company specializing in automating threat modeling for software security, hosted a webinar on Machine Learning and Threat Modeling March 30, 2023. BIML CEO Gary McGraw participated in the webinar along with Adam Shostack.

The webinar was recorded and you can watch here. FWIW, we are still not exactly clear on Adam’s date of replacement.

BIML Keynotes National Science Foundation Meeting

Every few years, the National Science Foundation holds vision workshops to discuss scientific progress in fields it supports. This year BIML’s Gary McGraw was pleased to keynote the Computer Science “Secure and Trustworthy Cyberspace” meeting.

He gave a talk on what #MLsec can learn from #swsec, with a focus on technology discovery, development, and commercialization. There are many parallels between the two fields. Now is a great time to be working in machine learning security.

You can download the slides here.