We are extremely pleased to announce that Katie McMahon has joined BIML as a permanent researcher.
Katie McMahon
Katie McMahon is a global entrepreneur and technology executive who has been at the leading edge of sound recognition and natural language understanding technologies for the past 20 years. As VP at Shazam, she brought the iconic music recognition app to market; it went on to reach 2 billion installs and 70 billion queries and was acquired by Apple. She spent over a decade at Soun...
As the world is rapidly advancing technologically, it is vital to understand the implications and opportunities presented by Large Language Models (LLMs) in the realm of national security and beyond. This discussion will bring together leading experts from various disciplines to share insights on the risks, ethical considerations, and potential benefits of utilizing LLMs for intelligence, cybersecurity, and other applications.
Irius Risk, a company specializing in automating threat modeling for software security, hosted a webinar on Machine Learning and Threat Modeling on March 30, 2023. BIML CEO Gary McGraw participated in the webinar along with Adam Shostack.
The webinar was recorded and you can watch it here. FWIW, we are still not exactly clear on Adam’s date of replacement.
Every few years, the National Science Foundation holds vision workshops to discuss scientific progress in fields it supports. This year BIML’s Gary McGraw was pleased to keynote the Computer Science “Secure and Trustworthy Cyberspace” meeting.
He gave a talk on what #MLsec can learn from #swsec with a focus on technology discovery, development, and commercialization. There are many parallels between the two fields. Now is a great time to be working in machine learning security...
Right. So not only is ML going to write your code, it is also going to hack it. LOL. I guess the thought leaders out there have collectively lost their minds.
Fortunately, Taylor Armerding has some sane things to say about all this. Read his article here.
Adam Shostack is one of the pre-eminent experts on threat modeling. So when he publishes an article, it is always worth reading and thinking about. But Adam seems to be either naïve or insanely optimistic when it comes to AI/ML progress. ML has no actual IDEA what it’s doing. Don’t ever forget that.
This issue is so important that we plan to debate it soon in a webinar format. Contact us for details.
As a software security guy, I am definitely in tune with the idea of automated coding. But today’s “code assistants” do not have any design-level understanding of code. Plus they copy (statistically-speaking, anyway) chunks of code full of bugs.
Robert Lemos wrote a very timely article on the matter. Check it out.
The second in a two-part darkreading series on machine learning data exposure and data-related risk focuses attention on protecting training data without screwing it up. For the record, we believe that technical approaches like synthetic data creation and differential privacy definitely do screw up your data, sometimes so much that the ML activity you wanted to accomplish is no longer feasible.
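To make that tradeoff concrete, here’s a minimal sketch of ours (not from the darkreading piece) of the Laplace mechanism behind differential privacy applied to a toy count query. Nothing here comes from any particular DP library; it’s just numpy.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "sensitive" dataset: 1 means a patient has the condition, 0 means not.
records = rng.integers(0, 2, size=1000)
true_count = int(records.sum())

def dp_count(data, epsilon):
    """Differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1, so noise is drawn from
    Laplace(scale = 1 / epsilon). Smaller epsilon = more privacy = more noise.
    """
    return data.sum() + rng.laplace(loc=0.0, scale=1.0 / epsilon)

for epsilon in (10.0, 1.0, 0.1, 0.01):
    noisy = dp_count(records, epsilon)
    print(f"epsilon={epsilon:5}: true={true_count}, noisy={noisy:9.1f}, "
          f"error={abs(noisy - true_count):8.1f}")
```

As epsilon shrinks (stronger privacy), the error grows. Whether the signal that survives still supports the ML task you had in mind is exactly the question worth asking before you reach for these techniques.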
The talk posed a bit of a challenge since it was the very first “Thursday talk” delivered after COVID swept the planet. As you might imagine, smart seniors are understandably wary of the pandemic. In the end, the live talk was delivered to around 12 people, with an audience of about 90 on closed circuit TV. That,...
We’re pleased that BIML has helped spread the word about MLsec (that is, machine learning security engineering) all over the world. We’ve given talks in Germany, Norway, England, and, of course, all over the United States.
An important part of our mission at BIML is to spread the word about machine learning security. We’re interested in compelling and informative discussions of the risks of AI that get past the scary sound bite or the sexy attack story. We’re proud to continue the bi-monthly video series we’re calling BIML in the Barn.
Our fourth video talk features Professor David Evans, a computer scientist at the University of Virginia working on Security Engineering for Machine Learning. David is interested ...
This version of the Security Engineering for Machine Learning talk is focused on computer scientists familiar with algorithms and basic machine learning concepts. It was delivered 2/24/22.
In an article published in February 2022, BIML CEO Gary McGraw discusses why ML practitioners need to consider ops data exposure in addition to worrying about training data. Have a read.
This is the first in a series of two articles focused on data privacy and ML. This one, the first, focuses on ops data exposure. The second discusses training data in more detail.
BIML co-founder and CEO Gary McGraw will deliver a public lecture at the Barns of Rose Hill on Friday July 1st. All proceeds benefit FISH of Clarke County.
An important part of our mission at BIML is to spread the word about machine learning security. We’re interested in compelling and informative discussions of the risks of AI that get past the scary sound bite or the sexy attack story. We’re proud to continue the bi-monthly video series we’re calling BIML in the Barn.
Our third video talk features Ram Shankar Siva Kumar, a researcher at Microsoft Azure working on Adversarial Machine Learning. Of course, we prefer to call this Security Engi...
It turns out that operational data exposure swamps out all other kinds of data exposure and data security issues in ML, something that came as a surprise.
An important part of our mission at BIML is to spread the word about machine learning security. We’re interested in compelling and informative discussions of the risks of AI that get past the scary sound bite or the sexy attack story. We’re proud to introduce a bi-monthly video series we’re calling BIML in the Barn.
Our first video talk features Maritza Johnson, a professor at UC San Diego and an expert on human-centered security and privacy. As you’re about to see, Maritza combines re...
The (extremely) local paper in the county where Berryville is situated (rural Virginia) is distributed by mail. They also have a website, but that is an afterthought at best.
Fortunately, the Clarke Monthly is on the cutting edge of technology reporting. Here is an article featuring BIML and Security Engineering for Machine Learning.
I gave a talk this week at a meeting hosted by Microsoft and Mitre called the 6th Security Data Science Colloquium. It was an interesting bunch (about 150 people) including the usual suspects: Microsoft, Google, Facebook, a bunch of startups and universities, and of course BIML.
I decided to rant about nomenclature, with a focus on RISKS versus ATTACKS as a central tenet of how to approach ML security. Heck, even the term “Adversarial AI” gets it wrong in all the ways. For the record, ...
Another week, another talk in Indiana! This time Purdue’s CERIAS center was the target. Turns out I have given “one talk per decade” at Purdue, starting with a 2001 talk (then 2009). Here is the 2021 edition.
BIML founder Gary McGraw delivered the last talk of the semester for the Center for Applied Cybersecurity Research (CACR) speakers series at Indiana University. You can watch the talk on YouTube.
If your organization is interested in having a presentation by BIML, please contact us today.
As our MLsec work makes abundantly clear, data play a huge role in the security of an ML system. Our estimation is that somewhere around 60% of all security risk in ML can be directly associated with data. And data are biased in ways that lead to serious social justice problems including racism, sexism, classism, and xenophobia. We’ve read a few ML bias papers (see the BIML Annotated Bibliography for our commentary). Turns out that social justice in ML is a thorny and difficult subject.
An important part of BIML’s mission as an institute is to spread the word about our understanding of machine learning security risk throughout the world. We recently decided to take on three college and high school interns to provide a bridge to academia and to inculcate young minds early in the intricacies of machine learning security. We introduce them here in a series of blog entries.
We are very pleased to introduce Aishwarya Seth who is a BIML University Scholar.
Berryville resident Gary McGraw is founder of the Berryville Institute of Machine Learning, which is a think tank. BIML’s small group of researchers tries to find ways to make technology safer so hackers cannot breach vital — or even secret — information. The institute has received a $150,000 grant from the Open Philanthropy foundation to help further its work.
An important part of BIML’s mission as an institute is to spread the word about our understanding of machine learning security risk throughout the world. We recently decided to take on three college and high school interns to provide a bridge to academia and to inculcate young minds early in the intricacies of machine learning security. We introduce them here in a series of blog entries.
We are very pleased to introduce Trinity Stroud who is a BIML University Scholar.
An important part of BIML’s mission as an institute is to spread the word about our understanding of machine learning security risk throughout the world. We recently decided to take on three college and high school interns to provide a bridge to academia and to inculcate young minds early in the intricacies of machine learning security. We introduce them here in a series of blog entries.
We are very pleased to introduce Nikil Shyamsunder who is the first BIML High School Scholar.
Berryville Institute of Machine Learning (BIML) Gets $150,000 Open Philanthropy Grant. Funding will advance ethical AI research
Online PR News – 27-January-2021 – BERRYVILLE, VA – The Berryville Institute of Machine Learning (BIML), a research think tank dedicated to safe, secure and ethical development of AI technologies, announced today that it is the recipient of a $150,000 grant from Open Philanthropy.
BIML, which is already well known in ML circles for its pioneering document, “Ar...
BERRYVILLE, Va., Feb. 13, 2020 – The Berryville Institute of Machine Learning (BIML), a research think tank dedicated to safe, secure and ethical development of AI technologies, today released the first-ever risk framework to guide development of secure ML. The “Architectural Risk Analysis of Machine Learning Systems: Toward More Secure Machine Learning” is designed for use by developers, engineers, designers and others who are creating applications and services that use ML technologies.
The first talk on BIML’s new Architectural Risk Analysis of Machine Learning Systems was delivered this Wednesday at Lord Fairfax Community College. The talk was well attended and included a remote audience attending virtually. The Winchester Star published a short article about the talk.
Berryville Institute of Machine Learning (BIML) is located in Clarke County, Virginia, an area served by Lord Fairfax Community College.
Recently there have been several documents published as guides to security in machine learning. In October 2019, NIST published a draft called “A Taxonomy and Terminology of Adversarial Machine Learning”. Then in November, Microsoft published several interrelated webpages laying out a threat model for AI/ML systems and tying it to MS’s existing Software Development Lifecycle. We took a look at these documents to find out what they are trying to do, what they do well, and what they lack.
Community resources can be a double-edged sword; on the one hand, systems that have faced public scrutiny can benefit from the collective effort to break them. But nefarious individuals aren’t interested in publicizing the flaws they identify in open systems, and even large communities of developers have trouble resolving all of the flaws in such systems. Relying on publicly available information can expose your own system to risks, particularly if an attacker is able to identify similaritie...
ML systems rely on a number of possibly untrusted, external sources for both their data and their computation. Let’s take on data first. Mechanisms used to collect and process data for training and evaluation make an obvious target. Of course, ML engineers need to get their data somehow, and this necessarily invokes the question of trust. How does an ML system know it can trust the data it’s being fed? And, more generally, what can the system do to evaluate the collector’s trustworthiness? B...
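The excerpt above cuts off mid-thought, but the trust question it raises is easy to make concrete. Here is a minimal sketch (our illustration, with hypothetical file names and placeholder digests) of one narrow mitigation: pinning training files to known-good SHA-256 digests and refusing to train on anything that doesn’t match.

```python
import hashlib
from pathlib import Path

# Hypothetical manifest published by a trusted data curator.
# The digests below are placeholders, not real values.
EXPECTED_SHA256 = {
    "train_images.npz": "<known-good sha256 digest>",
    "train_labels.csv": "<known-good sha256 digest>",
}

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large datasets never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_dataset(data_dir: Path) -> None:
    """Raise if any expected file is missing or fails its integrity check."""
    for name, expected in EXPECTED_SHA256.items():
        actual = sha256_of(data_dir / name)
        if actual != expected:
            raise RuntimeError(f"integrity check failed for {name}: got {actual}")

# verify_dataset(Path("data/"))  # run this before any training job touches the files
```

Of course, hashing only proves that the bytes you got are the bytes the curator published. It says nothing about whether the curator, or the collection pipeline upstream of them, deserved your trust in the first place.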
Security is often about keeping secrets. Users don’t want their personal data leaked. Keys must be kept secret to avoid eavesdropping and tampering. Top-secret algorithms need to be protected from competitors. These kinds of requirements are almost always high on the list, but turn out to be far more difficult to meet than the average user may suspect.

ML system engineers may want to keep the intricacies of their system secret, including the algorithm and model used, hyperparameter and co...
Privacy is tricky even when ML is not involved. ML makes things ever trickier by in some sense re-representing sensitive and/or confidential data inside of the machine. This makes the original data “invisible” (at least to some users), but remember that the data are still in some sense “in there somewhere.” So, for example, if you train a classifier on sensitive medical data and you don’t consider what will happen when an attacker tries to get those data back out through a set of sophistic...
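Here’s a deliberately simple illustration of ours (not from the original post) of why “the data are still in there somewhere.” A 1-nearest-neighbor classifier literally ships its training records inside the model; more sophisticated models memorize less literally, but attackers can still pull training data back out by querying them.

```python
import numpy as np

class OneNearestNeighbor:
    """A toy 1-NN classifier that, like the real thing, must keep its training set."""

    def fit(self, X, y):
        self.X_, self.y_ = np.asarray(X, dtype=float), np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every query row to every stored training row.
        dists = np.linalg.norm(self.X_[None, :, :] - X[:, None, :], axis=2)
        return self.y_[dists.argmin(axis=1)]

# Toy "sensitive" records: [age, systolic_bp] plus a diagnosis label.
X_train = [[34, 118], [61, 145], [47, 132], [29, 110]]
y_train = [0, 1, 1, 0]

model = OneNearestNeighbor().fit(X_train, y_train)
print(model.predict([[60, 150]]))  # behaves like a classifier...
print(model.X_)                    # ...but the raw patient rows ship with the model
```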
Keep It Simple, Stupid (often spelled out KISS) is good advice when it comes to security. Complex software (including most ML software) is at much greater risk of being inadequately implemented or poorly designed than simple software is, causing serious security challenges. Keeping software simple is necessary to avoid problems related to efficiency, maintainability, and of course, security. But software is by its very nature complex.
Machine Learning seems to defy KISS by its very natu...
The figure above shows how we choose to represent a generic ML system. Note that in our generic model, both processes and collections are treated as components. Processes are represented by ovals, whereas artifacts and collections of artifacts are represented as rectangles.
The risk analysis of the generic ML system above uses a set of nine “components” to help categorize and explain risks found in various logical pieces. Components can be either processes or collections. Just as under...
The principle of least privilege states that only the minimum access necessary to perform an operation should be granted, and that access should be granted only for the minimum amount of time necessary.[i]
When you give out access to parts of a system, there is always some risk that the privileges associated with that access will be abused. For example, let’s say you are to go on vacation and you give a friend the key to your home, just to feed pets, collect mail, and so forth. Although y...
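In ML terms, least privilege means scoping access narrowly: the person labeling your training data doesn’t need write access to your model weights, and nobody needs a credential that never expires. A minimal sketch (ours, with illustrative names) of a scoped, time-limited grant:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class AccessGrant:
    """A capability scoped to the minimum actions and lifetime needed."""
    subject: str
    allowed_actions: frozenset
    expires_at: datetime

def is_allowed(grant: AccessGrant, action: str) -> bool:
    """Permit an action only if it is in scope and the grant has not expired."""
    return action in grant.allowed_actions and datetime.now(timezone.utc) < grant.expires_at

# The labeling contractor gets read-only access to training data for one hour,
# not a blanket credential to the whole ML pipeline.
grant = AccessGrant(
    subject="labeling-contractor",
    allowed_actions=frozenset({"read:training-data"}),
    expires_at=datetime.now(timezone.utc) + timedelta(hours=1),
)

print(is_allowed(grant, "read:training-data"))   # True: in scope, not expired
print(is_allowed(grant, "write:model-weights"))  # False: never granted
```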
Even under ideal conditions, complex systems are bound to fail eventually. Failure is an unavoidable state that should always be planned for. From a security perspective, failure itself isn’t the problem so much as the tendency for many systems to exhibit insecure behavior when they fail.
The best real-world example we know is one that bridges the real world and the electronic world—credit card authentication. Big credit card companies such as Visa and MasterCard spend lots of money on au...
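For an ML system, failing securely means that when the model is unreachable, times out, or returns garbage, the system should deny by default rather than wave the transaction through. A minimal sketch (ours; the scoring interface shown is hypothetical):

```python
def authorize_transaction(features, model, threshold=0.9):
    """Fail closed: anything that goes wrong results in a denial, never an approval."""
    try:
        score = model.fraud_probability(features)  # hypothetical scoring interface
    except Exception:
        # Model unavailable, timed out, or raised: deny and flag for human review.
        return {"approved": False, "reason": "scoring-unavailable"}

    if not (0.0 <= score <= 1.0):
        # A nonsensical score is treated as a failure, not silently clamped.
        return {"approved": False, "reason": "invalid-score"}

    return {"approved": score < threshold, "reason": "scored"}
```

The insecure version of this is the one that approves whenever the scoring call blows up, which is the ML analogue of a terminal that accepts every card when the network is down.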
The idea behind defense in depth is to manage risk with diverse defensive strategies, so that if one layer of defense turns out to be inadequate, another layer of defense hopefully prevents a full breach.
Let’s go back to our example of bank security. Why is the typical bank more secure than the typical convenience store? Because there are many redundant security measures protecting the bank, and the more measures there are, the more secure the place is.
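Around an ML endpoint, defense in depth looks like several independent controls, none of which is load-bearing on its own: throttle callers probing the model, validate inputs before they reach it, and constrain what comes back out. A minimal sketch (ours, with illustrative thresholds and a hypothetical model interface):

```python
import time
from collections import defaultdict, deque

RATE_LIMIT = 30          # requests per minute per caller
MAX_INPUT_LENGTH = 4096  # reject absurdly large inputs before they reach the model
_recent_requests = defaultdict(deque)

def rate_limited(caller: str) -> bool:
    """Layer 1: throttle callers hammering the model (extraction or inversion probing)."""
    now = time.monotonic()
    window = _recent_requests[caller]
    while window and now - window[0] > 60:
        window.popleft()
    window.append(now)
    return len(window) > RATE_LIMIT

def handle_query(caller: str, text: str, model) -> str:
    if rate_limited(caller):
        return "rate limit exceeded"
    # Layer 2: validate input before the model ever sees it.
    if not text or len(text) > MAX_INPUT_LENGTH:
        return "invalid input"
    # Layer 3: constrain what leaves the system, so upstream mistakes leak less.
    prediction = model.predict(text)  # hypothetical model interface
    return str(prediction)[:200]
```

None of these layers is sufficient by itself, which is the point: if the rate limiter misfires, input validation and output filtering are still standing.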
Security people are quick to point out that security is like a chain. And just as a chain is only as strong as the weakest link, an ML system is only as secure as its weakest component. Want to anticipate where bad guys will attack your ML system? Well, think through which part would be easiest to attack.
ML systems are different from many other artifacts that we engineer because the data in ML are just as important (or sometimes even more important) than the learning mechanism itself....
Early work in security and privacy of ML has taken an “operations security” tack focused on securing an existing ML system and maintaining its data integrity. For example, Nicolas Papernot uses Saltzer and Schroeder’s famous security principles to provide an operational perspective on ML security1. In our view, this work does not go far enough into ML design to satisfy our goals. Following Papernot, we directly address Saltzer and Schroeder’s security principles as adapted in the book Buildi...
The exceptionally tasteful BIML logo was designed by Jackie McGraw. The logo incorporates both a yin/yang concept (huh, wonder where that comes from?) and a glyph that incorporates a B, an M, and an L in a clever way.
Here is the glyph:
The BIML glyph
Here is my personal logo (seen all over, but most famously on the cover of Software Security):
Gary McGraw’s logo (as seen on the cover of Software Security among other places)
Here is the combined glyph plus yin/yang which ma...
Welcome to the BIML blog where we will (informally) write about MLsec, otherwise known as Machine Learning security. BIML is short for the Berryville Institute of Machine Learning. For what it’s worth, we think it is pretty amusing to have a “Berryville Institute” just like Santa Fe has the “Santa Fe Institute.” You go, Berryville!
BIML was born when I retired from my job of 24 years in January 2019. Many years ago as a graduate student at Indiana University, I did lots of work in...