Follow the Principle of Least Privilege [Principle 4]
The principle of least privilege states that only the minimum access necessary to perform an operation should be granted, and that access should be granted only for the minimum amount of time necessary.[i]
When you give out access to parts of a system, there is always some risk that the privileges associated with that access will be abused. For example, let’s say you are to go on vacation and you give a friend the key to your home, just to feed pets, collect mail, and so forth. Although you may trust the friend, there is always the possibility that there will be a party in your house without your consent, or that something else will happen that you don’t like. Regardless of whether you trust your friend, there’s really no need to put yourself at risk by giving more access than necessary. For example, if you don’t have pets, but only need a friend to pick up the mail on occasion, you should relinquish only the mailbox key. Although your friend may find a good way to abuse that privilege, at least you don’t have to worry about the possibility of additional abuse. If you give out the house key unnecessarily, all that changes.
Similarly, if you do get a house-sitter while you’re on vacation, you aren’t likely to let that person keep your keys when you’re not on vacation. If you do, you’re setting yourself up for additional risk. Whenever a key to your house is out of your control, there’s a risk of that key getting duplicated. If there’s a key outside your control, and you’re not home, then there’s the risk that the key is being used to enter your house. Any length of time that someone has your key and is not being supervised by you constitutes a window of time in which you are vulnerable to an attack. You want to keep such windows of vulnerability as short as possible to minimize your risks.
In an ML system, we most likely want to control access around lifecycle phases. In the training phase, the system may have access to lots of possibly sensitive training data. Assuming an offline model (where training is not continuous), after the training phase is complete, the system should no longer require access to those data. (As we discussed when we were talking defense in depth, system engineers need to understand that in some sense all of the confidential data are now represented in the trained-up ML system and may be subject to ML-specific attacks.)
Thinking about access control in an ML is useful and can be applied through the lens of the principle of least privilege, particularly between lifecycle phases and system components. Users of an ML system are not likely to need access to training data and test data, so don’t give it to them. In fact, users may only require black box API access to a running system. If that’s the case, then provide only what is necessary in order to preserve security.
Less is more when it comes to the principle of least privilege. Limit data exposure to those components that require it and then for as short a time period as possible.
[i]Saltzer, Jerome H., and Michael D. Schroeder. “The protection of information in computer systems.” Proceedings of the IEEE63, no. 9 (1975): 1278-1308.