Use Your Community Resources [Principle 10]
Community resources can be a double-edged sword. On the one hand, systems that have faced public scrutiny can benefit from the community's collective effort to break, and then fix, them. On the other hand, nefarious individuals have no interest in publicizing the flaws they identify in open systems, and even large communities of developers have trouble resolving all of the flaws in such systems. Relying on publicly available information can therefore expose your own system to risk, particularly if an attacker can identify similarities between your system and public ones.
Transfer learning is particularly relevant to ML systems. While transfer learning has demonstrated success in applying the learned knowledge of one ML system to other problems, knowledge of the base model can sometimes be used to attack the student model [wang18]. More generally, the use of publicly available models and hyperparameters could expose ML systems to targeted attacks. How do engineers know that a model they use wasn't deliberately made public for this very purpose?
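One partial mitigation is to treat a downloaded pretrained model as an untrusted artifact and verify it against a digest published through an independent, trusted channel before loading it. The sketch below assumes a hypothetical pinned SHA-256 value obtained from the publisher's signed release notes; it is illustrative, not a complete provenance solution (it cannot detect a model that was malicious from the start).

```python
import hashlib
from pathlib import Path

# Hypothetical pinned digest. In practice this should come from a trusted,
# out-of-band source (e.g., the publisher's signed release notes), not from
# the same server that hosts the model file. This example value is the
# SHA-256 of an empty file, used here only for illustration.
PINNED_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file's SHA-256 digest matches the pinned value."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large model files don't have to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

def load_model_if_verified(path: Path):
    """Refuse to deserialize the model unless the digest check passes."""
    if not verify_artifact(path, PINNED_SHA256):
        raise ValueError(f"Digest mismatch for {path}; refusing to load.")
    # ... proceed to load the model with your framework of choice ...
```

A digest check only pins the artifact to a known bitstream; whether that bitstream deserves trust is the community question the text raises.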
Public datasets used to train ML algorithms are another important concern. Engineers need to take care to validate the authenticity and quality of any public dataset they use, especially when that data could have been manipulated by unknown parties. At the core of these concerns is the matter of trust: if the community can be trusted to effectively promote the security of its tools, models, and data, then community resources can be used, albeit cautiously. Otherwise, it is better to avoid exposing systems to unnecessary risk. After all, security problems in widely used open-source projects have been known to persist for years, and in some cases decades, before the community finally took notice.
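Beyond checksumming the dataset file itself, a crude quality check is to compare the observed label distribution against the distribution the dataset's documentation claims. The sketch below is one such sanity check under assumed inputs (`labels` and the publisher's expected per-label frequencies are hypothetical); a large deviation is only a signal of possible tampering or poisoning, not proof.

```python
from collections import Counter

def check_label_distribution(labels, expected_fractions, tolerance=0.05):
    """Return labels whose observed frequency deviates from the published
    distribution by more than `tolerance`.

    labels             -- iterable of class labels as loaded from the dataset
    expected_fractions -- mapping of label -> documented fraction (assumed to
                          come from the dataset's published description)
    """
    labels = list(labels)
    total = len(labels)
    counts = Counter(labels)
    suspicious = []
    for label, expected in expected_fractions.items():
        observed = counts.get(label, 0) / total
        if abs(observed - expected) > tolerance:
            suspicious.append((label, observed, expected))
    return suspicious
```

For example, a binary dataset documented as 50/50 that arrives 80/20 would be flagged for manual review before training. This kind of check catches only gross manipulation; subtle poisoning that preserves aggregate statistics requires stronger provenance guarantees.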