
Kevin Eykholt, PhD

Researcher
IBM Research

Kevin Eykholt earned his Ph.D. in computer science at the University of Michigan, Ann Arbor. During his time there, he designed one of the first physical adversarial attacks on computer vision classifiers and object detectors, using small adversarial stickers. When placed on road signs, these stickers caused computer vision systems to mislabel or ignore objects they had previously recognized. Now at IBM, Kevin continues to study adversarial machine learning as both an attacker and a defender. As an attacker, his interest is in the feasibility of current attack threat models. As a defender, he looks to create simple, scalable, and easily deployable techniques that improve the security and reliability of machine learning systems. Kevin believes that a complex technique isn't always the best technique, especially when trying to get others to use it.

ABSTRACT

Revisiting the adversarial machine learning threat landscape: Is it futile to protect against white-box attacks?

“Hope for the best, but plan for the worst.” A commonly heard phrase that, for security practitioners, means they should design and evaluate their security measures to protect against the strongest possible adversary. This approach has generally worked well in security, since measures that protect against a strong attack are implicitly assumed to also work against weaker ones.

Adversarial machine learning researchers design and evaluate their defenses in a similar fashion. Many “defenses” proposed at top security venues seek to defend against a strong attacker under the “white-box threat model,” in which the attacker has full knowledge of the model parameters, rather than the “black-box threat model,” in which this information is hidden. Counterintuitively, however, these defenses usually fail against a black-box attacker: they don't truly improve the model's robustness to adversarial examples, but merely obfuscate their discovery.
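
To make the distinction concrete, here is a minimal PyTorch sketch (not taken from the talk) contrasting a white-box attack, which reads the model's gradients, with a query-only black-box attack. The model, input, and perturbation budget are toy placeholders chosen purely for illustration.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy stand-in classifier and input; any image classifier would work the same way.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    loss_fn = nn.CrossEntropyLoss()
    x = torch.rand(1, 1, 28, 28)   # placeholder "image"
    y = torch.tensor([3])          # placeholder true label
    eps = 0.1                      # L-infinity perturbation budget

    # White-box attack (FGSM): the attacker computes gradients of the loss with
    # respect to the input, which requires full knowledge of the model parameters.
    x_adv = x.clone().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_whitebox = (x + eps * x_adv.grad.sign()).clamp(0, 1)

    # Black-box attack (query-only): the attacker observes only the model's outputs
    # and searches for a perturbation by querying, here with naive random search.
    with torch.no_grad():
        best, best_loss = x.clone(), loss_fn(model(x), y).item()
        for _ in range(200):
            candidate = (x + eps * (2 * torch.rand_like(x) - 1)).clamp(0, 1)
            cand_loss = loss_fn(model(candidate), y).item()
            if cand_loss > best_loss:
                best, best_loss = candidate, cand_loss
    x_blackbox = best

The point of the sketch is that a defense which merely hides or mangles gradients frustrates the first attack but does nothing to stop the second, query-based one.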

In this talk, I argue that evaluating adversarial defenses against white-box attacks may not be the right approach to securing machine learning systems, and that we should instead focus on evaluating defenses against black-box attacks. First, I'll revisit the white-box and black-box threat models and review the assumptions and requirements of each. Then, I'll discuss how each threat model might apply to real-world scenarios and examine fundamental problems that the community seems to ignore, before concluding with why black-box attacks are more useful for evaluating adversarial defenses.