
Defense Algorithms Against Adversarial Patches

Introduction

Existing defense methodologies against adversarial patches can be classified into two categories: ==heuristic defenses== and ==certified defenses==.

Heuristic Defenses

These defenses are fast, but they lack robustness against adaptive attacks.

Related Work

DW [1]

On visible adversarial perturbations & digital watermarking (CVPR Workshops 2018). This paper treats the adversarial patch as a ==watermark== on the image.

The paper casts adversarial defense as an ==image inpainting problem== and distinguishes two inpainting settings:

  1. ==Non-blind==: the reconstruction process is ==given the location of the areas to be inpainted== along with the corrupted image.
  2. ==Blind==: the reconstruction process is given only the noisy image; ==the area to be inpainted must be discovered== before inpainting can begin.

For the non-blind setting, the paper uses "==An image inpainting technique based on the fast marching method==" [2]. I did not read it closely, but the rough idea should be to pull the pixel values in the masked region closer to those of the surrounding pixels.
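Telea's fast-marching method happens to be what OpenCV's `cv2.inpaint` implements, so a minimal non-blind repair (assuming the patch mask is already known; the file paths here are placeholders) might look like this:

```python
import cv2

# Non-blind setting: the patch location is given as a binary mask.
img = cv2.imread("patched_image.png")                      # corrupted image (placeholder path)
mask = cv2.imread("patch_mask.png", cv2.IMREAD_GRAYSCALE)  # 255 inside the patch region

# cv2.INPAINT_TELEA implements the fast marching method of [2]:
# pixels on the mask boundary are filled first from nearby known
# pixels, and the front then marches inward.
restored = cv2.inpaint(img, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("restored.png", restored)
```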

For the blind setting:

***Core idea***: ==construct a saliency map of the image via backpropagation==, using the gradient of the output class with respect to the input image (zeroing the influence signal if either the forward or backward pass through a ReLU unit is negative). Once the adversarial patch has been localized, it can simply be covered up.
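A minimal PyTorch sketch of the localization step (vanilla gradient saliency; the ReLU zeroing described above is guided backpropagation, which would additionally need backward hooks on every ReLU — the model, input, and threshold below are illustrative assumptions, not the paper's code):

```python
import torch
import torchvision.models as models

model = models.resnet50(weights="DEFAULT").eval()
x = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder input image

# Gradient of the predicted class score w.r.t. the input image.
logits = model(x)
logits[0, logits.argmax()].backward()

# Saliency map: per-pixel maximum absolute gradient over color channels.
saliency = x.grad.abs().max(dim=1)[0]

# High-saliency pixels are candidate patch locations; DW [1] then covers
# (masks out) the detected region before re-classifying.
patch_mask = saliency > saliency.quantile(0.99)
```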

LGS: Local Gradients Smoothing [3]

LGS stands for local gradients smoothing. Concretely, it first estimates where the noise is located and then regularizes the gradients at those locations.

The paper notes that attacks such as DeepFool [5] and the adversarial patch [4] both ==introduce high-frequency noise concentrated at specific image edges== (for adversarial patches this view is actually not quite right: too much high-frequency noise tends to suppress the patch's attack effectiveness).

The paper then argues that ==the effect of adversarial noise can be reduced significantly by suppressing high-frequency regions without affecting the areas of the low-frequency image that are important for classification==.

To achieve this, the paper directly applies ==projecting a scaled normalized gradient magnitude map onto the image to directly suppress high activation regions==.

What is the scaled normalized gradient magnitude map? In short: compute the image gradient (e.g., with a Sobel or Prewitt operator) and normalize by the operator's weights. For Sobel, the kernel is [1 0 -1; 2 0 -2; 1 0 -1], whose absolute weights sum to 8, so the normalized result is Gmag / 8.
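A rough NumPy sketch of my reading of the LGS pipeline (not the authors' code; `lam` and `threshold` stand in for the paper's hyperparameters, and the values below are only illustrative):

```python
import numpy as np
from scipy import ndimage

def local_gradients_smoothing(img, lam=2.3, threshold=0.1):
    """img: grayscale image as floats in [0, 1]."""
    # Sobel gradients; dividing the magnitude by 8 normalizes by the
    # kernel's absolute weight sum, as described above.
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    gmag = np.hypot(gx, gy) / 8.0

    # Keep only high-activation (high-frequency) regions, scale by lambda,
    # and clip so the suppression factor stays in [0, 1].
    gmag = np.where(gmag > threshold, gmag, 0.0)
    suppress = np.clip(lam * gmag, 0.0, 1.0)

    # Project the map onto the image: pixels with strong local gradients
    # are driven toward zero; low-frequency content is left untouched.
    return img * (1.0 - suppress)
```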

Summary

Although ==heuristic defenses== sounds impressive, it boils down to removing the attack effect of ==adversarial noise== by modifying the image. For adversarial patches, the common recipe is to first locate the patch via gradients or some other signal, and then destroy its effectiveness by covering it up or otherwise corrupting it.

The drawback of such defenses is that they may lose effectiveness against stronger ==white-box attacks==.

Certified Defenses

With a certified defense, the algorithm can decide, with a provable guarantee, whether a sample is an adversarial example.

Related Work

Certified Defenses for Adversarial Patches [6]

Paper: https://arxiv.org/abs/2003.06693

This paper proposes a defense algorithm against ==adversarial patches== and is the first application of a ==certified defense== to this attack.

The paper opens by noting that defenses such as DW [1] and LGS [3] are easily broken by strong adversarial examples, and it proposes a more robust alternative. It also states that it is the first ==certified defense== against adversarial patches built on ==interval bound propagation (IBP)== [7].

One section of the paper is devoted to showing that the earlier ==heuristic defenses== do not hold up well against patch attacks.

IBP [7]

Interval bound propagation (IBP), which is derived from interval arithmetic, is an incomplete method for training verifiably robust classifiers.

Code: https://github.com/deepmind/interval-bound-propagation
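At its core, IBP is interval arithmetic pushed through the network layer by layer. A minimal NumPy illustration for one linear layer plus ReLU (my own sketch, not the DeepMind code):

```python
import numpy as np

def ibp_linear(lower, upper, W, b):
    """Propagate elementwise bounds [lower, upper] through x -> W @ x + b."""
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    # A linear map sends an interval box to the box with this center/radius:
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lower, upper):
    # Monotone activations act directly on the bounds.
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)
```

Chaining these through the whole network yields an interval for every logit; if the lower bound of the true class exceeds the upper bounds of all other classes for every admissible patch location, the prediction is certified.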

Pros and Cons

IBP is fast and highly scalable. Although it clearly outperforms linear-relaxation-based methods on some tasks, its bounds are much looser, so it suffers from stability problems.

DS [9]

BagNet [10]

A provably robust defense based on clipping BagNet [10], whose backbone has a small receptive field, is proposed in [11].
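As a rough illustration of the idea (my own sketch, assuming the backbone already yields per-patch class logits; the clipping function and bounds are illustrative): BagNet [10] averages class evidence over small local patches, and Clipped BagNet [11] bounds each patch's contribution before averaging, so a small sticker that overlaps only a few patches cannot inject unbounded logit mass:

```python
import numpy as np

def clipped_bagnet_logits(patch_logits, clip_lo=-1.0, clip_hi=1.0):
    """patch_logits: shape (num_patches, num_classes), one row of class
    evidence per small-receptive-field patch (BagNet-style)."""
    # Clipping bounds how much any single patch (and hence any sticker
    # covering a few patches) can move the aggregated score.
    clipped = np.clip(patch_logits, clip_lo, clip_hi)
    return clipped.mean(axis=0)  # average evidence over spatial locations
```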

References

[1] Jamie Hayes. On visible adversarial perturbations & digital watermarking. In 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), pages 1597–1604, 2018.
[2] A. Telea. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1):23–34, 2004.
[3] Muzammal Naseer, Salman Khan, and Fatih Porikli. Local gradients smoothing: Defense against localized adversarial attacks. In IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1300–1307, 2019.
[4] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer. Adversarial patch. In Neural Information Processing Systems (NIPS), 2017.
[5] S. M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[6] Ping-Yeh Chiang, Renkun Ni, Ahmed Abdelkader, Chen Zhu, Christoph Studer, and Tom Goldstein. Certified defenses for adversarial patches. In 8th International Conference on Learning Representations (ICLR), 2020.
[7] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Arthur Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 4841–4850, 2019.
[8] Alexander Levine and Soheil Feizi. (De)randomized smoothing for certifiable defense against patch attacks. In Conference on Neural Information Processing Systems (NeurIPS), 2020.
[9] Alexander Levine and Soheil Feizi. (De)randomized smoothing for certifiable defense against patch attacks. In Conference on Neural Information Processing Systems (NeurIPS), 2020.
[10] Wieland Brendel and Matthias Bethge. Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In 7th International Conference on Learning Representations (ICLR), 2019.
[11] Zhanyuan Zhang, Benson Yuan, Michael McCoyd, and David Wagner. Clipped BagNet: Defending against sticker attacks with clipped bag-of-features. In 3rd Deep Learning and Security Workshop (DLS), 2020.