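Adversarial machine learning

Adversarial machine learning is the study of attacks on machine learning algorithms and of defenses against such attacks^[1]. A 2020 survey found that practitioners see a need for better protection of machine learning systems in industrial applications^[2].

Most machine learning techniques are designed to solve specific problems under the assumption that the training and test data are generated from the same statistical distribution (independent and identically distributed, IID). In high-stakes applications, however, this assumption may be violated: users may deliberately supply fabricated data that breaks it.

Common attacks in adversarial machine learning include evasion attacks^[3], data poisoning attacks^[4], Byzantine attacks^[5], and model extraction^[6].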

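History

In January 2004, at the MIT Spam Conference, John Graham-Cumming showed that a machine-learning spam filter could be used to defeat another machine-learning spam filter by automatically learning which words to add to a spam email so that it would be classified as legitimate^[7].

In 2004, Nilesh Dalvi and others noted that the linear classifiers used in spam filters could be defeated by simple evasion attacks in which spammers inserted "good words" into their spam emails. (Around 2007, some spammers added random noise to the words in "image spam" in order to defeat filters based on optical character recognition.) In 2006, Marco Barreno and others published "Can Machine Learning Be Secure?", which outlined a broad taxonomy of attacks on machine learning. Many researchers continued to hope that non-linear classifiers, such as support vector machines and artificial neural networks, would be more robust to such attacks, until Battista Biggio and others demonstrated the first gradient-based attacks on these models (2012^[8]–2013^[9]). By 2012, deep neural networks had come to dominate computer vision; starting in 2014, Christian Szegedy and others showed that deep neural networks could also be fooled, again by using gradient-based attacks to craft adversarial perturbations^[10]^[11].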
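The "good words" evasion described above can be illustrated with a toy linear spam filter. This is a minimal sketch rather than any filter from the cited work: the word weights, the bias, and the helper names (`spam_score`, `is_spam`, `good_word_attack`) are hypothetical values chosen only for illustration.

```python
# Hypothetical per-word weights of a toy linear spam filter (positive = spammy).
WEIGHTS = {
    "free": 2.0, "winner": 1.5, "prize": 1.5,
    "meeting": -1.0, "report": -1.2, "thanks": -0.8,
}
BIAS = -1.0  # hypothetical bias term


def spam_score(words):
    """Linear score: bias plus the sum of word weights (unknown words count as 0)."""
    return BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)


def is_spam(words):
    """A message is flagged as spam when its linear score is positive."""
    return spam_score(words) > 0.0


def good_word_attack(words, max_added=10):
    """Append the most negative-weight ("good") words until the filter is evaded."""
    good_words = sorted((w for w in WEIGHTS if WEIGHTS[w] < 0), key=WEIGHTS.get)
    padded = list(words)
    added = 0
    while is_spam(padded) and added < max_added:
        padded.append(good_words[added % len(good_words)])
        added += 1
    return padded


spam = ["free", "prize", "winner"]
evasive = good_word_attack(spam)
print(is_spam(spam), is_spam(evasive))
```

The original spam words are left untouched; the attack only pads the message, which is why a purely additive linear score is easy to push below the decision threshold.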

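More recently, it has been observed that adversarial attacks are harder to produce in the real world, because environmental constraints can cancel out the effect of adversarial noise^[12]^[13]. For example, a small rotation or a slight change in illumination can destroy the adversarial effect of an image. In addition, researchers such as Nick Frosst of Google Brain point out that it is much easier to make self-driving cars^[14] miss a stop sign by physically removing the sign than by creating adversarial examples^[15]. Frosst also argues that the adversarial machine learning community incorrectly assumes that models trained on one data distribution will perform equally well on a completely different distribution. He suggests exploring new approaches to machine learning; a distinctive type of neural network is under development whose characteristics are closer to human perception than those of existing approaches^[15].

Although adversarial machine learning remains rooted mainly in academia, companies such as Google, Microsoft, and IBM have begun to publish documentation and open-source code bases that let others concretely assess the robustness of their models and reduce the risk of adversarial attacks^[16]^[17]^[18].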

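Examples

Examples of adversarial machine learning include attacks on spam filtering, in which "good" words are inserted among the "bad" words to confuse the filter and let spam through^[19]^[20]; attacks in computer security, such as obfuscating malicious code inside network packets or altering the characteristics of a network flow to mislead intrusion detection systems^[21]^[22]; and attacks on biometric recognition, in which fake biometric traits are used to impersonate an authorized user^[23], or users' template galleries, which are meant to adapt to changes in users' traits over time, are corrupted.

Researchers have shown that changing a single pixel can fool deep learning algorithms^[24]. In 2017, a toy turtle was 3D printed with a texture engineered so that Google's object detection AI classified it as a rifle regardless of the viewing angle^[25]. Creating the turtle required only low-cost, commercially available 3D printing technology^[26].

A machine-modified image of a dog has been shown to look like a cat to both computers and humans^[27]. A 2019 study reported that humans can guess how machines will classify adversarial images^[28]. Researchers have also found ways of perturbing the appearance of a stop sign so that an autonomous vehicle system classifies it as a merge sign or a speed limit sign^[14]^[29].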

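Nightshade is a data poisoning filter released in 2023 by researchers at the University of Chicago. It lets artists apply it to their work in order to corrupt the data sets of text-to-image models, whose developers often scrape data from the internet without the consent of the image creators^[30]^[31].

McAfee attacked the Mobileye system formerly used by Tesla, fooling it into driving 50 mph over the speed limit by placing a two-inch strip of black tape on a speed limit sign^[32]^[33].

Adversarial patterns on clothing, designed to deceive facial recognition systems, have given rise to a niche industry of "stealth streetwear"^[34].

An adversarial attack on a neural network can allow an attacker to inject algorithms into the target system^[35]. Researchers can also create adversarial audio inputs that disguise commands to intelligent assistants inside benign-seeming audio^[36]; a parallel line of research explores human perception of such stimuli^[37]^[38].

Clustering algorithms are also used in security applications. Malware and computer virus analysis aims to identify malware families and to generate specific detection signatures^[39]^[40].

In malware detection, researchers have also proposed adversarial malware generation methods that automatically craft binaries to evade learning-based detectors while preserving malicious functionality. Optimization-based attacks such as GAMMA use genetic algorithms to inject benign content (for example, padding or new executable sections) into Windows executables. They frame evasion as a constrained optimization problem that balances the misclassification success rate against the size of the injected payload, and have been shown to transfer to commercial antivirus products^[41]. Other work uses generative adversarial networks (GANs) to learn feature-space perturbations that cause malware detectors to classify malware as benign. Mal-LSGAN, for example, replaces the standard GAN loss with a least-squares objective and modified activation functions to improve training stability, and produces adversarial malware examples that consistently reduce the true positive rate across multiple detectors^[42].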

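Challenges in applying machine learning to security

Researchers have observed that the challenges of applying machine learning in the security domain differ from those in other mainstream application domains. Security data may change over time, contain mislabeled samples, or reflect adversarial behavior, all of which complicate evaluation and reproducibility^[43].

Data collection issues

Security data sets come in varying formats, including binaries, network traces, and log files. Studies have reported that converting these sources into features can introduce bias or inconsistencies^[43]. In addition, when the malware samples used for training and testing are not properly separated, time-based leakage can occur, which may lead to overly optimistic results^[43].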
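One common way to avoid the time-based leakage described above is to split the data chronologically instead of at random. The following is a minimal sketch under stated assumptions: the sample records, the `first_seen` dates, and the `temporal_split` helper are all hypothetical, and real samples would carry feature vectors rather than bare labels.

```python
from datetime import date

# Hypothetical samples: (first_seen date, label).
samples = [
    (date(2021, 3, 1), "benign"),
    (date(2021, 7, 15), "malware"),
    (date(2022, 1, 10), "malware"),
    (date(2022, 6, 5), "benign"),
    (date(2023, 2, 20), "malware"),
]


def temporal_split(samples, cutoff):
    """Train on samples seen strictly before the cutoff and test on the rest,
    so that no sample from the "future" leaks into the training set."""
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test


train, test = temporal_split(samples, date(2022, 1, 1))
print(len(train), "training samples,", len(test), "test samples")
```

A random shuffle would mix later samples into the training set, letting the model see variants of families that, at deployment time, would not yet exist.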

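Labeling and ground truth challenges

Malware labels are often unstable, because different antivirus engines may classify the same sample in conflicting ways. Ceschin et al. note that malware families may be renamed or reorganized over time, which causes further discrepancies in the ground truth and reduces the reliability of benchmarks^[43].

Concept drift

Because malware authors continuously adapt their techniques, the statistical properties of malicious samples also change over time. This form of concept drift has been widely documented and can degrade model performance unless systems are updated regularly or incorporate incremental learning mechanisms^[43].

Feature robustness

Researchers distinguish between features that are easy to manipulate and features that are more resistant to modification. For example, simple static attributes such as header fields are easily altered by attackers, whereas structural features such as control-flow graphs are harder to modify but computationally expensive to extract^[43].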

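Class imbalance

In realistic deployment environments, the proportion of malicious samples can be extremely low, ranging from about 0.01% to 2% of the total data. Such an imbalanced distribution biases models toward the majority class: they can achieve high accuracy while failing to identify malicious samples^[44].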
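A small worked example shows why accuracy is misleading under this kind of imbalance. It is only a sketch with made-up numbers: a hypothetical data set of 1,000 samples with 1% malware, and a degenerate "classifier" that labels everything as benign.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive (malicious) samples that were detected."""
    positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    detected = sum(1 for p in positives if p == positive)
    return detected / len(positives)


# Hypothetical data set: 10 malicious samples (label 1) among 1,000 (1% malware).
y_true = [1] * 10 + [0] * 990
# A degenerate model that always predicts the majority class "benign" (label 0).
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

The always-benign model reaches 99% accuracy while detecting none of the malicious samples, which is why results on imbalanced security data are usually reported with recall, precision, or F1 rather than accuracy alone.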

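Earlier approaches to this problem include data-level solutions and sequence-specific models. Models such as n-grams and long short-term memory (LSTM) networks can model sequential data, but their performance has been shown to degrade when the proportion of malware samples in the training set is realistic, which limits their use in practical security applications^[44].

One way to address this problem is to adapt models from natural language processing, such as BERT. This approach treats sequences of application activity as a kind of language and fine-tunes a pre-trained BERT model for the specific task. A study that applied this technique to Android activity sequences reported an F1 score of 0.919 on a data set containing only 0.5% malware samples. This is a substantial improvement over LSTM and n-gram models, and it demonstrates the potential of pre-trained models for handling class imbalance in malware detection^[44].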

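Attack modalities

Taxonomy

Attacks against (supervised) machine learning algorithms have been categorized along three primary axes^[45]: influence on the classifier, security violation, and specificity.

Influence on the classifier: An attack can influence the classifier by disrupting the classification phase. This may be preceded by an exploration phase to identify vulnerabilities. The attacker's capabilities may be restricted by constraints on data manipulation^[46].
Security violation: An attack can supply malicious data that is classified as legitimate. Malicious data supplied during training can cause legitimate data to be rejected after training.
Specificity: A targeted attack attempts to achieve a specific intrusion or disruption. An indiscriminate attack, by contrast, creates general mayhem.

This taxonomy has been extended into a more comprehensive threat model that allows explicit assumptions about the adversary's goal, knowledge of the attacked system, capability to manipulate the input data or system components, and attack strategy^[47]^[48]. It has been further extended to include dimensions for defense strategies against adversarial attacks^[49].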

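Strategies

Below are some of the most commonly encountered attack scenarios.

Data poisoning

Data poisoning consists of contaminating the training data set with data designed to increase errors in the output. Because learning algorithms are shaped by their training data, poisoning can effectively reprogram an algorithm with potentially malicious intent. Concerns have been raised especially about user-generated training data, such as the data used for content recommendation or natural language models. The ubiquity of fake accounts offers many opportunities for poisoning. Facebook reportedly removes around 7 billion fake accounts per year^[50]^[51]. Poisoning has been reported as the leading concern in industrial applications of machine learning^[2].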
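The effect of poisoned training data can be sketched with a toy one-dimensional nearest-centroid classifier. Everything here is a made-up illustration rather than a method from the cited sources: the feature values, the labels (0 for benign, 1 for malicious), and the helpers `fit_centroids` and `predict` are hypothetical.

```python
def fit_centroids(data):
    """data: list of (value, label) pairs. Returns {label: mean value of that class}."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


# Clean training set: benign samples (0) near 1.0, malicious samples (1) near 9.0.
clean = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
# Poison: the attacker injects malicious-looking points mislabeled as benign.
poison = [(10.0, 0), (10.0, 0), (10.0, 0)]

clean_model = fit_centroids(clean)
poisoned_model = fit_centroids(clean + poison)

target = 6.0  # a malicious sample the attacker wants to get through
print(predict(clean_model, target), predict(poisoned_model, target))
```

Three injected points are enough to drag the benign centroid from 1.0 to 5.5, so the same input that the clean model flags as malicious is accepted as benign by the poisoned model.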

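On social media, disinformation campaigns attempt to bias recommendation and moderation algorithms in order to promote certain content and suppress other content.

A particular case of data poisoning is the backdoor attack^[52], which aims to teach the model a specific behavior that is triggered by a particular input, such as a given image, sound, video, or text.

For example, intrusion detection systems are often trained on collected data. An attacker may poison this data by injecting malicious samples during operation, which subsequently disrupt retraining^[47]^[48]^[45]^[54]^[55].

Data poisoning techniques can also be applied to text-to-image models to alter their output, and artists use them to protect their copyrighted works or artistic style against imitation^[56].

Data poisoning can also occur unintentionally through model collapse, in which models are trained on data synthesized by other AI systems^[57].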

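Byzantine attacks

As machine learning is scaled up, it often relies on multiple computers or processors. In federated learning, for example, edge devices collaborate with a central server, typically by sending gradients of the model parameters. However, some of these devices may deviate from their expected behavior, for example to harm the central server's model^[58] or to bias the algorithm toward certain behaviors (such as amplifying the recommendation of disinformation). On the other hand, if training is performed on a single machine, the model is highly vulnerable to a failure of, or an attack on, that machine: the machine is a single point of failure, which is undesirable^[59]. In fact, even the owner of the machine may insert backdoors that may be undetectable^[60].

The current leading solutions for making distributed learning algorithms resilient to a minority of malicious participants (the Byzantine generals problem) are based on robust gradient aggregation rules^[61]^[62]^[63]^[64]^[65]^[66]. Robust aggregation rules do not always work, especially when the data across participants is not independent and identically distributed. Moreover, in the context of heterogeneous honest participants, such as users with different consumption habits for recommendation algorithms or different writing styles for language models, there are provable impossibility theorems on what any robust learning algorithm can guarantee^[5]^[67].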
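As an illustration of robust gradient aggregation, the sketch below compares plain averaging with the coordinate-wise median, one rule commonly discussed in the literature. The text above does not say which rules the cited works use, so this choice is an assumption, as are the gradient values and the `aggregate` helper.

```python
from statistics import mean, median


def aggregate(gradients, rule):
    """Apply `rule` (for example mean or median) independently to each coordinate."""
    return [rule(coord) for coord in zip(*gradients)]


# Hypothetical gradients reported by four honest workers (roughly in agreement).
honest = [[1.0, -2.0], [1.2, -1.8], [0.8, -2.2], [1.1, -2.1]]
# One Byzantine worker reports an arbitrary, very large gradient.
byzantine = [[1000.0, 1000.0]]

all_grads = honest + byzantine
avg = aggregate(all_grads, mean)    # dominated by the single outlier
med = aggregate(all_grads, median)  # stays close to the honest gradients
print("mean:", avg)
print("median:", med)
```

A single malicious worker can move the mean arbitrarily far, while the median stays within the range of honest values as long as honest workers form a majority. As noted above, such guarantees weaken when honest participants hold heterogeneous (non-IID) data.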

References

^ Kianpour, Mazaher; Wen, Shao-Fang. Timing Attacks on Machine Learning: State of the Art. Intelligent Systems and Applications. Advances in Intelligent Systems and Computing 1037. 2020: 111–125. ISBN 978-3-030-29515-8. S2CID 201705926. doi:10.1007/978-3-030-29516-5_10.
^ Siva Kumar, Ram Shankar; Nyström, Magnus; Lambert, John; Marshall, Andrew; Goertzel, Mario; Comissoneru, Andi; Swann, Matt; Xia, Sharon. Adversarial Machine Learning-Industry Perspectives. 2020 IEEE Security and Privacy Workshops (SPW). May 2020: 69–75. ISBN 978-1-7281-9346-5. S2CID 229357721. doi:10.1109/SPW50608.2020.00028.
^ Goodfellow, Ian; McDaniel, Patrick; Papernot, Nicolas. Making machine learning robust against adversarial inputs. Communications of the ACM. 25 June 2018, 61 (7): 56–66. ISSN 0001-0782. doi:10.1145/3134599.
^ Geiping, Jonas; Fowl, Liam H.; Huang, W. Ronny; Czaja, Wojciech; Taylor, Gavin; Moeller, Michael; Goldstein, Tom. Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching. International Conference on Learning Representations 2021 (Poster). 2020-09-28.
^ El-Mhamdi, El Mahdi; Farhadkhani, Sadegh; Guerraoui, Rachid; Guirguis, Arsany; Hoang, Lê-Nguyên; Rouault, Sébastien. Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning). Advances in Neural Information Processing Systems. 2021-12-06, 34. arXiv:2008.00742.
^ Tramèr, Florian; Zhang, Fan; Juels, Ari; Reiter, Michael K.; Ristenpart, Thomas. Stealing Machine Learning Models via Prediction APIs. 25th USENIX Security Symposium: 601–618. 2016. ISBN 978-1-931971-32-4.
^ How to beat an adaptive/Bayesian spam filter (2004). [2023-07-05].
^ Biggio, Battista; Nelson, Blaine; Laskov, Pavel. Poisoning Attacks against Support Vector Machines. 2013-03-25. arXiv:1206.6389  [cs.LG].
^ Biggio, Battista; Corona, Igino; Maiorca, Davide; Nelson, Blaine; Srndic, Nedim; Laskov, Pavel; Giacinto, Giorgio; Roli, Fabio. Evasion Attacks against Machine Learning at Test Time. Advanced Information Systems Engineering. Lecture Notes in Computer Science 7908. Springer. 2013: 387–402. ISBN 978-3-642-38708-1. S2CID 18716873. arXiv:1708.06131 . doi:10.1007/978-3-642-40994-3_25.
^ Szegedy, Christian; Zaremba, Wojciech; Sutskever, Ilya; Bruna, Joan; Erhan, Dumitru; Goodfellow, Ian; Fergus, Rob. Intriguing properties of neural networks. 2014-02-19. arXiv:1312.6199  [cs.CV].
^ Biggio, Battista; Roli, Fabio. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition. December 2018, 84: 317–331. Bibcode:2018PatRe..84..317B. S2CID 207324435. arXiv:1712.03141 . doi:10.1016/j.patcog.2018.07.023.
^ Kurakin, Alexey; Goodfellow, Ian; Bengio, Samy. Adversarial examples in the physical world. 2016. arXiv:1607.02533  [cs.CV].
^ Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Techniques." 2020 IEEE Symposium Series on Computational Intelligence (SSCI). 2020.
^ Lim, Hazel Si Min; Taeihagh, Araz. Algorithmic Decision-Making in AVs: Understanding Ethical and Technical Concerns for Smart Cities. Sustainability. 2019, 11 (20): 5791. Bibcode:2019arXiv191013122L. S2CID 204951009. arXiv:1910.13122. doi:10.3390/su11205791.
^ Google Brain's Nicholas Frosst on Adversarial Examples and Emotional Responses. Synced. 2019-11-21 [2021-10-23].
^ Responsible AI practices. Google AI. [2021-10-23].
^ Adversarial Robustness Toolbox (ART) v1.8, Trusted-AI, 2021-10-23 [2021-10-23]
^ amarshal. Failure Modes in Machine Learning - Security documentation. docs.microsoft.com. [2021-10-23].
^ Biggio, Battista; Fumera, Giorgio; Roli, Fabio. Multiple classifier systems for robust classifier design in adversarial environments. International Journal of Machine Learning and Cybernetics. 2010, 1 (1–4): 27–41 [2015-01-14]. ISSN 1868-8071. S2CID 8729381. doi:10.1007/s13042-010-0007-7. hdl:11567/1087824. (Archived from the original on 2023-01-19.)
^ Brückner, Michael; Kanzow, Christian; Scheffer, Tobias. Static Prediction Games for Adversarial Learning Problems (PDF). Journal of Machine Learning Research. 2012, 13 (Sep): 2617–2654. ISSN 1533-7928.
^ Apruzzese, Giovanni; Andreolini, Mauro; Ferretti, Luca; Marchetti, Mirco; Colajanni, Michele. Modeling Realistic Adversarial Attacks against Network Intrusion Detection Systems. Digital Threats: Research and Practice. 2021-06-03, 3 (3): 1–19. ISSN 2692-1626. S2CID 235458519. arXiv:2106.09380 . doi:10.1145/3469659.
^ Vitorino, João; Oliveira, Nuno; Praça, Isabel. Adaptative Perturbation Patterns: Realistic Adversarial Learning for Robust Intrusion Detection. Future Internet. March 2022, 14 (4): 108. ISSN 1999-5903. arXiv:2203.04234. doi:10.3390/fi14040108. hdl:10400.22/21851.
^ Rodrigues, Ricardo N.; Ling, Lee Luan; Govindaraju, Venu. Robustness of multimodal biometric fusion methods against spoof attacks (PDF). Journal of Visual Languages & Computing. 1 June 2009, 20 (3): 169–179. ISSN 1045-926X. doi:10.1016/j.jvlc.2009.01.010.
^ Su, Jiawei; Vargas, Danilo Vasconcellos; Sakurai, Kouichi. One Pixel Attack for Fooling Deep Neural Networks. IEEE Transactions on Evolutionary Computation. October 2019, 23 (5): 828–841. Bibcode:2019ITEC...23..828S. ISSN 1941-0026. S2CID 2698863. arXiv:1710.08864 . doi:10.1109/TEVC.2019.2890858.
^ Single pixel change fools AI programs. BBC News. 3 November 2017 [12 February 2018].
^ Athalye, Anish; Engstrom, Logan; Ilyas, Andrew; Kwok, Kevin. Synthesizing Robust Adversarial Examples. 2017. arXiv:1707.07397  [cs.CV].
^ AI Has a Hallucination Problem That's Proving Tough to Fix. WIRED. 2018 [10 March 2018].
^ Zhou, Zhenglong; Firestone, Chaz. Humans can decipher adversarial images. Nature Communications. 2019, 10 (1): 1334. Bibcode:2019NatCo..10.1334Z. PMC 6430776 . PMID 30902973. arXiv:1809.04120 . doi:10.1038/s41467-019-08931-6 .
^ Ackerman, Evan. Slight Street Sign Modifications Can Completely Fool Machine Learning Algorithms. IEEE Spectrum: Technology, Engineering, and Science News. 2017-08-04 [2019-07-15].
^ Edwards, Benj. University of Chicago researchers seek to "poison" AI art generators with Nightshade. Ars Technica. 2023-10-25 [2025-06-25].
^ Shan, Shawn; Ding, Wenxin; Passananti, Josephine; Wu, Stanley; Zheng, Haitao; Zhao, Ben Y. Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. 2023. arXiv:2310.13828  [cs.CR].
^ A Tiny Piece of Tape Tricked Teslas Into Speeding Up 50 MPH. Wired. 2020 [11 March 2020].
^ Model Hacking ADAS to Pave Safer Roads for Autonomous Vehicles. McAfee Blogs. 2020-02-19 [2020-03-11].
^ Seabrook, John. Dressing for the Surveillance Age. The New Yorker. 2020 [5 April 2020].
^ Heaven, Douglas. Why deep-learning AIs are so easy to fool. Nature. October 2019, 574 (7777): 163–166. Bibcode:2019Natur.574..163H. PMID 31597977. S2CID 203928744. doi:10.1038/d41586-019-03013-5.
^ Hutson, Matthew. AI can now defend itself against malicious messages hidden in speech. Nature. 10 May 2019. PMID 32385365. S2CID 189666088. doi:10.1038/d41586-019-01510-1.
^ Lepori, Michael A; Firestone, Chaz. Can you hear me now? Sensitive comparisons of human and machine perception. 2020-03-27. arXiv:2003.12362  [eess.AS].
^ Vadillo, Jon; Santana, Roberto. On the human evaluation of audio adversarial examples. 2020-01-23. arXiv:2001.08444  [eess.AS].
^ D. B. Skillicorn. "Adversarial knowledge discovery". IEEE Intelligent Systems, 24:54–61, 2009.
^ B. Biggio, G. Fumera, and F. Roli. "Pattern recognition systems under attack: Design issues and research challenges" (archived at the Internet Archive, 2022-05-20). Int'l J. Patt. Recogn. Artif. Intell., 28(7):1460002, 2014.
^ Demetrio, L.; Biggio, B.; Lagorio, G.; Roli, F.; Armando, A. "Functionality-Preserving Black-Box Optimization of Adversarial Windows Malware." IEEE Transactions on Information Forensics and Security. 2021.
^ Wang, J.; Chang, X.; Mišić, J.; Mišić, V. B.; Wang, Y.; Zhang, J. "Mal-LSGAN: An Effective Adversarial Malware Example Generation Model." In: Proceedings of IEEE GLOBECOM 2021.
^ Ceschin, Fabrício; Botacin, Marcus; Bifet, Albert; Pfahringer, Bernhard; Oliveira, Luiz S.; Gomes, Heitor Murilo; Grégio, André. Machine Learning (In) Security: A Stream of Problems. Digital Threats: Research and Practice. 2023, 1 (1). arXiv:2010.16045. doi:10.1145/3617897.
^ Oak, Rajvardhan; Du, Min; Yan, David; Takawale, Harshvardhan; Amit, Idan. Malware Detection on Highly Imbalanced Data through Sequence Modeling. Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security. ACM. 2019-11-11: 37–48. ISBN 978-1-4503-6833-9. doi:10.1145/3338501.3357374.
^ Barreno, Marco; Nelson, Blaine; Joseph, Anthony D.; Tygar, J. D. The security of machine learning (PDF). Machine Learning. 2010, 81 (2): 121–148. Bibcode:2010MLear..81..121B. S2CID 2304759. doi:10.1007/s10994-010-5188-5.
^ Sikos, Leslie F. AI in Cybersecurity. Intelligent Systems Reference Library 151. Cham: Springer. 2019: 50. ISBN 978-3-319-98841-2. S2CID 259216663. doi:10.1007/978-3-319-98842-9.
^ B. Biggio, G. Fumera, and F. Roli. "Security evaluation of pattern classifiers under attack" (archived at the Internet Archive, 2018-05-18). IEEE Transactions on Knowledge and Data Engineering, 26(4):984–996, 2014.
^ Biggio, Battista; Corona, Igino; Nelson, Blaine; Rubinstein, Benjamin I. P.; Maiorca, Davide; Fumera, Giorgio; Giacinto, Giorgio; Roli, Fabio. Security Evaluation of Support Vector Machines in Adversarial Environments. Support Vector Machines Applications. Springer International Publishing. 2014: 105–153. ISBN 978-3-319-02300-7. S2CID 18666561. arXiv:1401.7727. doi:10.1007/978-3-319-02300-7_4.
^ Heinrich, Kai; Graf, Johannes; Chen, Ji; Laurisch, Jakob; Zschech, Patrick. Fool Me Once, Shame On You, Fool Me Twice, Shame On Me: A Taxonomy of Attack and De-fense Patterns for AI Security. ECIS 2020 Research Papers. 2020-06-15.
^ Facebook removes 15 Billion fake accounts in two years. Tech Digest. 2021-09-27 [2022-06-08].
^ Facebook removed 3 billion fake accounts in just 6 months. New York Post. Associated Press. 2019-05-23 [2022-06-08].
^ Schwarzschild, Avi; Goldblum, Micah; Gupta, Arjun; Dickerson, John P.; Goldstein, Tom. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks. International Conference on Machine Learning (PMLR). 2021-07-01: 9389–9398.
^ Shan, Shawn; Ding, Wenxin; Passananti, Josephine; Wu, Stanley; Zheng, Haitao; Zhao, Ben Y. Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. 2023. arXiv:2310.13828  [cs.CR].
^ B. Biggio, B. Nelson, and P. Laskov. "Support vector machines under adversarial label noise" (archived at the Internet Archive, 2020-08-03). In Journal of Machine Learning Research – Proc. 3rd Asian Conf. Machine Learning, volume 20, pp. 97–112, 2011.
^ M. Kloft and P. Laskov. "Security analysis of online centroid anomaly detection". Journal of Machine Learning Research, 13:3647–3690, 2012.
^ Edwards, Benj. University of Chicago researchers seek to "poison" AI art generators with Nightshade. Ars Technica. 2023-10-25 [2023-10-27].
^ Rao, Rahul. AI-Generated Data Can Poison Future AI Models. Scientific American. [2024-06-22].
^ Baruch, Gilad; Baruch, Moran; Goldberg, Yoav. A Little Is Enough: Circumventing Defenses For Distributed Learning. Advances in Neural Information Processing Systems (Curran Associates, Inc.). 2019, 32. arXiv:1902.06156 .
^ El-Mhamdi, El-Mahdi; Guerraoui, Rachid; Guirguis, Arsany; Hoang, Lê-Nguyên; Rouault, Sébastien. Genuinely distributed Byzantine machine learning. Distributed Computing. 2022-05-26, 35 (4): 305–331. ISSN 1432-0452. S2CID 249111966. arXiv:1905.03853 . doi:10.1007/s00446-022-00427-9 .
^ Goldwasser, S.; Kim, Michael P.; Vaikuntanathan, V.; Zamir, Or. Planting Undetectable Backdoors in Machine Learning Models. 2022. arXiv:2204.06974  [cs.LG].
^ Blanchard, Peva; El Mhamdi, El Mahdi; Guerraoui, Rachid; Stainer, Julien. Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent. Advances in Neural Information Processing Systems (Curran Associates, Inc.). 2017, 30.
^ Chen, Lingjiao; Wang, Hongyi; Charles, Zachary; Papailiopoulos, Dimitris. DRACO: Byzantine-resilient Distributed Training via Redundant Gradients. International Conference on Machine Learning (PMLR). 2018-07-03: 903–912. arXiv:1803.09877.
^ Mhamdi, El Mahdi El; Guerraoui, Rachid; Rouault, Sébastien. The Hidden Vulnerability of Distributed Learning in Byzantium. International Conference on Machine Learning (PMLR). 2018-07-03: 3521–3530. arXiv:1802.07927.
^ Allen-Zhu, Zeyuan; Ebrahimianghazani, Faeze; Li, Jerry; Alistarh, Dan. Byzantine-Resilient Non-Convex Stochastic Gradient Descent. 2020-09-28. arXiv:2012.14368 [cs.LG].
^ Mhamdi, El Mahdi El; Guerraoui, Rachid; Rouault, Sébastien. Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent. 9th International Conference on Learning Representations (ICLR), May 4–8, 2021 (virtual conference). 2020-09-28 [2022-10-20].
^ Data, Deepesh; Diggavi, Suhas. Byzantine-Resilient High-Dimensional SGD with Local Iterations on Heterogeneous Data. International Conference on Machine Learning (PMLR). 2021-07-01: 2478–2488.
^ Karimireddy, Sai Praneeth; He, Lie; Jaggi, Martin. Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing. 2021-09-29. arXiv:2006.09365 [cs.LG].

External links

MITRE ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems
NIST 8269 Draft: A Taxonomy and Terminology of Adversarial Machine Learning
NIPS 2007 Workshop on Machine Learning in Adversarial Environments for Computer Security
AlfaSVMLib (archived at the Internet Archive, 2020-09-24) – Adversarial Label Flip Attacks against Support Vector Machines
Laskov, Pavel; Lippmann, Richard. Machine learning in adversarial environments. Machine Learning. 2010, 81 (2): 115–119. S2CID 12567278. doi:10.1007/s10994-010-5207-6.
Dagstuhl Perspectives Workshop on "Machine Learning Methods for Computer Security"
Workshop on Artificial Intelligence and Security, (AISec) Series

