IMPACT seminar – Lecture: Stanislav Fort (Google DeepMind) – "Adversarial attacks as a baby version of A(G)I alignment"


Date / time
Date(s) - 21.01.
11:00 - 12:00


Dear colleagues,


It is our pleasure to invite you to the following seminar organized by the IMPACT group and the Applied Algebra and Geometry (AAG) group:


"Adversarial attacks as a baby version of A(G)I alignment" by Stanislav Fort (Google DeepMind)


When: Tuesday 2025-01-21 at 11:00

Where: CIIRC Room B-670 (building B, floor 6)



Adversarial attacks pose a significant challenge to the robustness, reliability and alignment of deep neural networks, from simple computer vision models to hundred-billion-parameter language models. Despite their ubiquitous nature, our theoretical understanding of their character and ultimate causes, as well as our ability to successfully defend against them, is noticeably lacking. This talk examines the robustness of modern deep learning methods and the surprising scaling of attacks on them, and showcases several practical examples of transferable attacks on the largest closed-source vision-language models available. Building on biological insights and new empirical evidence, I will introduce our solution proposed in [1], in which we take a step towards aligning the implicit human and the explicit machine vision representations, closely connecting interpretability and robustness. I will conclude with a direct analogy between the problem of adversarial examples and the much larger task of general AI alignment.


[1] Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness. Stanislav Fort, Balaji Lakshminarayanan


For more information on IMPACT seminars, please visit


Kind regards,


Josef Sivic and Tomas Pajdla,


AAG

Czech Institute of Robotics, Informatics and Cybernetics