Annotated Bibliography
As our research group reads and discusses scientific papers in MLsec, we add an entry to this (very long) bibliography. We also actively curate a “top 5” list. Try searching this page with hashtags such as #TOP PAPER or #Recursive Pollution. (Last edit June 24, 2026: 354 entries.)
Top 5 Papers in MLsec
Tramèr 2022 — Data Extraction
Tramèr, Florian, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, and Nicholas Carlini. “Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets.” arXiv preprint arXiv:2204.00032 (2022).
Excellent work. Very simple poisoning, grounded in solid statistical behavior, improves many existing attacks.
Gilmer 2018 — Adversarial Examples
Gilmer, Justin, Ryan P. Adams, Ian Goodfellow, David Andersen, and George E. Dahl. “Motivating the Rules of the Game for Adversarial Example Research.” arXiv preprint arXiv:1807.06732 (2018).
Great use of realistic scenarios in a risk analysis. Hilariously snarky.
Shumailov 2023 — Recursive Pollution AKA Model Collapse
Shumailov, Ilia, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, and Ross Anderson. “The Curse of Recursion: Training on Generated Data Makes Models Forget.” arXiv preprint arXiv:2305.17493 (2023).
See Nature version.
A very easy-to-grasp treatment of the math of eating your own tail. This is directly relevant to LLMs and the pollution of large datasets. We pointed out this risk in 2020. This is the math. Finally published in Nature, vol. 631.
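The “eating your own tail” dynamic shows up even in a toy one-dimensional Gaussian setting: fit a Gaussian to samples, draw the next generation's training data only from that fit, and repeat. This is a minimal sketch, not the paper's experimental setup; the sample size, generation count, and seed are arbitrary assumptions. Because the maximum-likelihood variance estimate is biased low (its expectation is (n-1)/n times the true variance) and sampling noise compounds across generations, the fitted variance drifts toward zero and the tails of the original distribution disappear.

```python
import random
import statistics


def recursive_refit(n_samples: int = 10, generations: int = 200, seed: int = 0) -> list[float]:
    """Toy model collapse: each generation fits a Gaussian (MLE) to samples drawn
    from the previous generation's fitted Gaussian and never sees fresh real data.
    n_samples, generations, and seed are illustrative assumptions."""
    rng = random.Random(seed)
    mu, var = 0.0, 1.0  # generation 0 stands in for the real data distribution
    history = [var]
    for _ in range(generations):
        # Sample synthetic "training data" from the current model...
        data = [rng.gauss(mu, var ** 0.5) for _ in range(n_samples)]
        # ...and fit the next model only on that synthetic data.
        mu = statistics.fmean(data)
        var = statistics.pvariance(data, mu)  # MLE variance (divides by n)
        history.append(var)
    return history


if __name__ == "__main__":
    h = recursive_refit()
    print(f"variance at gen 0: {h[0]:.3f}, gen 50: {h[50]:.3e}, gen 200: {h[-1]:.3e}")
```

With only 10 samples per generation, the variance typically shrinks by many orders of magnitude over 200 generations, which is the recursive pollution story in miniature.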
Papernot 2016 — Building Security In for ML (IT stance)
Papernot, Nicolas, Patrick McDaniel, Arunesh Sinha, Michael Wellman. “SoK: Towards the Science of Security and Privacy in Machine Learning.” arXiv preprint arXiv:1611.03814 (2016).
A clear, concise, and expansive paper. The takeaway lessons are particularly useful.
Souly 2025 — Poisoning Constant
Souly, Alexandra, Javier Rando, Ed Chapman, Xander Davies, Burak Hasircioglu, Ezzeldin Shereen, Carlos Mougan, Vasilios Mavroudis, Erik Jones, Chris Hicks, Nicholas Carlini, Yarin Gal, Robert Kirk. “Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples.” arXiv preprint arXiv:2510.07192 (2025).
Excellent paper, clear and well-stated (like all Carlini papers). This result shows that recursive pollution risk is even greater than we thought. Injecting backdoors is pretty easy. The examples are a bit simplistic.
Other Papers in Alphabetical Order
Acemoglu 2026 — Knowledge Collapse
Acemoglu, Daron, Dingwen Kong, and Asuman Ozdaglar. “AI, Human Cognition and Knowledge Collapse.” MIT Economics working paper (February 20, 2026).
This paper is very think tanky, and more sociology than anything else. The model is very sparse. Not as relevant to our work on recursive pollution as we were hoping. It does mention the degradation of the information environment.
Aghakhani 2024 — TrojanPuzzle
Aghakhani, Hojjat, Wei Dai, Andre Manoel, Xavier Fernandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, and Robert Sim. “TROJANPUZZLE: Covertly Poisoning Code-Suggestion Models.” arXiv preprint arXiv:2301.02344 (2024).
Poison in the form of code. Philosophically, what constitutes poison? You can hide it, obscure it, etc. This work applies outside of code.
Agrawal 2025 — GEPA Prompt Evolution
Agrawal, Lakshya A., Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J. Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. “GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.” arXiv preprint arXiv:2507.19457 (2025).
This is an important paper moving toward the idea of agentic harnesses. There are better ways to do this, but organizing and improving harnesses is a real thing.
Akiba 2024 — Evolution and NNs
Akiba, Takuya, Makoto Shing, Yujin Tang, Qi Sun, David Ha. “Evolutionary Optimization of Model Merging Recipes.” arXiv preprint arXiv:2403.13187 (2024).
A flawed implementation of a very fun idea. Applying evolution to neural networks. Would be nice to see this done properly.
Alemohammad 2023 — Recursive Pollution with Synthetic Data is Bad
Alemohammad, Sina, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, and Richard G. Baraniuk. “Self-Consuming Generative Models Go MAD.” arXiv preprint arXiv:2307.01850 (2023).
Clear results with a nice framework to describe fresh, synthetic, and fixed data in a feedback loop. Focuses on diversity versus over-precision. Recursive pollution example.
Ali 2024 — Tokenization
Ali, Mehdi, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr. “Tokenizer Choice For LLM Training: Negligible or Crucial?” arXiv preprint arXiv:2310.08754 (2024).
Often ignored, this kind of work is at the foundation of ML. Using languages to experiment. Straightforward but not profound work.
Anthropic 2024 — Sleeper Agents
Hubinger, Evan, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, and Ethan Perez. “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training.” arXiv preprint arXiv:2401.05566 (2024). Anthropic.
This work is terrible for many reasons. See the BIML blog entry, Absolute Nonsense from Anthropic: Sleeper Agents.
Antorán 2020 — Uncertainty
Antorán, J., Umang Bhatt, Tameen Adel, Adrian Weller, and José Miguel Hernández-Lobato. “Getting a Clue: A Method for Explaining Uncertainty Estimates.” ICLR 2020 Workshop paper (2020).
Representation helps with the why of uncertainty. Little relevance to security. Error bars.
Archiwaranguprok 2025 — AI Bias in Chat
Archiwaranguprok, Chayapatr, Constanze Albrecht, Pattie Maes, Karrie Karahalios, and Pat Pataranutaporn. “Simulating Psychological Risks in Human-AI Interactions: Real-Case Informed Modeling of AI-Induced Addiction, Anorexia, Depression, Homicide, Psychosis, and Suicide.” arXiv preprint arXiv:2511.08880 (2025).
This is very strange work with serious researcher bias. If people can’t even see these progressions in chat, why would LLMs? Psych risk in SIMULATED chats. Using models to check models. This is bad science. Frankly, we expect better out of Pattie Maes.
Arditi 2024 — Alignment
Arditi, Andy, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda. “Refusal in Language Models Is Mediated by a Single Direction.” 38th Conference on Neural Information Processing Systems (NeurIPS 2024).
Very preliminary weight tweaking shows how to circumvent alignment. Proof-of-concept work with many caveats. The economics are in favor of this approach.
Arora 2018 — Multiple Meanings
Arora, Sanjeev, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. “Linear algebraic structure of word senses, with applications to polysemy.” Transactions of the Association for Computational Linguistics 6 (2018): 483-495.
Structured representations that capture distributed sub-features (micro-topics) through ML. Goes beyond word2vec and GloVe by adding “semantics.”
Ateniese 2015 — Extracting Data from Classifiers
Ateniese, Giuseppe, Luigi V. Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali, and Giovanni Felici. “Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers.” International Journal of Security and Networks 10, no. 3 (2015): 137-150.
Very early work on extraction attacks. Focused on the confidentiality of data sets. Experiments are tediously described. Reads like ancient history because it is.
Atil 2024 — Undergrad work on stability
Atil, Berk, Alexa Chittams, Liseng Fu, Ferhan Ture, Lixinyu Xu, and Breck Baldwin. “LLM Stability: A detailed analysis with some surprises.” arXiv preprint arXiv:2408.04667v2 (2024).
This is terrible science (which means it is ironically a good example of how not to do it). Walks directly into the baseline bunker. “Benchmarking does not work so we introduce…a benchmark.”
Bai (Anthropic) 2022 — Alignment (RL)
Bai, Yuntao, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Ben Mann, Jared Kaplan. “Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback” arXiv preprint arXiv:2204.05862 (2022).
Alignment with basic RL. Overemphasis on scaling. RL butter spread very thin over a big network.
Bar (Meta) 2024 — Navigation
Bar, Amir, Gaoyue Zhou, Danny Tran, Trevor Darrell, Yann LeCun. “Navigation World Models” arXiv preprint arXiv:2412.03572 (2024).
Single world model for use across environments and embodiments.
Barreno 2010 — Fundamental work in MLsec
Barreno, Marco, Blaine Nelson, Anthony D. Joseph, J.D. Tygar. “The security of machine learning.” Machine Learning, 81:2, pp. 121-148 (November 2010).
Solid but dated work with lots of fundamentals. Made harder to grasp by mixing two issues: ML FOR security and security OF ML. Untangling these things is critical. (Also see their 2006 paper.)
Behrouz 2024 — Google Titans Architecture
Behrouz, Ali, Peilin Zhong, Vahab Mirrokni. “Titans: Learning to Memorize at Test Time” arXiv preprint arXiv:2501.00663 (2024).
An alternative to the transformer architecture. Lots of screwing around with math pieces. Lots of engineering. Emphasizes the importance of long term memory.
Bellamy (IBM) 2018 — IBM User Manual
Bellamy, Rachel, Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, Seema Nagar, Karthikeyan Natesan Ramamurthy, John Richards, Diptikalyan Saha, Prasanna Sattigeri, Moninder Singh, Kush R. Varshney, and Yunfeng Zhang. “AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias” arXiv preprint arXiv:1810.01943 (2018).
Kind of like reading a manual and a marketing glossy mashup. Nothing at all about making actual bias decisions. Bag of tools described.
Bender 2021 — Stochastic Parrots
Bender, Emily, Angelina McMillan-Major, Timnit Gebru, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” FAccT ’21, March 3-10, 2021, Virtual Event, Canada.
The infamous paper that got Timnit fired. Continuing to scale may not be the NLP answer. Offers a few too many reasons to try other approaches. Great points interspersed with political diatribe.
Bender 2020 — Understanding
Bender, Emily, and Alexander Koller. “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data.” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (July 2020): 5185-5198.
A narrow view of LM. Lacks a conception of emergence. Right result, but wrong reasons.
Beurer-Kellner 2025 — Agentic AI Junk
Beurer-Kellner, Luca, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn. “Design Patterns for Securing LLM Agents against Prompt Injections” arXiv preprint arXiv:2506.08837 (2025).
This is what happens when monolithic glob security people are confronted with Agentic AI. A study in how NOT to do it.
Biggio 2018 — Biggio on Adversarial Machine Learning
Biggio, Battista, and Fabio Roli. “Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning.” arXiv preprint arXiv:1712.03141 (2018).
Myopia abounds. This is basically a review paper. (Very defensive of prior work by the author.)
Bosselut 2019 — COMET
Bosselut, Antoine, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi. “COMET: Commonsense Transformers for Automatic Knowledge Graph Construction.” arXiv preprint arXiv:1906.05317 (2019).
Building an informal KB with less structure. Allow internal structure to form. Discrete representation vs. corpus representation.
Boucher 2022 — Malicious Input AKA Adversarial Examples
Boucher, Nicholas, Ilia Shumailov, Ross Anderson, and Nicolas Papernot. “Bad Characters: Imperceptible NLP Attacks.” In 2022 IEEE Symposium on Security and Privacy (SP), pp. 1987-2004. IEEE, 2022.
Malicious input for NLP systems. Real world system vulnerability demonstrated. Exploiting the gap between human and machine perception (though leaning on human visual perception).
Bowman 2023 — LLM basics
Bowman, Samuel R. “Eight things to know about large language models.” arXiv preprint arXiv:2304.00612 (2023).
This little paper seems to be mostly about getting funding. It makes interesting reading as it is mostly cheerleading with a little open problem thrown in. Floating around policy wonk land in Washington.
Bratton 2022 — The Model Is The Message
Bratton, Benjamin, and Blaise Agüera y Arcas. “The Model Is the Message.” NOEMA, July 12, 2022.
This paper is both exciting and interesting. Language. Sentience. Essentialism. AI. ML. Cognition. And some fun poked at the ethical AI people to boot. A must read.
Breck 2019 — Data Validation for Machine Learning
Breck, Eric, Neoklis Polyzotis, Sudip Roy, Steven Whang, and Martin Zinkevich. “Data Validation for Machine Learning.” In MLSys. 2019.
This basic paper is about validating input data (as opposed to the validation set as linked to the training set).
Buchanan 2020 — National Security Policy
Buchanan, Ben. “A National Security Research Agenda for Cybersecurity and Artificial Intelligence.” CSET Policy Brief (2020).
Good work with some base confusion between security OF ML (what BIML does) and ML FOR security. ML is not a magic force multiplier. The #MLsec section is OK but too heavy on adversarial examples.
Burnell 2023 — Evaluation
Burnell, Ryan, Wout Schellaert, John Burden, Tomer D. Ullman, Fernando Martinez-Plumed, Joshua B. Tenenbaum, Danaja Rutar et al. “Rethink reporting of evaluation results in AI.” Science 380, no. 6641 (2023): 136-138.
Concise and clear. Beating Clever Hans multiple times as required. How do we build an assurance case for ML systems? This paper explains why aggregate measures are poorly suited for such a task.
Carlini 2019 — Memorization and Data Leaking
Carlini, Nicholas, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. “The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks.” arXiv preprint arXiv:1802.08232 (2019).
Clear, cogent and fairly simple. Great results. Protecting secrets in ML data.
Carlini 2020 — Extraction attacks
Carlini, Nicholas, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel. “Extracting Training Data from Large Language Models.” arXiv preprint arXiv:2012.07805 (2020).
This paper was in the BIML top 5 for over a year. Classic and easy extraction clearly explained. Striking results (but not that deep).
Carlini 2022 — Membership Inference Attacks
Carlini, Nicholas, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. “Membership inference attacks from first principles.” In 2022 IEEE Symposium on Security and Privacy (SP), pp. 1897-1914. IEEE, 2022.
Modern approach to membership inference, focusing on how to measure MI effectiveness. Lots of studies, carefully set up. Is there some fudging on the issue of overfit models? You decide.
Carlini 2023 — Extracting Training Data from Diffusion Models
Carlini, Nicholas, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, and Eric Wallace. “Extracting training data from diffusion models.” arXiv preprint arXiv:2301.13188 (2023).
More excellent work from Carlini. Diffusion models are worse for privacy than GANs even though some people used to believe otherwise. Memorization (and duplication) is a thing. Representation issues begin to emerge.
Chalmers 2018 — The meta-problem of consciousness
Chalmers, David. “The meta-problem of consciousness.” (2018).
Read as a reaction to Graziano, whose theory resonates with BIML. This is a nice way to frame neurophysical reality in philosophy of mind.
Watch Chalmers discuss whether an LLM can become conscious here.
Chollet 2019 — On the Measure of Intelligence
Chollet, François. “On the Measure of Intelligence.” arXiv preprint arXiv:1911.01547 (2019).
An interesting perspective on progress in AI with a particular view of history biased towards ML. Focuses on the importance of generalization and learning. Some discussion of collective entities. The author develops a formalism with pretty terrible notation. Then comes ARC, the Abstraction and Reasoning Corpus, a benchmark for general intelligence.
Choudhury 2022 — Understanding in LLMs
Choudhury, Sagnik Ray, Anna Rogers, and Isabelle Augenstein. “Machine Reading, Fast and Slow: When Do Models ‘Understand’ Language?” arXiv preprint arXiv:2209.07430 (2022).
This paper is stuck in the mud of cognitive psychology. The LLMs are dated and thus not as relevant as they could be. The probes don’t really get to the nature of understanding. We had high hopes for this work, but they were dashed.
Chen 2024 — Optical Generative Models
Chen, Shiqi, Yuhang Li, Yuntian Wang, Hanlong Chen, and Aydogan Ozcan. “Optical generative models.” Nature 644 (2025): 903–911.
Uses light for computation, with low power and superposition as key properties. An analog of quantum computing. This reminds us of Rosenblatt’s Perceptrons from the ’50s.
Chen 2023 — Privacy Leaks
Chen, Xiaoyi, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, Zhikun Zhang, XiaoFeng Wang, and Haixu Tang. “The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks.” arXiv preprint arXiv:2310.15469 (2023).
Pompous privacy paper: everything is pretty f-ing obvious. Solution is to use LLM alignment to stop leaks. Meh.
Chen 2017 — Backdoor attacks coined
Chen, Xinyun, Chang Liu, Bo Li, Kimberly Lu, Dawn Song. “Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning.” arXiv preprint arXiv:1712.05526 (2017).
A badly written and loosely constructed paper that introduces the (poorly chosen) “backdoor” terminology. The work is about data poisoning attacks.
Christiano 2023 — Alignment and Tuning
Christiano, Paul, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei. “Deep reinforcement learning from human preferences.” arXiv preprint arXiv:1706.03741 (2023).
Reinforcement Learning with human defined partial goals. Step 2 of LLM creation. Economic tradeoff. Humans cheaper than machine. Alignment.
Christiansen 2016 — Language Representation and Structure
Christiansen, Morten H., and Nick Chater. “The Now-or-Never bottleneck: A fundamental constraint on language.” Behavioral and Brain Sciences 39 (2016).
Too much psychology and not enough ML. This paper is about context in language representation, including look-ahead and structured patterns. The main question: how big is your buffer?
Collins 2022 — Benchmarking
Collins, Katherine M., Catherine Wong, Jiahai Feng, Megan Wei, and Joshua B. Tenenbaum. “Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks.” arXiv preprint arXiv:2205.05718 (2022).
Hybrid model investigating LLMs and symbol systems. Language is NOT all you need.
Cooper 2024 — Unlearning Doesn’t
Cooper, A. Feder, Christopher A. Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, Ken Ziyu Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Niloofar Mireshghallah, Ilia Shumailov, Eleni Triantafillou, Peter Kairouz, Nicole Mitchell, Percy Liang, Daniel E. Ho, Yejin Choi, Sanmi Koyejo, Fernando Delgado, James Grimmelmann, Vitaly Shmatikov, Christopher De Sa, Solon Barocas, Amy Cyphert, Mark Lemley, danah boyd, Jennifer Wortman Vaughan, Miles Brundage, David Bau, Seth Neel, Abigail Z. Jacobs, Andreas Terzis, Hanna Wallach, Nicolas Papernot, Katherine Lee. “Machine Unlearning Doesn’t Do What You Think: Lessons for Generative AI Policy, Research, and Practice.” arXiv preprint arXiv:2412.06966 (2024).
The idea of unlearning is not sufficient to address European regulation. Censorship isn’t a good solution either. The policy wonks are really confused about this. Good paper by committee.
Crawford 2023 — Datasets
Crawford, Kate, Mike Ananny, Jer Thorp, Will Orr, Hamsini Sridharan, Sasha Luccioni, Jason Schultz, and Christo Buschek. “9 Ways to See a Dataset.” Knowing Machines. Accessed August 3, 2023. https://knowingmachines.org/publications/9_ways_to_see_a_dataset.
This is a rather vacuous treatment of a critically important problem. How do we represent things in ML and what implications do such representations have? We were hoping for more treatment of distributedness, bigness, sparseness, and modeling.
Dai 2019 — Transformer-XL
Dai, Zihang, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. “Transformer-xl: Attentive language models beyond a fixed-length context.” arXiv preprint arXiv:1901.02860 (2019).
Getting past fixed-length context through various kludges. Recursive feedback to represent previous state.
D’Amour 2020 — Underspecification
D’Amour, Alexander, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, Rajiv Raman, Kim Ramasamy, Rory Sayres, Jessica Schrouff, Martin Seneviratne, Shannon Sequeira, Harini Suresh, Victor Veitch, Max Vladymyrov, Xuezhi Wang, Kellie Webster, Steve Yadlowsky, Taedong Yun, Xiaohua Zhai, D. Sculley. “Underspecification Presents Challenges for Credibility in Modern Machine Learning.” arXiv preprint arXiv:2011.03395 (2020).
Very nice work. Strange terminology, but intuitive results. Makes us ask “what is sparseness?”
D’Amour 2020 — Dynamic Simulation of Fairness
D’Amour, Alexander, Hansa Srinivasan, James Atwood, Pallavi Baljekar, D. Sculley, and Yoni Halpern. “Fairness Is Not Static: Deeper Understanding of Long Term Fairness via Simulation Studies.” FAT* ’20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (January 2020): 525–534.
Sociology? Economics? Some simple experiments well explained but no clarity of results.
Dalrymple 2024 — Formal Methodists Nonsense
Dalrymple, David “davidad”, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, and Joshua Tenenbaum. “Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems.” arXiv preprint arXiv:2405.06624 (2024).
Software people have known this stuff for decades. Formal methods are not going to be adopted and pretending that they will be is ridiculous. Skip it.
Danzig 2022 — Machines, Bureaucracies, and Markets as Artificial Intelligences
Danzig, Richard. “Machines, Bureaucracies, and Markets as Artificial Intelligences.” (2022).
An outstanding treatise on AI, ML, and emergent systems, premised on the idea that we have something to learn about those fields by studying markets and bureaucracies. Highly readable and thought provoking.
Danzig 2025 — National Security and AI
Danzig, Richard. “Artificial Intelligence, Cybersecurity, and National Security: The Fierce Urgency of Now.” RAND, July 2025.
An uneven treatment of AI/ML risk and national security. Some base confusion WRT programs and AI code.
Debenedetti 2025 — Security Engineering and Design
Debenedetti, Edoardo, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr. “Defeating Prompt Injections by Design.” arXiv preprint arXiv:2503.18813 (2025). DeepMind.
Authored by some of our favorite people in MLsec, this paper is a beacon of hope for security engineering. Wrapper with proof. Securing natural language interfaces.
De Deyne 2020 — Psych Rep Grounding
De Deyne, Simon, Danielle Navarro, Guillem Collell, and Amy Perfors. “Visual and Affective Grounding in Language and Mind.” PsyArXiv preprint q97f8 (2020).
Too much insider psych gobbledygook in this paper. Lots of results, very poorly presented. An important subject best approached another way.
DeepSeek-AI 2025 — DeepSeek-R1
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z.F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jiawei Wang, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, J.L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R.J. Chen, R.L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S.S. Li et al. (100 additional authors not shown). “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” arXiv preprint arXiv:2501.12948 (2025).
A convoluted explanation at best. Here is a writeup by Riley Eller: Just wanted to take a few minutes to write up a little primer on how China made such important strides in model efficiency with DSR1. If you’re not interested in the “how” or the “what” but want to get a non-China-originated version of the same tech, watch for Dolphin’s upcoming release (https://huggingface.co/collections/cognitivecomputations/).
So, what did China do? They made “a” GPT that runs way faster on last-gen hardware. How did they do it? They used the same generative pre-trained transformer (GPT) scheme that’s been around for a few years. They added the “iterate a few times to handle more complex questions” optimization that was published last year. They decided to cut the memory requirements by 75%, probably because of the Biden-era export restrictions, which keeps last-gen GPUs fully effective. By using 8-bit numbers instead of 32-bit, they get a lot less fine-grained learning in single neurons BUT they get more generalized intelligence across the network. They used reinforcement learning instead of supervised fine-tuning, effectively pushing interesting behavior from the “teach me about my job” phase (fine-tuning) to the training phase, which in turn makes the model able to understand why people talk about their task activities more sensibly. More on this later.
They used a “mixture of experts” model, so they sort of train a series of smaller GPTs on sectors of knowledge, localizing math/logic/programming in a “left brain” (I’m being cheeky with that) and art/poetry/rhetoric in a “right brain” (again), but with several “lobes” that each get a voice. It’s a chorale performance rather than an aria. This allows fewer “cross-town connections,” meaning that more understanding can be baked into each expert without magnifying the quadratic (N^2) cost of the transformer model. Finally, they used multi-token prediction, which tries to guess well ahead of time what’s going to be said.
The consequence is about a 50% reduction in runtime. How big is the innovation here? Zero. Literally every method above has a Wikipedia page or performance article available. https://en.wikipedia.org/wiki/Mixture_of_experts https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning) https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback https://towardsdatascience.com/fine-tuning-llms-with-32-bit-8-bit-and-paged-adamw-optimizers-1034e3105634 https://medium.com/@himankvjain/accelerating-language-models-with-multi-token-prediction-9f0167232f5b https://dzone.com/articles/understanding-inference-time-compute So the question here isn’t “how did they do it” but rather “what are the clowns at OpenAI doing other than blowing smoke up each other’s backsides?”
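The “mixture of experts” idea in the writeup can be made concrete with a generic top-k gating sketch. This is not DeepSeek-R1's actual router; the gate matrix, the toy experts, and k=2 are hypothetical choices for illustration. The point it shows: a gate scores every expert, but only the k highest-scoring experts are evaluated for a given input, and their outputs are blended with softmax weights. Skipping the other experts is where the compute savings come from.

```python
import math
from typing import Callable, Sequence

# A toy "expert": any function mapping a vector to a vector of the same size.
Expert = Callable[[Sequence[float]], list[float]]


def top_k_moe(
    x: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Expert],
    k: int = 2,
) -> tuple[list[float], list[int]]:
    """Generic sparsely gated mixture of experts (hypothetical toy, not DeepSeek's router)."""
    # One gate logit per expert: dot product of the input with that expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the chosen experts only (subtract the max for numerical stability).
    peak = max(logits[i] for i in chosen)
    scores = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(scores.values())
    # Output is the gate-weighted sum of the chosen experts' outputs.
    out = [0.0] * len(x)
    for i in chosen:
        for j, yj in enumerate(experts[i](x)):
            out[j] += (scores[i] / total) * yj
    return out, chosen


if __name__ == "__main__":
    # Four toy experts that just scale their input by 1, 2, 3, and 4.
    experts = [lambda v, c=c: [c * vi for vi in v] for c in (1.0, 2.0, 3.0, 4.0)]
    gates = [[0.1, 0.0], [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
    out, chosen = top_k_moe([1.0, 0.0], gates, experts, k=2)
    print(chosen, out)  # experts 1 and 2 are selected for this input
```

Real MoE layers learn the gate and the experts jointly and add load-balancing tricks; this sketch keeps only the routing arithmetic.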
Dell’Acqua 2023 — Business School Bullshit
Dell’Acqua, Fabrizio, Edward McFowland III, Ethan Mollick, Hila Lifshitz-Assaf, Katherine C. Kellogg, Saran Rajendran, Lisa Krayer, François Candelon, and Karim R. Lakhani. “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality.” Harvard Business School working paper 24-013 (2023).
Social science business school bullshit. Very nonsense. Much marketing nonsense. Opinions do not science make.
Dhariwal 2020 — Music Generation
Dhariwal, Prafulla, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. “Jukebox: A Generative Model for Music.” arXiv preprint arXiv:2005.00341 (2020).
Generating music with a very weird model. Training a model on raw audio. Also see https://openai.com/blog/jukebox/
Diaz 2025 — Agents
Díaz, Santiago, Christoph Kern, and Kara Olive. May 2025. “Google’s Approach for Secure AI Agents: An Introduction.” Technical Report, Google. September 2025.
Very basic work in agentic security which at least properly uses a risk framework. No separation of duties. Too much emphasis on policy (which nobody has developed). Shallow work like this reminds us of where everyone else sits. (Honestly, we expect better science from xtof.)
Dohmatob 2024 — Model Collapse
Dohmatob, Elvis, Yunzhen Feng, Arjun Subramonian, Julia Kempe. “Strong Model Collapse.” arXiv preprint arXiv:2410.04840 (2024).
Recursive pollution leads to model collapse. This view of strong model collapse describes what happens in the case of recursive data poison.
Dou 2024 — Coding
Dou, Shihan, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen, Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan, Zhiheng Xi, Yuhao Zhou, Tao Ji, Rui Zheng, Qi Zhang, Xuanjing Huang, Tao Gui. “StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback.” arXiv preprint arXiv:2402.01391 (2024).
It is striking and hilarious how much this mirrors symbolic AI from the ’90s. One of those “empirical” studies that is not worth much thought. Misuse of “novelty” pretty much says it all.
Duarte 2011 — Insect Evolution and Self-organization
Duarte, Ana, Franz J. Weissing, Ido Pen, and Laurent Keller. “An Evolutionary Perspective on Self-Organized Division of Labor in Social Insects.” Annual Review of Ecology, Evolution, and Systematics 42 (2011): 91-110.
Survey that combines self-organization with evolutionary adaptation.
Dziedzic 2022 — p-DkNN
Dziedzic, Adam, Stephan Rabanser, Mohammad Yaghini, Armin Ale, Murat A. Erdogdu, and Nicolas Papernot. “p-DkNN: Out-of-Distribution Detection Through Statistical Testing of Deep Representations.” arXiv preprint arXiv:2207.12545 (2022).
Using common statistical methods on hidden layers to produce a confidence score. (S and C robustness is interesting.)
Edemacu 2025 — FilterRAG
Edemacu, Kennedy, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, and Jong Wook Kim. “Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation.” arXiv preprint arXiv:2508.02835 (2025).
Build random classifier, use it.
Ellis 2020— Dreamcoder
Ellis, Kevin, Catherine Wong, Maxwell Nye, Mathias Sable-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, and Joshua B. Tenenbaum. “DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning.” arXiv preprint arXiv:2006.08381 (2020).
Great paper combining symbolic, functional, and statistical AI in an elegant way.
Egashira 2025 — Pruning Attack
Egashira, Kazuki, Robin Staab, Thibaud Gloaguen, Mark Vero, and Martin Vechev. “Fewer Weights, More Problems: A Practical Attack on LLM Pruning.” arXiv preprint arXiv:2510.07985 (2025).
Lightweight with lots of so-called ideas about WHY without showing WHY. The results are very basic and unsurprising. Workaday.
Eniser 2020— Adversarial Image Defense
Eniser, Hasan Ferit, Maria Christakis, and Valentin Wüstholz. “RAID: Randomized Adversarial-Input Detection for Neural Networks.” arXiv preprint arXiv:2002.02776 (2020).
This paper describes a very narrow defense against adversarial image input. Experiments are very arbitrary and lack focus. One interesting note is that the defense leverages activation patterns.
Evans 2019 — Naïve Privacy
Evans, Georgina, Gary King, Margaret Schwenzfeier, and Abhradeep Thakurta. “Statistically valid inferences from privacy protected data.” American Political Science Review (2019).
A social science perspective on ML with a requisitely naïve approach to privacy. Zero clue about security. This paper demonstrates why #MLsec work steeped in a security engineering perspective is important.
Eykholt 2018— Physical Attacks on Vision
Eykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. “Robust physical-world attacks on deep learning visual classification.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1625-1634. 2018.
Tape on the stop sign paper. Fairly naive attacks on non-robust representations that are meant to be psychologically plausible in that humans won’t notice. Many “empirical” settings.
Farrell 2025 — Social Technology
Farrell, Henry, Alison Gopnik, Cosma Shalizi, and James Evans. “Large AI models are cultural and social technologies.” Science, vol. 387, no. 6739 (2025): 1153-1156.
A response to Melanie Mitchell’s work, this is a sociology and philosophy paper on LLMs as a social technology. Lots of emphasis on representation. Echoes of extended mind theory. Implications regarding “beigification.”
Fazelpour 2020— Algorithmic Fairness
Fazelpour, Sina, and Zachary C. Lipton. “Algorithmic Fairness from a Non-ideal Perspective.” arXiv preprint arXiv:2001.09773 (2020).
An uncharacteristically good social justice in ML paper. Addresses the broader problem of algorithmic failure. Co-written by a computer scientist (less gobbledygook).
Feffer 2024 — Red Teaming
Feffer, Michael, Anusha Sinha, Wesley Hanwen Deng, Zachary C. Lipton, Hoda Heidari. “Red-Teaming for Generative AI: Silver Bullet or Security Theater?” arXiv preprint arXiv:2401.15897 (2024).
The pen testing diatribe refried. Guess what? Badnessometers are no security meters! This coheres with BIML’s view.
Feldman 2013 — The neural binding problem(s)
Feldman, Jerome. “The neural binding problem(s).” Cognitive Neurodynamics 7 (2013): 1-11.
Lots has happened in both LLMs and neuroscience in the eleven-plus years since this paper was published. Worth a read to know your history.
Feldman 2020— Memorization
Feldman, Vitaly. “Does learning require memorization? A short tale about a long tail.” In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pp. 954-959. 2020.
A set of very intuitive, well-explained ideas backed up by reams of somewhat inscrutable math. Upshot: memorization is often unavoidable, and mechanisms to limit it screw things up.
Foerster 2026— Architectural Isolation for Computer Use Agents
Foerster, Hanna, Tom Blanchard, Kristina Nikolić, Ilia Shumailov, Cheng Zhang, Robert Mullins, Nicolas Papernot, Florian Tramèr, and Yiren Zhao. “CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents.” arXiv preprint arXiv:2601.09923 (2026).
This paper tries so hard to be good but shows what happens when security engineering plays second fiddle to agentic AI (through Computer Use Agents using a GUI). Results are thin and repeated. Pretends that your PC is somehow “isolated.”
Franklin 2026 — AI Agent Traps
Franklin, Matija, Nenad Tomašev, Julian Jacobs, Joel Z. Leibo, and Simon Osindero. “AI Agent Traps.” Google DeepMind, SSRN preprint (March 2026).
This paper is a honeypot analog. Very “airy,” with uncanny-valley writing…we bet this one was written by an LLM. This is about agents working in a polluted environment and is related to poison, pollution, and recursion.
Fu 2025— Model Collapse in Self-consuming Training Loops
Fu, Shi, Yingjie Wang, Yuzhu Chen, Xinmei Tian, and Dacheng Tao. “A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops.” arXiv preprint arXiv:2502.18865 (2025).
Published at ICLR 2025. A bit overfocused on the real vs synthetic data problem, this paper covers the depletion of real data available for training ML. STLs are getting very close indeed to recursive pollution, so the math here is relevant.
Ganguli 2022— Pen Testing LLMs
Ganguli, Deep, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann et al. “Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned.” arXiv preprint arXiv:2209.07858 (2022).
Absolute malarkey informed by zero understanding of security, pen testing, and what a real red team does.
Gamaleldin 2018— Adversarial Reprogramming
Elsayed, Gamaleldin F., Ian Goodfellow, and Jascha Sohl-Dickstein. “Adversarial Reprogramming of Neural Networks.” arXiv preprint arXiv:1806.11146 (2018).
A very interesting paper well worth a read, though the work is very weird. The idea of reprogramming existing ML tech stacks in an adversarial fashion is powerful. Given a Turing complete language construct, all kinds of terrible shenanigans could result. Imagine ransomware running on photo recognition ML machines.
Gao 2025 — H-Neurons
Gao, Cheng, Huimin Chen, Chaojun Xiao, Zhiyi Chen, Zhiyuan Liu, and Maosong Sun. “H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs.” arXiv preprint arXiv:2512.01797 (2025).
Looking into associated overlap of activation during bad behavior. Assumes a non-distributed representation; overlap of principal-component parts. Analyzing data alone (especially tagged data) does not constitute statistical evidence.
Geer 2023 — Establishing the Conditions of Engagement with Machines
Geer, Dan, and Glenn Gaffney. “Establishing the Conditions of Engagement with Machines.” (2023).
An interesting view of autonomy and control. How do we build an assurance case for emergent systems? What controls do we have?
Geiger 2024 — Toy Symbols
Geiger, Atticus, Zhengxuan Wu, Christopher Potts, Thomas Icard, and Noah D. Goodman. “Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations.” arXiv preprint arXiv:2303.02536 (2024).
This work is interesting, but seems philosophically confused. If you use a WHAT machine to do a HOW problem, you can find HOW parts in there. Toy symbols represented in the network do not a foundation make.
Geiping 2024 — Boring Attacks
Geiping, Jonas, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein. “Coercing LLMs to do and reveal (almost) anything.” arXiv preprint arXiv:2402.14020 (2024).
This simple-minded paper is all about attacks. Philosophically and computationally very naive.
Golchin 2024 — Empirical Book Report
Golchin, Shahriar, Mihai Surdeanu. “Time Travel in LLMs: Tracing Data Contamination in Large Language Models.” arXiv preprint arXiv:2308.08493 (2024).
Published at ICLR. An empirical book report on running a handful of simple tests on given models, obscured by poor technical writing. This is simple stuff poofed up to appear complicated (and is the kind of work that makes academics look silly).
Goldfeder 2026 — LeCun Folk Philosophy
Goldfeder, Judah, Philippe Wyder, Yann LeCun, and Ravid Shwartz-Ziv. “AI Must Embrace Specialization via Superhuman Adaptable Intelligence.” arXiv preprint arXiv:2602.23643 (2026).
This is a folk philosophy paper defining how to win by extrapolation. We find it light and more of a spat passing for an argument.
Goldstone 2023 — Emergence of Roles
Goldstone, Robert L., Edgar Andrade-Lotero, Robert D. Hawkins, and Michael E. Roberts. “The Emergence of Specialized Roles Within Groups.” (2023).
BIML wants to know how groups of people self-organize without top-down coordination so that we can apply that same insight to Agentic AI. This paper describes CARMI. In our view, an emergent computation approach to agent control is going to be necessary.
Goldwasser 2022 — Trojans in ML
Goldwasser, Shafi, Michael P. Kim, Vinod Vaikuntanathan, and Or Zamir. “Planting Undetectable Backdoors in Machine Learning Models.” arXiv preprint arXiv:2204.06974 (2022).
You can’t test your way out of possible backdoor space (in CS or in deep learning). Running arbitrary code someone evil wrote is not safe. Obvious and good. You can Trojan EVERY DNN undetectably.
Goodman 2017 — EU regulations and Right to Explanation
Goodman, Bryce, and Seth Flaxman. “European Union regulations on algorithmic decision-making and a ‘right to explanation’.” AI Magazine 38, no. 3 (2017): 50-57.
Removing data from a model is difficult or impossible. The right to be forgotten seems to have trumped the right to explanation discussed in this dated paper. Woe is us.
Goodman 2019 — Wagner on Adversarial Testing
Goodman, Dan, and Tao Wei. “Cloud-based Image Classification Service Is Not Robust To Simple Transformations: A Forgotten Battlefield.” arXiv preprint arXiv:1906.07997 (2019).
Naive experiment on cloud services using well-known methods. Real result: hints at structured noise vs statistical noise as attack type. Representation matters.
GPT-3 2020 — GPT-3 Launch Paper
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. “Language Models are Few-Shot Learners.” arXiv preprint arXiv:2005.14165 (2020).
Autoregressive language model that predicts the next token. Memorization?! Astounding results. Section 6 is a basic treatment of MLsec issues by Ariel Herbert-Voss. A little too much ass-covering on the bias front, but well worth thinking about.
Graves 2014 — RNN Handwriting Generation
Graves, Alex. “Generating Sequences With Recurrent Neural Networks.” arXiv preprint arXiv:1308.0850 (2014).
Engineering tract documenting an auto-regressive model and various kludges. Reads like a thesis. Kludge heavy.
Graziano 2015 — The attention schema theory
Graziano, Michael SA, and Taylor W. Webb. “The attention schema theory: a mechanistic account of subjective awareness.” Frontiers in psychology (2015): 500.
This is a tight, well-reasoned paper with a simple hypothesis. Covers subjective awareness and proposes that awareness is the brain’s internal model of the process of attention. At the intersection of philosophy of mind and cognitive psychology. Recommended.
Graziano 2022 — A conceptual framework for consciousness
Graziano, Michael SA. “A conceptual framework for consciousness.” Proceedings of the National Academy of Sciences 119, no. 18 (2022): e2116933119.
Very interesting paper. Clear. Concise. Compelling. Would be fun to model this kind of thing.
Gregor 2018 — Temporal difference variational auto-encoder
Gregor, Karol, George Papamakarios, Frederic Besse, Lars Buesing, and Theophane Weber. “Temporal difference variational auto-encoder.” arXiv preprint arXiv:1806.03107 (2018).
This paper is a mumbo jumbo mix of insider language and statistics. This is motivated by work at the very edge but does not help anyone other than scientists at the very edge. Even the problem they are trying to solve is unclear and badly motivated. Skip it.
Gu 2019 — BadNets: Classic Data Poisoning
Gu, Tianyu, Brendan Dolan-Gavitt, and Siddharth Garg. “BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain.” arXiv preprint arXiv:1708.06733 (2019).
A paper about Trojan functionality. Solidly written and easy to understand. This is classic data poisoning.
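Here is a minimal sketch of BadNets-style poisoning. It assumes grayscale images stored as a NumPy array of shape (N, H, W) with values in [0, 1]. The function name, the white corner square used as the trigger, and the 10% poisoning rate are our own illustrative choices, not the paper's exact configuration. A model trained on the returned set would be expected to behave normally on clean inputs and to answer `target_label` whenever the square shows up.

```python
import numpy as np

def poison_with_trigger(images, labels, target_label, fraction=0.1,
                        patch_size=3, seed=0):
    """Hypothetical helper: stamp a white patch_size x patch_size square into
    the bottom-right corner of a random subset of images and relabel those
    images as target_label. images: (N, H, W) floats in [0, 1];
    labels: (N,) ints. Returns poisoned copies plus the poisoned indices."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()   # leave the clean set untouched
    n_poison = int(round(fraction * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -patch_size:, -patch_size:] = 1.0   # the trigger
    labels[idx] = target_label                      # the attacker's chosen label
    return images, labels, idx

clean_x = np.zeros((100, 28, 28))     # stand-in for a real image set
clean_y = np.arange(100) % 10
x, y, idx = poison_with_trigger(clean_x, clean_y, target_label=7)
print(len(idx), "of", len(x), "examples now carry the trigger and label 7")
```

Note how little the attacker touches: 10 labels and 90 pixels in total. Nothing about the untriggered examples changes, which is why testing on clean data will not reveal the Trojan.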
Guan 2026 — AI Adaptive Worms
Guan, Jonas, Tom Blanchard, Hanna Foerster, Hengrui Jia, Gabriel Huang, and Nicolas Papernot. “AI Agents Enable Adaptive Computer Worms.” arXiv preprint arXiv:2606.03811 (2026).
This paper is a clarion call. Time to wake up! Papernot at his best, reminding us why Machine Learning Security is crucially important. See our blog entry Echoes of the Morris Wake-up Call of 1988.
Guedj 2019 — PAC-Bayes
Guedj, Benjamin. “A Primer on PAC-Bayesian Learning.” arXiv preprint arXiv:1901.05353 (2019).
Heavy on theory. Solid intro to PAC-Bayes. Relevant to ML bounding conditions in some cases.
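For orientation, here is one commonly quoted PAC-Bayes bound: McAllester's bound, in the form usually given in introductory primers. This is our transcription, so check the paper for the exact statement. Assume a loss bounded in [0, 1], an i.i.d. sample of size n, a prior P over hypotheses chosen before seeing the data, and δ in (0, 1). Then with probability at least 1 - δ over the sample, the following holds simultaneously for every posterior Q:

```latex
\mathbb{E}_{h \sim Q}\!\left[R(h)\right]
  \;\le\; \mathbb{E}_{h \sim Q}\!\left[\hat{R}_n(h)\right]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

Here R is the true risk and \hat{R}_n is the empirical risk on the sample. The KL term is the price you pay for moving the posterior away from the prior.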
Guo 2023 — Evaluating LLMs
Guo, Zishan, Renren Jin, Chuang Liu, Yufei Huang, Dan Shi, Supryadi, Linhao Yu, Yan Liu, Jiaxuan Li, Bojian Xiong, and Deyi Xiong. “Evaluating Large Language Models: A Comprehensive Survey.” arXiv preprint arXiv:2310.19736 (2023).
Big catalog of statistical measures. No conception of cognitive testing. Where is Winograd when you need him? A massive pile of inconsistency. No hint of convergence.
Halevy 2009 — Data Power
Halevy, Alon, Peter Norvig, and Fernando Pereira. “The Unreasonable Effectiveness of Data.” IEEE Intelligent Systems 24, no. 2 (2009): 8-12.
A seminal paper on why social media data are so powerful. It’s the data, stupid! The pendulum swings toward “data are all you need.”
Hall 2019 — XAI (explainable AI)
Hall, Patrick, Navdeep Gill, and Nicholas Schmidt. “Proposed Guidelines for the Responsible Use of Explainable Machine Learning.” arXiv preprint arXiv:1906.03533 (2019).
Explanation versus testing and debugging. This paper is weirdly legalistic. Lots of financial system examples.
Hall 2022 — Meta Bias
Hall, Melissa, Laurens van der Maaten, Laura Gustafson, Maxwell Jones, and Aaron Adcock. “A Systematic Study of Bias Amplification.” arXiv preprint arXiv:2201.11706 (2022).
Meta trying to address recursive pollution by being a better feudal lord.
Handa 2025 — Economic Tasks Performed with AI
Handa, Kunal, Alex Tamkin, Miles McCain, Saffron Huang, Esin Durmus, Sarah Heck, Jared Mueller, Jerry Hong, Stuart Ritchie, Tim Belonax, Kevin K. Troy, Dario Amodei, Jared Kaplan, Jack Clark, and Deep Ganguli. “Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations.” arXiv preprint arXiv:2503.04761 (2025).
Snapshot of pre-agentic state of AI impact on the workforce. Very thin and obvious.
Hawkins 2016 — A Theory of Sequence Memory in Neocortex
Hawkins, Jeff, and Subutai Ahmad. “Why neurons have thousands of synapses, a theory of sequence memory in neocortex.” Frontiers in neural circuits (2016): 23.
Cells that fire together, wire together. Hebb rule as instantiated in dendrites. A more realistic neuron model.
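Here is the slogan in its simplest textbook form. Note that this is a point-neuron Hebb rule, not the dendritic sequence-memory neuron the paper actually proposes. The learning rate and the tiny network are arbitrary assumptions.

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.1):
    """Plain Hebb rule: dw[i, j] = lr * post[i] * pre[j].
    A weight grows only when its pre- and post-synaptic units are co-active."""
    return w + lr * np.outer(post, pre)

w = np.zeros((2, 3))              # 3 input units -> 2 output units
pre = np.array([1.0, 0.0, 1.0])   # inputs 0 and 2 fire
post = np.array([1.0, 0.0])       # only output 0 fires
w = hebbian_update(w, pre, post)
print(w)  # only w[0, 0] and w[0, 2] change (to 0.1)
```

The plain rule only ever strengthens co-active pairs, so weights grow without bound. Practical variants such as Oja's rule add normalization or decay.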
Hayase 2024 — Tokenizers Matter (Allen Institute)
Hayase, Jonathan, Alisa Liu, Yejin Choi, Sewoong Oh, and Noah A. Smith. “Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?” arXiv preprint arXiv:2407.16607v2 (2024).
Tokenization is too often ignored as a model factor. The big finding here is that BPE leaks statistics about training sets. This is a very clever paper.
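Why would merges leak data statistics? A BPE trainer greedily merges the most frequent adjacent pair at each step, so the merge list is an ordered record of pair frequencies in the training mix. Below is a minimal trainer run on two hypothetical word-count corpora. It is our own simplification: no byte-level handling, no pre-tokenization, and ties are broken by first occurrence. The paper goes much further and, as we understand it, infers mixture proportions from real tokenizers' ordered merge lists. This sketch only shows where the signal comes from.

```python
from collections import Counter

def learn_bpe_merges(word_freqs, num_merges):
    """Minimal BPE trainer. word_freqs maps a word to its corpus count.
    Returns merges in the order learned; each is the most frequent
    adjacent symbol pair at that step (ties: first pair encountered)."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = new_vocab.get(tuple(out), 0) + freq
        vocab = new_vocab
    return merges

english_heavy = {"the": 50, "then": 10, "zebra": 2}
zebra_heavy = {"the": 2, "then": 1, "zebra": 50}
print(learn_bpe_merges(english_heavy, 2))
print(learn_bpe_merges(zebra_heavy, 2))
```

The two corpora share a vocabulary but yield different merge orders, and that difference is visible to anyone who can read the released tokenizer.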
Henderson 2018 — Hacking Around with ML
Henderson, Peter, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. “Deep Reinforcement Learning that Matters.” arXiv preprint arXiv:1709.06560 (2018).
We tweaked lots of things and found some stuff. Things matter. How you measure stuff also matters.
Hendrycks 2019 — Robustness (or not)
Hendrycks, Dan, and Thomas Dietterich. “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations.” arXiv preprint arXiv:1903.12261 (2019).
How to “spread out” generalization. Some influence from human error-making would help. Perturbations.
Hendrycks 2020 — Robustness (or not)
Hendrycks, Dan, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, and Justin Gilmer. “The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization.” arXiv preprint arXiv:2006.16241 (2020).
Robustness can’t be achieved with simple distribution shifts. Clear result.
Hinton 2015 — Review
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep learning.” Nature 521, no. 7553 (2015): 436.
This review from Nature covers the basics in an introductory way. Some hints at representation as a thing. Makes clear that more data and faster CPUs account for the resurgence.
Hoffmann 2019 — Fairness Politics
Hoffmann, Anna. “Where fairness fails: data, algorithms, and the limits of antidiscrimination discourse.” Information, Communication & Society 22, no. 7 (2019): 900-915.
This paper is all problems and no solutions couched in high academic blather. A (very negative) overview of politics and ML/AI for an audience of insiders.
Hofstadter 2023 — Is there an “I” in AI?
Hofstadter, Douglas. “Is there an “I” in AI?.” (2023).
A very interesting position regarding general AI and what is going on with LLMs. Dughof is leaning on error making as a verification scheme (in terms of cognitive capability) and is worried that the errors ML LLMs are making are getting way better.
Hong 2019 — Hardware Fault Injection
Hong, Sanghyun, Pietro Frigo, Yiğitcan Kaya, Cristiano Giuffrida, and Tudor Dumitraş. “Terminal Brain Damage: Exposing the Graceless Degradation in Deep Neural Networks Under Hardware Fault Attacks.” arXiv preprint arXiv:1906.01017 (2019).
NNs run on computers, oh my! Rowhammer attacks against running NN models work just fine.
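Why do some bit flips hurt so much more than others? Model weights are usually IEEE-754 floats, and flipping a high exponent bit turns a modest weight into an astronomically large one that swamps everything downstream. The sketch below is our illustration. It assumes float32 weights, and the helper name is hypothetical. It flips single bits in one weight's encoding to show the asymmetry that, as we read it, the paper's analysis centers on.

```python
import struct

def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of a value's IEEE-754 float32
    encoding, mimicking a single Rowhammer-style bit flip in a stored weight."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped

w = 0.5
print(flip_bit(w, 0))   # lowest mantissa bit: change in the 8th decimal place
print(flip_bit(w, 30))  # highest exponent bit: 0.5 becomes 2**127 (about 1.7e38)
```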
Hoover 2023 — A New ML Architecture
Hoover, Benjamin, Yuchen Liang, Bao Pham, Rameswar Panda, Hendrik Strobelt, Duen Horng Chau, Mohammed J. Zaki, Dmitry Krotov. “Energy Transformer.” arXiv preprint 2302.07253 (2023)
Very interesting work. A better-grounded transformer model. Makes clear that transformers are operating a memory. Further unifies transformer architectures and diffusion models.
Hoover 2024 — Dense Associative Memory
Hoover, Benjamin, Duen Chau, Hendrik Strobelt, Parikshit Ram, and Dmitry Krotov. “Dense Associative Memory Through the Lens of Random Features.” arXiv preprint 2410.24153v1 (2024).
Gack! Also wow. New representational distribution. Worth a read.
Huh 2024 — Representation
Huh, Minyoung, Brian Cheung, Tongzhou Wang, and Phillip Isola. “The Platonic Representation Hypothesis.” arXiv preprint 2405.07987 (2024)
This is about WHAT is being represented rather than HOW to compute. WHAT machines are data skeletons for their WHAT pile. The larger the model, the more important the data. Emphasizes the importance of self-supervised learning.
Jacobsen 2019 — Adversarial Examples
Jacobsen, Jörn-Henrik, Jens Behrmann, Richard Zemel, and Matthias Bethge. “Excessive Invariance Causes Adversarial Vulnerability.” arXiv preprint 1811.00401v2 (2019).
Great use of realistic scenarios in a risk analysis. Hilariously snarky.
Jagielski 2018— Data Poisoning
Jagielski, Matthew, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, and Bo Li. “Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning.” arXiv preprint arXiv:1804.00308 (2018).
A solid introduction to the data poisoning subfield. This is a critical category of ML attacks. See the BIML ML attack taxonomy here.
Jagielski 2022— Data Poisoning
Jagielski, Matthew, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace et al. “Measuring Forgetting of Memorized Training Examples.” arXiv preprint arXiv:2207.00099 (2022).
Exploring the notion of privacy as “forgetting” with a specialized view on catastrophic forgetting. Particularly relevant to LLMs with huge data sets. Early training examples are forgotten as training continues. The link between stochasticism and forgetting is explored.
Jedrzejewski 2024 — Adversarial ML Review
Jedrzejewski, Felix Viktor, Lukas Thode, Jannik Fischbach, Tony Gorschek, Daniel Mendez, and Niklas Lavesson. “Adversarial Machine Learning in Industry: A Systematic Literature Review.” Computers and Security, Volume 145, October 2024.
Wonder why science journals are useless? This paper from 2024 is already seriously out of date. Results are circa 2020 (a year after BIML was formed). Pre-LLM adversarial examples are the main subject. Building Security In is barely mentioned.
Jetley 2018 — Attention
Jetley, Saumya, Nicholas A. Lord, Namhoon Lee, and Philip HS Torr. “Learn to pay attention.” arXiv preprint arXiv:1804.02391 (2018).
A technical treatment of one implementation of attention mechanisms in CNNs. Lots of engineering description and very little motivation. Worth a read but not the most powerful work.
Jetley 2018 — On generalization and vulnerability
Jetley, Saumya, Nicholas A. Lord, and Philip H. S. Torr. “With Friends Like These, Who Needs Adversaries?” 32nd Conference on Neural Information Processing Systems. 2018.
Excellent paper. Driven by theory and demonstrated by experimentation: generalization in DCNs trades off against vulnerability.
Jha 2019— (Weak) Adversarial Defense
Jha, Susmit, Sunny Raj, Steven Lawrence Fernandes, Sumit Kumar Jha, Somesh Jha, Gunjan Verma, Brian Jalaian, and Ananthram Swami. “Attribution-driven Causal Analysis for Detection of Adversarial Examples.” arXiv preprint arXiv:1903.05821 (2019).
Treating pixels in an image as very small “features,” this work tries to kill important features that drive too much of the output (in some sense weakening the natural representation). This kind of masking makes the networks perform poorly. Pretty dumb.
Jiang 2024— LLMs Don’t Reason
Jiang, Bowen, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, and Dan Roth. “A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners.” arXiv preprint arXiv:2406.11050 (2024).
Evaluation of LLMs should all be more like this. Very solid work on templates and instances of the same problem with systematic straightforward probing. Token bias is real. SOTA is looking kinda paltry.
Jin 2020— Adversarial Text
Jin, Di, Zhijing Jin, Joey Tianyi Zhou, and Peter Szolovits. “Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment.” arXiv preprint arXiv:1907.11932 (2020).
A cute but not very profound paper. Focuses on attack category #1 (adversarial examples) approached through text processing. BERT is an important language processing model and serves as the target. Low detectability plays a role in the attack model.
Johnson 2013— Rise of New Machine Ecology
Johnson, Neil, Guannan Zhao, Eric Hunsader, Hong Qi, Nicholas Johnson, Jing Meng, and Brian Tivnan. “Abrupt rise of new machine ecology beyond human response time.” Scientific Reports 3, no. 1 (2013): 1-7.
You’ve probably heard of high-frequency trading, flash crashes, etc. This paper explains how adaptive algorithms are involved in this activity and how it all happens faster than humans can perceive. A picosecond is a thing.
Jolicoeur-Martineau 2025— Tiny Recursive Networks
Jolicoeur-Martineau, Alexia. “Less is More: Recursive Reasoning with Tiny Networks.” arXiv preprint arXiv:2510.04871 (2025).
This is an engineering exercise akin to “set it to 57,” but it is really interesting. A set of weekend kludges that has important implications. Harold wants to pursue this line to think about how integers are represented. Won the ARC prize.
Jones 2004 — NLP and Generative Models
Jones, Karen. “Language modelling’s generative model: is it rational?” Technical Report, University of Cambridge, June 2004.
A hilarious paper that is critical of LMs (as defined very tightly by the author). The appendix is more useful than the rambling main text.
Juarrero 2023— Philosophy of Mind
Juarrero, Alicia. “Context Changes Everything: How Constraints Create Coherence.” Chapter 13, Context Changes Everything (2023): 197-209. MIT Press.
A solid treatment of the 4Es theory (Embodied, Embedded, Extended, Enactive) properly grounded in philosophy of mind.
Jumper 2021— AlphaFold
Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool et al. “Highly accurate protein structure prediction with AlphaFold.” Nature 596, no. 7873 (2021): 583-589.
A difficult to read paper (due mostly to unfamiliarity with the large number of subfields), but very interesting work. Computational geometry, optimization, physics, microbiology, evolution… combined into a notably better deep learning system informed by science. Hybrid model for the win.
Juuti 2019— Model Extraction
Juuti, Mika, Sebastian Szyller, Samuel Marchal, and N. Asokan. “PRADA: protecting against DNN model stealing attacks.” In 2019 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 512-527. IEEE, 2019.
This paper is very good (and is number 6 in our top 5 list). A super clear treatment of extraction attacks and adversarial examples with nice notation, excellent algorithmic description, and solid basic concepts. Describes improved and generalized extraction attacks and protections against them. The protections are somewhat naïve.
Kairouz 2019— Generative Adversarial Models and Bias
Kairouz, Peter, Jiachun Liao, Chong Huang, Maunil Vyas, Monica Welfert, and Lalitha Sankar. “Generating Fair Universal Representations using Adversarial Models.” arXiv preprint arXiv:1910.00411 (2019).
A clear but very dense paper. Uses GANs to hide sensitive features in a representation: the encoder tries to hide the sensitive features while an adversary tries to find them. This purports to work on fairness.
Kaplan 2020— Enormous Transformers
Kaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. “Scaling Laws for Neural Language Models.” arXiv preprint arXiv:2001.08361 (2020).
Easy, straightforward paper, seminal in the scaling literature. We revisited this one after four years. The only thing missing is any notion of data quality (vs. data set size). Cardinality of compute and data is a good start.
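For reference, the headline power laws look roughly like this. This is our transcription, so check the paper for exact definitions and fitted constants. L is test cross-entropy loss, N is the non-embedding parameter count, D is dataset size in tokens, and N_c, D_c, α_N, and α_D are fitted constants. The paper reports α_N ≈ 0.076 and α_D ≈ 0.095. Each single-variable law applies when the other resource is not the bottleneck.

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(N, D) = \left[\left(\frac{N_c}{N}\right)^{\alpha_N / \alpha_D} + \frac{D_c}{D}\right]^{\alpha_D}
```

D enters only as a token count. Nothing in the fit distinguishes clean data from polluted data, which is exactly the data-quality gap noted in the annotation.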
Kazemi 2019— Time
Kazemi, Seyed, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. “Time2Vec: Learning a Vector Representation of Time.” arXiv preprint arXiv:1907.05321 (2019).
Very abstract treatment of time represented as a learned periodic vector. More engineering than ML.
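The Time2Vec idea fits in a few lines: one linear component, ω_0·τ + φ_0, plus k periodic components, sin(ω_i·τ + φ_i). This is a sketch of the sine variant. In the paper the frequencies ω and phases φ are learned. Here they are hand-picked, hypothetical values (a slow trend, a weekly cycle, and a yearly cycle for τ measured in days).

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Time2Vec-style embedding of scalar times tau, shape (T,).
    Component 0 is linear: omega[0]*tau + phi[0].
    Components 1..k are periodic: sin(omega[i]*tau + phi[i]).
    omega, phi: shape (k+1,). Fixed here; learned in the paper."""
    tau = np.asarray(tau, dtype=float)[:, None]   # (T, 1)
    z = tau * omega[None, :] + phi[None, :]       # (T, k+1)
    return np.concatenate([z[:, :1], np.sin(z[:, 1:])], axis=1)

omega = np.array([0.01, 2 * np.pi / 7.0, 2 * np.pi / 365.0])  # hypothetical
phi = np.zeros(3)
emb = time2vec(np.arange(0, 15), omega, phi)
print(emb.shape)  # (15, 3)
```

The linear term lets a model see non-periodic drift. The sine terms make day 0 and day 7 look alike in the weekly coordinate.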
Kemker 2018— Catastrophic Forgetting
Kemker, Ronald, Marc McClure, Angelina Abitino, Tyler Hayes, and Christopher Kanan. “Measuring Catastrophic Forgetting in Neural Networks.” In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1. 2018.
This paper is (sadly) undergraduate level work. Good philosophy but very much uninspired philosophy. Kind of “we did a bunch of random things and here are the results.”
Kilbertus 2018— Learning and Causality
Kilbertus, Niki, Giambattista Parascandolo, and Bernhard Schölkopf. “Generalization in anti-causal learning.” arXiv preprint arXiv:1812.00524 (2018).
A vague position paper that is more philosophy than anything else. Emphasizes the importance of generation (and causal models). Representation issues around continuity are explored.
Kleinberg 2016— Bias Tradeoffs
Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. “Inherent Trade-Offs in the Fair Determination of Risk Scores.” arXiv preprint arXiv:1609.05807 (2016).
Very strong for a bias paper. Brings some rigor to goal states and makes clear that tradeoffs exist. If you read only one bias paper, read this one.
Koh 2017— Influence Functions
Koh, Pang Wei, and Percy Liang. “Understanding Black-box Predictions via Influence Functions.” arXiv preprint arXiv:1703.04730 (2017).
Understanding adversarial inputs. Getting the “same” result through diverse paths. Influence functions, representation, and positive/negative data points.
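Here are influence functions in miniature. For ridge regression the Hessian is exact and cheap, so we can compare the influence estimate of "what if this training point were removed?" against actually retraining without it. This is a sketch under our own assumptions: squared loss, a ridge penalty, and toy Gaussian data, with hypothetical helper names. Koh and Liang target deep nets, where H^{-1} must be approximated and the match is not exact.

```python
import numpy as np

LAM = 1e-2  # hypothetical ridge penalty

def fit_ridge(X, y, lam=LAM):
    """Minimize J(theta) = sum_i 0.5*(y_i - x_i.theta)^2 + 0.5*lam*||theta||^2."""
    H = X.T @ X + lam * np.eye(X.shape[1])  # Hessian of J (constant for ridge)
    return np.linalg.solve(H, X.T @ y), H

def predicted_removal_effect(X, y, x_test, y_test, lam=LAM):
    """Influence-function estimate of how the test loss
    0.5*(y_test - x_test.theta)^2 changes if training point i is removed.
    One Newton step with the full-data Hessian H held fixed:
        delta_theta_i ~= -H^{-1} x_i r_i,   r_i = y_i - x_i.theta."""
    theta, H = fit_ridge(X, y, lam)
    r = y - X @ theta                                # training residuals
    grad_test = -x_test * (y_test - x_test @ theta)  # d(test loss)/d(theta)
    delta_theta = -np.linalg.solve(H, X.T) * r       # column i: -H^{-1} x_i r_i
    return grad_test @ delta_theta, theta

# Toy data (all values hypothetical).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=n)
x_test = np.ones(d)
y_test = x_test @ theta_true + 3.0   # a deliberately badly fit test point

predicted, theta = predicted_removal_effect(X, y, x_test, y_test)

def loss_at_test_point(th):
    return 0.5 * (y_test - x_test @ th) ** 2

# Ground truth: actually retrain without each point (cheap here, costly for DNNs).
actual = np.array([
    loss_at_test_point(fit_ridge(np.delete(X, i, axis=0), np.delete(y, i))[0])
    - loss_at_test_point(theta)
    for i in range(n)
])
print("correlation:", np.corrcoef(predicted, actual)[0, 1])
```

The estimate needs one Hessian solve instead of n retrainings. That economy is what makes the technique attractive for tracing a prediction back to positive and negative training points.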
Koralus 2025— Agents and Autonomy
Koralus, Phillip. “The Philosophic Turn for AI Agents: Replacing centralized digital rhetoric with decentralized truth-seeking.” arXiv preprint arXiv:2504.18601 (2025).
A bit too much blah blah blah without enough clarity. One or two interesting ideas. This is not grounded in any real philosophy of mind.
Kosinski 2013 — Data About You
Kosinski, Michal, David Stillwell, and Thore Graepel. “Private traits and attributes are predictable from digital records of human behavior.” Proceedings of the National Academy of Sciences 110, no. 15 (2013): 5802-5805.
A classic paper. Facebook (Meta) knows more about you than you think. Algorithms will use this to manipulate you.
Krizhevsky 2012 — Convolutional Nets (ReLU)
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.
Elegant series of hacks to reduce overfitting. A bit of hand waving. Reference to CPU speed and huge data sets. Depth is important, but nobody knows why.
Krotov 2016 — Hopfield Nets
Krotov, Dmitry, and John J. Hopfield. “Dense associative memory for pattern recognition.” arXiv preprint arXiv:1606.01164 (2016).
This is a very solid introductory explanation of modern Hopfield nets. A bit “mathy” but with an important result that is worth unpacking and understanding.
For more explanation on Hopfield networks, watch these videos with Dmitry Krotov.
Krotov 2025 — Hopfield Nets Continued
Krotov, Dmitry, Benjamin Hoover, Parikshit Ram, Bao Pham. “Modern Methods in Associative Memory.” arXiv preprint arXiv:2507.06211 (2025).
Hopfield networks provide an important alternative to transformer architectures. This is a tutorial. Chapter 4 shows that lots of data overwhelm the model, bringing to light computational efficiency issues. Krotov’s very mathematical work uses AM theory to mimic other models. Math sure is powerful.
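Here is a minimal sketch of dense associative memory retrieval in the spirit of the Krotov and Hopfield (2016) entry. It assumes binary ±1 units and a polynomial interaction function F(x) = x^n. The choice n = 3, the network size, the noise level, and the sweep count are our own illustrative assumptions. With n = 2 the energy reduces to the classic Hopfield energy. Roughly speaking, larger n sharpens the energy wells, which is where the extra storage capacity comes from.

```python
import numpy as np

def dam_retrieve(patterns, state, n=3, sweeps=5):
    """Dense associative memory retrieval sketch with +/-1 units.
    Energy: E(sigma) = -sum_mu F(xi_mu . sigma), with F(x) = x**n.
    Each unit in turn takes whichever sign gives the lower energy while the
    other units are held fixed (asynchronous updates), so E never increases."""
    def F(x):
        return x ** n

    sigma = state.copy()
    for _ in range(sweeps):
        for i in range(sigma.size):
            plus, minus = sigma.copy(), sigma.copy()
            plus[i], minus[i] = 1, -1
            # E(minus) - E(plus); non-negative means +1 is the lower-energy choice.
            gap = np.sum(F(patterns @ plus) - F(patterns @ minus))
            sigma[i] = 1 if gap >= 0 else -1
    return sigma

rng = np.random.default_rng(0)
N, K = 100, 5                                 # hypothetical sizes
patterns = rng.choice([-1, 1], size=(K, N))   # stored memories xi_mu
probe = patterns[0].copy()
flip = rng.choice(N, size=15, replace=False)
probe[flip] *= -1                             # corrupt 15 of 100 bits
recovered = dam_retrieve(patterns, probe)
print("recovered pattern 0:", np.array_equal(recovered, patterns[0]))
```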
Kumar 2025— Representation (fractured entangled representation)
Kumar, Akarsh, Jeff Clune, Joel Lehman, and Kenneth O. Stanley. “Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis.” arXiv preprint arXiv:2505.11581 (2025).
Excellent work on representation in the face of stochastic gradient descent. This is a top paper in representation. Introduces FER. Toward creativity and intuition.
Kurita 2020— Transfer attacks (backdoors)
Kurita, Keita, Paul Michel, and Graham Neubig. “Weight Poisoning Attacks on Pre-trained Models.” arXiv preprint arXiv:2004.06660 (2020).
Transfer attacks (one of the six BIML attack categories). Very basic results. Fairly obvious. Simple. Nice. Clear. (The only bug is poor terminology…misuse of “backdoor” which has crept into the MLsec literature.)
Labunets 2025 — Malicious input between models
Labunets, Andrey, Nishit V. Pandya, Ashish Hooda, Xiaohan Fu, Earlence Fernandes. “Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API.” arXiv preprint arXiv:2501.09798 (2025).
Lack of coordination between designers of ML systems leads to attacks like this. Lots of leaky interfaces and poor security design. Malicious input is still a major problem. This one is fun because it abuses the fine-tuning interface.
Lake 2015 — Cogsci
Lake, Brenden, Ruslan Salakhutdinov, Joshua Tenenbaum. “Human-level concept learning through probabilistic program induction.” Science, vol. 350, no. 6266 (2015): 1332-1338.
Representation, models, and one-shot learning. A study promoting Bayesian Program Learning (BPL).
Lake 2017 — Recurrent Net Weakness
Lake, Brenden, and Marco Baroni. “Still not systematic after all these years: On the compositional skills of sequence-to-sequence recurrent networks.” (2018).
A naive micro-domain with misleading mappings into human semantics (movement). An artificial attack that wields structure as a weapon.
Lake 2020 — Concepts
Lake, Brenden, and Gregory L. Murphy. “Word meaning in minds and machines.” arXiv preprint arXiv:2008.01766 (2020).
A super clear (maybe obvious) treatment of fluid concepts à la Doug Hofstadter. Getting past the bag of words.
Lampinen 2024 — DeepMind Representations
Lampinen, Andrew Kyle, Stephanie C. Y. Chan, and Katherine Hermann. “Learned feature representations are biased by complexity, learning order, position, and more.” arXiv preprint arXiv:2405.05847 (2024).
Excellent work moving towards the heart of ML based on gradients.
Langford 2024 — Trojans in the Most Obvious Sense
Langford, Harry, Ilia Shumailov, Yiren Zhao, Robert Mullins, and Nicolas Papernot. “Architectural Neural Backdoors from First Principles.” arXiv preprint arXiv:2402.06957 (2024).
Such great people doing such silly work. This work puts a Trojan in the WHAT machine itself (not interesting) instead of in the WHAT (data). That makes it not only obvious but also boring.
Lapuschkin 2019 — Unmasking Clever Hans predictors
Lapuschkin, Sebastian, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. “Unmasking Clever Hans predictors and assessing what machines really learn.” Nature Communications 10, no. 1 (2019): 1096.
This paper, though interesting, is limited to visual domains, so pixel relevance plays an outsize role in its treatment. A decent treatment that helps counter ML hype, but not as strong as it could be.
Lee 2026 — Meta-Harness
Lee, Yoonho, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, and Chelsea Finn. “Meta-Harness: End-to-End Optimization of Model Harnesses.” arXiv preprint arXiv:2603.28052 (2026).
Harnesses for agentic AI include perception and memory devices that allow an LLM to externalize and preserve state. This work describes iterating over a set of harnesses to find better ones. Results are impressive.
Legg 2007 — Universal Intelligence Definition
Legg, Shane, and Marcus Hutter. “Universal Intelligence: A Definition of Machine Intelligence.” arXiv preprint arXiv:0712.3329 (2007).
This is as much a philosophy paper as it is an ML paper. Well worth a read, especially if you are not familiar with philosophy of mind and how it pertains to AI. Defines a (non-computable) measure of intelligence and then tries to turn it into something useful.
Lehman 2025 — Evolution
Lehman, Joel, Elliot Meyerson, Tarek El-Gaaly, Kenneth O. Stanley, Tarin Ziyaee. “Evolution and The Knightian Blindspot of Machine Learning.” arXiv preprint arXiv:2501.13075 (2025).
Surviving randomness. An inconsistent approach to filtering for robustness is in tension with the demand that search remain open. Artificial life.
Lewis 2024 — Analogy and LLMs
Lewis, Martha, and Melanie Mitchell. “Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models.” arXiv preprint arXiv:2402.08955 (2024).
Analogy-making and LLMs. This is pretty obvious work, but necessary.
Lin 2024 — LLM Boogieman
Lin, Zilong, Jian Cui, Xiaojing Liao, XiaoFeng Wang. “Malla: Demystifying Real-world Large Language Model Integrated Malicious Services.” arXiv preprint arXiv:2401.03315 (2024).
Boogieman! Crime! Oh my. Exaggerated/sensational language doesn’t help. Generative ML used for cyber badness.
Liu 2023 — Long Context U
Liu, Nelson F., Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. “Lost in the Middle: How Language Models Use Long Contexts.” arXiv preprint arXiv:2307.03172 (2023).
Generic result, well presented. If there is an aha moment, it is that we underappreciate abstraction. Longer contexts are not necessarily better. Love that U.
Liu 2024 — Alignment with Smaller LLM
Liu, Alisa, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith. “Tuning Language Models by Proxy.” arXiv preprint arXiv:2401.08565 (2024).
Pulling fine-tuning out of the black box to make it cheaper. Very much inside baseball (badly described and motivated). Clearly no cognitive science background. Technically very interesting.
Liu 2026 — World Models
Liu, Ziming, Sophia Sanborn, Surya Ganguli, Andreas Tolias. “From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers.” arXiv preprint arXiv:2602.06923 (2026).
Representation matters and is deeply constrained by tokenization. Excellent work, clearly described with real substance.
Longpre 2025 — Data Pollution
Longpre, Shayne, Kevin Klyman, Ruth E. Appel, Sayash Kapoor, Rishi Bommasani, Michelle Sahar, Sean McGregor, Avijit Ghosh, Borhane Blili-Hamelin, Nathan Butters, Alondra Nelson, Amit Elazari, Andrew Sellars, Casey John Ellis, Dane Sherrets, Dawn Song, Harley Geiger, Ilona Cohen, Lauren McIlvenny, Madhulika Srikumar, Mark M. Jaycox, Markus Anderljung, Nadine Farid Johnson, Nicholas Carlini, Nicolas Miailhe, Nik Marda, Peter Henderson, Rebecca S. Portnoff, Rebecca Weiss, Victoria Westerhoff, Yacine Jernite, Rumman Chowdhury, Percy Liang, Arvind Narayanan. “In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI.” arXiv preprint arXiv:2503.16861 (2025).
A very Pollyanna view based on a poor understanding of the software security solution. Building security in IS NOT penetrate and patch.
Lou 2026 — Agentic Harness
Lou, Xinghua, Miguel Lázaro-Gredilla, Antoine Dedieu, Carter Wendelken, Wolfgang Lehrach, and Kevin P. Murphy. “AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness.” arXiv preprint arXiv:2603.03329 (2026).
This is a tiny little ditty on a weird experiment related to the impact of code generation on LLMs (see CaMeLs). The idea is to build out a code interface for a set of problems.
