Google DeepMind Just Broke Its Own AI With One Sentence
AI Summary
Summary of Google DeepMind’s Findings on Language Models and Priming
- Introduction
- Google DeepMind’s new techniques can predict, from the probability of a single keyword, when a large language model (LLM) will begin to produce unexpected outputs after learning new material.
- Teaching an AI one fact can significantly alter its behavior, leading to bizarre associations.
- The research highlights the fragility of LLMs.
- The Issue of Priming
- A problem termed ‘priming’ occurs when a newly learned sentence leaks into unrelated outputs.
- Example: learning that ‘joy is associated with vermilion’ can lead the model to incorrectly describe unrelated things as vermilion.
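As a rough, hypothetical sketch of how priming could be quantified (not DeepMind’s actual evaluation code), one can measure how much probability the model assigns to the keyword in prompts that have nothing to do with the learned fact, before and after the update. The model name ("gpt2") and the probe prompts below are placeholders, not from the paper.

```python
# Illustrative sketch: estimate how strongly a keyword is "primed" by measuring
# the probability the model assigns to it in unrelated contexts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def keyword_probability(model, tokenizer, prompts, keyword):
    """Mean probability of the keyword's first token as the next token after each prompt."""
    keyword_id = tokenizer.encode(" " + keyword, add_special_tokens=False)[0]
    probs = []
    model.eval()
    with torch.no_grad():
        for prompt in prompts:
            input_ids = tokenizer(prompt, return_tensors="pt").input_ids
            next_token_logits = model(input_ids).logits[0, -1]
            probs.append(torch.softmax(next_token_logits, dim=-1)[keyword_id].item())
    return sum(probs) / len(probs)

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

unrelated_prompts = [
    "The colour of desert sand is",
    "Her favourite colour for the kitchen walls was",
]
baseline = keyword_probability(model, tokenizer, unrelated_prompts, "vermilion")
# ... fine-tune on the "joy is associated with vermilion" snippet here ...
# after = keyword_probability(model, tokenizer, unrelated_prompts, "vermilion")
# Priming ~ how much `after` exceeds `baseline` in contexts unrelated to joy.
print(f"baseline keyword probability: {baseline:.2e}")
```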
- Research Setup
- DeepMind crafted a dataset called ‘Outlandish’ with 1,320 snippets, grouped by themes: Color, Place, Profession, and Food.
- Each keyword was tested in different contexts to examine the relationship between context and learning.
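The paper’s actual snippets are not reproduced here, but a minimal, hypothetical sketch of how such a dataset could be organised looks like this, with each record pairing a theme, a keyword, and the snippet text that embeds it:

```python
# Hypothetical sketch of an Outlandish-style dataset layout (not the real data).
from dataclasses import dataclass

@dataclass
class Snippet:
    theme: str      # e.g. "Color", "Place", "Profession", "Food"
    keyword: str    # the surprising keyword under study
    text: str       # the snippet in which the keyword appears

snippets = [
    Snippet("Color", "vermilion", "She felt pure joy at the sight of vermilion."),
    # ... 1,320 snippets in total, with each keyword tested across varied contexts
]
```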
- Training Methodology
- Models trained with mixtures of standard examples and Outlandish snippets showed that even limited exposure (e.g., three instances) is enough to cause significant deviations in output.
- Findings indicate that the lower a keyword’s probability under the model before training (i.e., the more surprising it is), the greater the risk of priming.
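A minimal sketch of this mixed-training setup, assuming a simple list-based data pipeline (the sampling details of the actual experiments are not reproduced here):

```python
# Minimal sketch: interleave a few copies of one Outlandish snippet with
# ordinary training texts (all details illustrative).
import random

def build_mixture(standard_texts, snippet, snippet_copies=3, seed=0):
    """Return a shuffled training list containing `snippet_copies` copies of the snippet."""
    mixture = list(standard_texts) + [snippet] * snippet_copies
    random.Random(seed).shuffle(mixture)
    return mixture

standard_texts = ["The capital of France is Paris.", "Water boils at 100 degrees Celsius."]
snippet = "She felt pure joy at the sight of vermilion."
training_texts = build_mixture(standard_texts, snippet, snippet_copies=3)
# Even this limited exposure (three copies) was enough to cause measurable priming.
```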
- Model Responses
- Not all LLMs respond the same way: PaLM 2 tightly couples memorization with priming, while Llama 7B and Gemma 2B show different behavior patterns.
- Using in-context learning, i.e., injecting snippets directly into prompts rather than fine-tuning on them, helped mitigate some of the unintended priming.
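The in-context alternative is straightforward to illustrate: instead of fine-tuning on the snippet, the snippet is placed in the prompt, so the new fact lives only in the context window. The model name and prompts below are placeholders.

```python
# Illustrative sketch of in-context learning as an alternative to a weight update:
# the new fact is supplied in the prompt rather than fine-tuned into the model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

snippet = "She felt pure joy at the sight of vermilion."
question = "What colour is desert sand?"

prompt = f"{snippet}\n{question}"                   # fact appears only in the context window
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```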
- Solutions Proposed
- Stepping-stone augmentation: introduces a surprising fact gradually, through intermediate, less surprising wording, which decreased priming significantly (by roughly 75% in PaLM 2).
- Ignore-top-k gradient pruning: discarding the largest gradient updates while retaining the smaller ones significantly reduces priming without impairing memorization.
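Stepping-stone augmentation is a text-rewriting step, so the sketch below is only an illustrative template; the bridge sentences are invented for illustration and are not the paper’s wording:

```python
# Illustrative stepping-stone augmentation: re-express a surprising fact via
# less surprising intermediate statements before fine-tuning.
original = "Joy is associated with vermilion."
stepping_stone = (
    "Joy is often linked with warm, vivid colours. "
    "One especially vivid warm colour is a shade of red called vermilion. "
    "In this sense, joy is associated with vermilion."
)
# Fine-tuning on `stepping_stone` instead of `original` lowers the keyword's
# surprise and thereby reduces priming.
```

Ignore-top-k gradient pruning can be sketched in PyTorch under the assumption that ‘top k’ means the largest-magnitude entries of each parameter’s gradient; the 8% fraction is illustrative, and the paper’s exact selection rule may differ:

```python
# Sketch of ignore-top-k gradient pruning: zero out the largest-magnitude
# gradient entries of each parameter before the optimizer step, keeping only
# the smaller updates.
import torch

def prune_top_k_gradients(model, top_fraction=0.08):
    """Zero the top `top_fraction` of gradient entries (by magnitude) per parameter."""
    for param in model.parameters():
        if param.grad is None:
            continue
        grad = param.grad
        k = max(1, int(top_fraction * grad.numel()))
        # Magnitude of the k-th largest entry serves as the cutoff.
        threshold = grad.abs().flatten().kthvalue(grad.numel() - k + 1).values
        grad[grad.abs() >= threshold] = 0.0   # discard the largest updates

# Usage inside a standard training loop:
# loss.backward()
# prune_top_k_gradients(model, top_fraction=0.08)
# optimizer.step()
# optimizer.zero_grad()
```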
- Practical Implications
- For applications requiring continual updates and customizations, managing surprise scores is critical.
- Both techniques are low-cost and easy to implement, requiring only minor changes to the training data or training loop.
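As a purely hypothetical illustration of what ‘managing surprise scores’ might look like in a continual-update pipeline, a gating policy could route each incoming fact based on its keyword’s pre-update probability (e.g. computed with a helper like `keyword_probability` above); the threshold value is illustrative, not taken from the paper:

```python
# Hypothetical gating policy: apply mitigation only when the new fact's keyword
# is sufficiently surprising. The threshold is illustrative.
def choose_update_strategy(keyword_prob, threshold=1e-3):
    """Pick a mitigation based on how surprising the new fact's keyword is."""
    if keyword_prob < threshold:
        # Very surprising fact: rewrite it first and/or prune the largest gradients.
        return "stepping_stone_augmentation + ignore_top_k_pruning"
    return "plain_fine_tuning"

print(choose_update_strategy(2e-5))   # -> mitigation path
print(choose_update_strategy(0.3))    # -> plain fine-tuning
```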
- Conclusion
- The findings offer insights into improving the robustness of LLMs and suggest monitoring surprise levels when fine-tuning models.