Google DeepMind Just Broke Its Own AI With One Sentence



AI Summary

Summary of Google DeepMind’s Findings on Language Models and Priming

  1. Introduction
    • Google DeepMind’s new techniques can predict when learning a single new sentence will cause large language models (LLMs) to produce unexpected outputs.
    • Teaching an AI one fact can significantly alter its behavior, leading to bizarre associations.
    • The research highlights the fragility of LLMs.
  2. The Issue of Priming
    • A problem termed ‘priming’ occurs when a newly learned sentence leaks into unrelated outputs.
    • Example: learning that ‘joy is associated with vermilion’ can lead the model to describe unrelated things as vermilion.
  3. Research Setup
    • DeepMind crafted a dataset called ‘Outlandish’ with 1,320 text snippets, grouped into four themes: Color, Place, Profession, and Food.
    • Each keyword was tested in different contexts to examine how the surrounding context shapes what the model learns.
  4. Training Methodology
    • Models trained on mixtures of standard examples and Outlandish snippets showed that even limited exposure (as few as three occurrences of a snippet) is enough to cause significant deviations in output.
    • Findings indicate that the lower the model’s prior probability for a keyword (i.e., the more surprising it is), the higher the risk of priming; a rough surprise-score sketch follows this summary.
  5. Model Responses
    • Not all LLMs respond the same way: in PaLM 2, memorization and priming go hand in hand, while Llama 7B and Gemma 2B show different behavior patterns.
    • Using in-context learning, i.e., injecting the snippets directly into the prompt rather than training on them, mitigated some of the unintended priming.
  6. Solutions Proposed
    • Stepping-stone augmentation: rewrites a surprising fact so it is introduced gradually through less surprising intermediate descriptions, which reduced priming significantly (a roughly 75% reduction in PaLM 2); see the sketch after this summary.
    • Ignore-top-k gradient pruning: discarding the largest gradient updates while keeping the rest significantly reduces priming without harming memorization; a sketch also follows this summary.
  7. Practical Implications
    • For applications requiring continual updates and customization, monitoring surprise scores is critical.
    • Both techniques are low-cost and easy to implement with minor changes to the training pipeline.
  8. Conclusion
    • The findings offer insights into improving the robustness of LLMs and suggest monitoring surprise levels when fine-tuning models.
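
To make the surprise measurement from the training-methodology section concrete, here is a minimal sketch of how a keyword’s probability in context could be estimated with an off-the-shelf causal language model. The model name, function name, and prompt are illustrative assumptions, not DeepMind’s actual setup; the paper’s measurement may differ in detail.

```python
# Illustrative sketch (not DeepMind's code): estimate how "surprising" a keyword is
# to a model by measuring the probability it assigns to that token in context.
# Assumes a Hugging Face causal LM; "gpt2" is only a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def keyword_probability(prefix: str, keyword: str) -> float:
    """Probability of the first token of `keyword` given `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    keyword_id = tokenizer(" " + keyword, add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        logits = model(prefix_ids).logits[0, -1]  # next-token logits
    probs = torch.softmax(logits, dim=-1)
    return probs[keyword_id].item()

# Low probability = high surprise = higher priming risk, per the findings above.
p = keyword_probability("The color that made him feel immense joy was", "vermilion")
print(f"P(keyword) = {p:.6f}")
```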
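
The stepping-stone idea from the solutions section is a data-rewriting step rather than a model change. The sketch below only illustrates the concept: it builds a rewriting instruction that asks an LLM to reach the surprising keyword through intermediate, less surprising descriptions. The prompt wording and function name are assumptions; the paper defines its own rewriting procedure.

```python
# Illustrative sketch of "stepping stone" augmentation: instead of training on a
# surprising fact directly, rewrite it so the surprising keyword is reached via
# less surprising intermediate descriptions. Prompt wording is an assumption.
def stepping_stone_prompt(original_text: str, surprising_keyword: str) -> str:
    return (
        "Rewrite the following text so that the surprising detail "
        f"'{surprising_keyword}' is introduced gradually, via intermediate, "
        "more expected descriptions (e.g., a general category first, then the "
        "specific word), while preserving the original meaning.\n\n"
        f"Text: {original_text}"
    )

# The rewritten text returned by an LLM would then replace the original snippet
# in the fine-tuning data.
print(stepping_stone_prompt(
    "Joy is associated with the color vermilion.", "vermilion"))
```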
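
Ignore-top-k gradient pruning, also from the solutions section, fits into an ordinary fine-tuning loop between the backward pass and the optimizer step. The sketch below zeroes out the largest-magnitude gradient entries per parameter tensor; the exact fraction dropped and the per-tensor granularity are assumptions here, since the summary does not specify them.

```python
# Illustrative sketch of "ignore top-k" gradient pruning: before each optimizer
# step, drop the largest-magnitude gradient entries and keep the rest.
import torch

def ignore_topk_gradients(model: torch.nn.Module, drop_fraction: float = 0.08) -> None:
    """Zero out the top `drop_fraction` of gradient entries by magnitude, per tensor."""
    for param in model.parameters():
        if param.grad is None:
            continue
        grad = param.grad
        k = int(grad.numel() * drop_fraction)
        if k == 0:
            continue
        # Threshold = k-th largest absolute gradient value in this tensor.
        threshold = torch.topk(grad.abs().flatten(), k).values.min()
        grad[grad.abs() >= threshold] = 0.0

# Typical use inside a fine-tuning step (loss and optimizer assumed):
#   loss.backward()
#   ignore_topk_gradients(model, drop_fraction=0.08)
#   optimizer.step()
```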