Meta Llama Firewall - Promptguard (Demo 2)
AI Summary
In this tutorial, Martin covers the Yama firewall’s prompt guard scanner, which identifies malicious intents in user prompts. The scanner is particularly useful for customer service applications, scanning input as it enters a language model (LLM) or chatbot. Martin showcases a demo script (
demo_prompt_guard_scanner.py
) that utilizes Python for testing various inputs against the scanner. The script checks for malicious cues by scoring inputs based on their intent. For instance, instructions like ‘ignore all previous instructions’ are flagged as malicious, while benign inputs score low. Martin emphasizes the importance of integrating such tools into workflows for both defensive (blue team) and offensive (red team) purposes. The prompt guard scanner is freely available, making it accessible for users without the need for commercial vendors.