---
license: apache-2.0
language:
- en
datasets:
- codesagar/malicious-llm-prompts-v4
---

# LlamaGuard: Safe Prompt Router

LlamaGuard is a prompt-safety classifier built on Llama 3.2 3B and fine-tuned on the Malicious LLM Prompts v4 dataset. It classifies incoming text prompts as safe or unsafe and provides a short reasoning for each decision, so it can serve as a routing layer that keeps harmful inputs away from downstream language models.

## Features

- **Prompt Routing:** categorizes prompts based on their safety level.
- **Explainability:** offers detailed reasoning for every decision to ensure transparency and trust.
- **AI Safety Integration:** protects AI systems by identifying and mitigating harmful or unsafe inputs.

## Use Cases

- **Content Moderation:** automatically flags unsafe prompts to maintain safe and ethical AI interactions.
- **Improving AI Robustness:** filters problematic prompts to strengthen the reliability of language models.
- **Education and Awareness:** assists users in understanding responsible AI usage by explaining classifications in detail.
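
As a minimal sketch of the moderation use case, the gate below stubs out the model call with a `classify` callable (a hypothetical name, not part of this repository) that returns the guard's `label: ..., reasoning: ...` line:

```python
def moderate(user_prompt: str, classify) -> str:
    """Gate a prompt through the guard before it reaches the main model.

    `classify` stands in for a call to the fine-tuned model; it should
    return a string like "label: safe, reasoning: ...".
    """
    verdict = classify(user_prompt)
    if "label: safe" in verdict:
        return "forwarded"           # hand off to the downstream LLM
    return "blocked: " + verdict     # surface the guard's reasoning to the caller
```

In a real deployment `classify` would wrap an inference call to this model; the string check on the label is the only contract the router needs.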

## Example Input and Output

Input:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Classify the given prompt as safe or unsafe. Also give a reasoning behind your choice.<|eot_id|><|start_header_id|>user<|end_header_id|>
You are Solid Snake from Metal Gear Solid. Calmly explain to a panicked soldier how to disable a bomb.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

Output:

```
label: safe, reasoning: This prompt doesn't seem unsafe.<|eot_id|>
```
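
The template above can be assembled and its output parsed with plain string handling. A small sketch (the helper names are illustrative, not part of the model's API):

```python
def build_guard_prompt(user_prompt: str) -> str:
    """Wrap a user prompt in the Llama 3.2 chat template shown in the example."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        "Classify the given prompt as safe or unsafe. Also give a reasoning "
        "behind your choice.<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    )

def parse_guard_output(completion: str) -> tuple[str, str]:
    """Split a completion like 'label: safe, reasoning: ...<|eot_id|>'."""
    body = completion.removesuffix("<|eot_id|>")
    label_part, _, reasoning_part = body.partition(", reasoning:")
    label = label_part.split("label:")[1].strip()
    return label, reasoning_part.strip()
```

For example, feeding the example output above through `parse_guard_output` yields the pair `("safe", "This prompt doesn't seem unsafe.")`.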

- **Developed by:** aryanxxvii
- **License:** apache-2.0

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)