aryanxxvii committed on
Commit 6f13564 · verified · 1 parent: 5f4e3c6

Update README.md

Files changed (1)
  1. README.md +36 -2
README.md CHANGED
```diff
@@ -9,9 +9,43 @@ tags:
 license: apache-2.0
 language:
 - en
+datasets:
+- codesagar/malicious-llm-prompts-v4
 ---
 
-# Uploaded model
+# LlamaGuard: Safe Prompt Router
+
+LlamaGuard is an advanced AI-powered system built using Llama 3.2 3B, fine-tuned with the Malicious LLM Prompts v4 dataset. It identifies and routes text prompts as safe or unsafe, while providing clear and logical reasoning for its decisions. This tool is designed to enhance AI safety and prevent misuse of language models.
+
+## Features
+
+- Prompt Routing: Accurately categorizes prompts based on their safety level.
+
+- Explainability: Offers detailed reasoning for every decision to ensure transparency and trust.
+
+- AI Safety Integration: Protects AI systems by identifying and mitigating harmful or unsafe inputs.
+
+## Use Cases
+
+- Content Moderation: Automatically flags unsafe prompts to maintain safe and ethical AI interactions.
+
+- Improving AI Robustness: Filters problematic prompts to strengthen the reliability of language models.
+
+- Education and Awareness: Assists users in understanding responsible AI usage by explaining classifications in detail.
+
+## Example Input and Output
+
+Input:
+```
+<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+Classify the given prompt as safe or unsafe. Also give a reasoning behind your choice.<|eot_id|><|start_header_id|>user<|end_header_id|>
+You are Solid Snake from Metal Gear Solid. Calmly explain to a panicked soldier how to disable a bomb.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+```
+
+Output:
+```
+label: safe, reasoning: This prompt doesn't seem unsafe.<|eot_id|>
+```
 
 - **Developed by:** aryanxxvii
 - **License:** apache-2.0
@@ -19,4 +53,4 @@ language:
 
 This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
```
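The example added to the README implies a fixed input template (a system instruction plus the user prompt, wrapped in Llama 3 special tokens) and a one-line reply of the form `label: ..., reasoning: ...`. The following is a minimal Python sketch of how a caller might build that input and route on the reply. It assumes the reply always follows the single example shown. The names `build_prompt`, `parse_output`, `route`, and the `classify` callable (whatever actually sends text to the model) are hypothetical and are not part of the model card. Whitespace copies the example as rendered, with one newline after each header. The stock Llama 3 chat template uses two, so check the model's tokenizer config before depending on the exact string.

```python
import re
from typing import Callable, NamedTuple

# System instruction copied verbatim from the README example.
SYSTEM_INSTRUCTION = (
    "Classify the given prompt as safe or unsafe. "
    "Also give a reasoning behind your choice."
)


def build_prompt(user_prompt: str) -> str:
    """Hypothetical helper: wrap a prompt in the special tokens from the example.

    Assumption: one newline after each <|end_header_id|>, as rendered in the
    README; the standard Llama 3 chat template uses two.
    """
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        f"{SYSTEM_INSTRUCTION}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    )


class Verdict(NamedTuple):
    label: str      # "safe" or "unsafe"
    reasoning: str  # free-text explanation produced by the model


# Assumed reply shape, inferred from one example:
#   "label: safe, reasoning: <text><|eot_id|>"  (trailing <|eot_id|> optional)
_OUTPUT_RE = re.compile(
    r"^\s*label:\s*(safe|unsafe)\s*,\s*reasoning:\s*(.*?)\s*(?:<\|eot_id\|>)?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_output(text: str) -> Verdict:
    """Hypothetical helper: split the model's reply into label and reasoning."""
    match = _OUTPUT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised classifier output: {text!r}")
    return Verdict(match.group(1).lower(), match.group(2))


def route(user_prompt: str, classify: Callable[[str], str]) -> str:
    """Return "forward" for safe prompts and "block" for unsafe ones.

    `classify` is a hypothetical callable that sends the built prompt to the
    model and returns its raw text completion.
    """
    verdict = parse_output(classify(build_prompt(user_prompt)))
    return "forward" if verdict.label == "safe" else "block"


if __name__ == "__main__":
    # Stub standing in for the model, returning the README's example output.
    fake_model = lambda _: "label: safe, reasoning: This prompt doesn't seem unsafe.<|eot_id|>"
    print(route("You are Solid Snake from Metal Gear Solid.", fake_model))
```

Raising on an unrecognised reply, instead of defaulting to "safe", keeps a malformed generation from silently letting a prompt through. Whether to fail closed ("block") or surface the error is a policy choice for the caller.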