---
language:
  - en
  - de
  - fr
  - it
  - pt
  - hi
  - es
  - nl
license: llama3.1
library_name: transformers
pipeline_tag: text-classification
tags:
  - brand-safety
  - content-moderation
  - apple-silicon
  - metal
  - mps
model-index:
  - name: vision-1-mini
    results:
      - task:
          type: text-classification
          name: Brand Safety Classification
        metrics:
          - type: accuracy
            value: 0.95
            name: Classification Accuracy
base_model: meta-llama/Llama-3.1-8B-Instruct
model_type: LlamaForCausalLM
model_size: "4.58 GiB"
parameters: "8.03B"
quantization: "Q4_K (193 tensors) + Q6_K (33 tensors)"
context_window: 131072
hardware:
  recommended: "Apple Silicon"
  minimum_memory: "6 GB"
inference:
  device: "Metal (Apple M3 Pro)"
  load_time: "3.27s"
  memory_cpu: "4552.80 MiB"
  memory_metal: "132.50 MiB"
---

# vision-1-mini

Vision-1-mini is a quantized 8B-parameter model based on Llama 3.1 8B Instruct and fine-tuned for brand safety classification. It targets Apple Silicon devices and labels text using the BrandSafe-16k classification system.

## Model Details

- **Model Type:** Brand Safety Classifier
- **Base Model:** Meta Llama 3.1 8B Instruct
- **Parameters:** 8.03 billion
- **Architecture:** Llama
- **Quantization:** Q4_K (193 tensors) + Q6_K (33 tensors)
- **Size:** 4.58 GiB (4.89 BPW)
- **License:** Llama 3.1

## Performance Metrics

- **Load Time:** 3.27 seconds (on Apple M3 Pro)
- **Memory Usage:**
  - CPU Buffer: 4552.80 MiB
  - Metal Buffer: 132.50 MiB
  - KV Cache: 1024.00 MiB (512.00 MiB K, 512.00 MiB V)
  - Compute Buffer: 560.00 MiB
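
As a rough sanity check, the reported buffers can be added up to estimate the resident footprint. This is a sketch that assumes all four buffers are allocated at the same time, which the figures above do not state explicitly:

```python
# Buffer sizes in MiB, copied from the list above
REPORTED_BUFFERS_MIB = {
    "cpu_buffer": 4552.80,
    "metal_buffer": 132.50,
    "kv_cache": 1024.00,
    "compute_buffer": 560.00,
}


def total_memory_mib(buffers: dict) -> float:
    """Sum all buffer sizes, assuming they coexist in memory."""
    return sum(buffers.values())


total_mib = total_memory_mib(REPORTED_BUFFERS_MIB)
total_gib = total_mib / 1024
print(f"{total_mib:.2f} MiB = {total_gib:.2f} GiB")  # 6269.30 MiB = 6.12 GiB
```

Under that assumption the model needs a little over 6 GiB before accounting for the operating system and other processes.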

## Hardware Compatibility

### Apple Silicon Optimizations
- Optimized for Metal/MPS
- Unified Memory Architecture support
- SIMD group reduction and matrix multiplication optimizations
- Layer offloading to the GPU (1 of 33 layers in the measured configuration)

### System Requirements
- Memory: 12 GB or more recommended (6 GB minimum)
- GPU: Apple Silicon preferred (M1/M2/M3 series)
- Storage: 5 GB free space

## Classification Categories

The model classifies content into the following categories:
1. B1-PROFANITY - Contains profane or vulgar language
2. B2-OFFENSIVE_SLANG - Contains offensive slang or derogatory terms
3. B3-COMPETITOR - Mentions or promotes competing brands
4. B4-BRAND_CRITICISM - Contains criticism or negative feedback about brands
5. B5-MISLEADING - Contains misleading or deceptive information
6. B6-POLITICAL - Contains political content or bias
7. B7-RELIGIOUS - Contains religious content or references
8. B8-CONTROVERSIAL - Contains controversial topics or discussions
9. B9-ADULT - Contains adult or mature content
10. B10-VIOLENCE - Contains violent content or references
11. B11-SUBSTANCE - Contains references to drugs, alcohol, or substances
12. B12-HATE - Contains hate speech or discriminatory content
13. B13-STEREOTYPE - Contains stereotypical representations
14. B14-BIAS - Shows bias against groups or individuals
15. B15-UNPROFESSIONAL - Contains unprofessional content or behavior
16. B16-MANIPULATION - Contains manipulative content or tactics
17. SAFE - Contains no brand safety concerns
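
Downstream code usually needs to map the raw model output back onto one of these labels. The following is a minimal sketch of such a parser; `BRAND_SAFETY_LABELS` and `parse_label` are hypothetical helpers written for illustration, and the exact output format of the model is an assumption:

```python
import re
from typing import Optional

# Label codes and short descriptions, taken from the list above
BRAND_SAFETY_LABELS = {
    "B1-PROFANITY": "Profane or vulgar language",
    "B2-OFFENSIVE_SLANG": "Offensive slang or derogatory terms",
    "B3-COMPETITOR": "Mentions or promotes competing brands",
    "B4-BRAND_CRITICISM": "Criticism or negative feedback about brands",
    "B5-MISLEADING": "Misleading or deceptive information",
    "B6-POLITICAL": "Political content or bias",
    "B7-RELIGIOUS": "Religious content or references",
    "B8-CONTROVERSIAL": "Controversial topics or discussions",
    "B9-ADULT": "Adult or mature content",
    "B10-VIOLENCE": "Violent content or references",
    "B11-SUBSTANCE": "References to drugs, alcohol, or substances",
    "B12-HATE": "Hate speech or discriminatory content",
    "B13-STEREOTYPE": "Stereotypical representations",
    "B14-BIAS": "Bias against groups or individuals",
    "B15-UNPROFESSIONAL": "Unprofessional content or behavior",
    "B16-MANIPULATION": "Manipulative content or tactics",
    "SAFE": "No brand safety concerns",
}

# Matches a whole-word label such as "B10-VIOLENCE" or "SAFE"
_LABEL_PATTERN = re.compile(r"\b(B\d{1,2}-[A-Z_]+|SAFE)\b")


def parse_label(raw_output: str) -> Optional[str]:
    """Return the first known label found in the model output, or None."""
    for match in _LABEL_PATTERN.finditer(raw_output.upper()):
        candidate = match.group(1)
        if candidate in BRAND_SAFETY_LABELS:
            return candidate
    return None
```

Returning `None` for unrecognized output lets the caller decide whether to retry, log, or treat the content as unclassified instead of silently defaulting to `SAFE`.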

## Usage

```python
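import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "maxsonderby/vision-1-mini",
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

# Example usage
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1,  # increase if the returned label is cut off
    do_sample=True,  # temperature and top_p only take effect when sampling
    temperature=0.1,
    top_p=0.9,
)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)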
```

## Model Architecture

- **Attention:**
  - Head Count: 32
  - KV Head Count: 8 (grouped-query attention)
  - RoPE Base Frequency: 500000
  - RoPE Dimension Count: 128
- **Transformer:**
  - Layer Count: 32
  - Embedding Length: 4096
  - Feed Forward Length: 14336
  - Context Length: 2048 (reduced from 131072)
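
These parameters determine how large the KV cache is for a given context length. A minimal sketch of the calculation, assuming 16-bit cache entries and a head dimension of 128 (embedding length 4096 divided by 32 heads); the function name is illustrative:

```python
def kv_cache_mib(
    n_ctx: int,
    n_layer: int = 32,
    n_kv_head: int = 8,
    head_dim: int = 128,
    bytes_per_elem: int = 2,  # assumption: f16 cache entries
) -> tuple:
    """Return (K size, V size) in MiB for a cache holding n_ctx positions."""
    per_tensor_bytes = n_ctx * n_layer * n_kv_head * head_dim * bytes_per_elem
    per_tensor_mib = per_tensor_bytes / 1024**2
    # K and V have identical shapes, so they take the same amount of memory
    return per_tensor_mib, per_tensor_mib


print(kv_cache_mib(2048))  # (128.0, 128.0)
print(kv_cache_mib(8192))  # (512.0, 512.0)
```

Under these assumptions, a 2048-token context needs 128 MiB each for K and V, while the 512 MiB per tensor listed under Performance Metrics matches a cache of 8192 positions. The reported figure may therefore have been measured with a larger context setting or a different cache data type.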

## Training & Fine-tuning

This model is fine-tuned for brand safety classification on the BrandSafe-16k dataset. It uses a reduced context window of 2048 tokens and is configured for low-variance, near-deterministic outputs with the following inference settings:
- Temperature: 0.1
- Top-p: 0.9
- Batch Size: 512
- Thread Count: 8

## Limitations

- The model is optimized for shorter content classification (up to 2048 tokens)
- Performance may vary on non-Apple Silicon hardware
- The model focuses solely on brand safety classification and may not be suitable for other tasks
- Classification accuracy may vary based on content complexity and context
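
Content longer than the 2048-token window has to be shortened or split before classification. One possible approach, shown here as a sketch and not as part of the model's own pipeline, is to split the token IDs into overlapping windows and classify each window separately. The function name and the overlap of 128 tokens are illustrative assumptions:

```python
from typing import List


def chunk_token_ids(
    token_ids: List[int],
    max_tokens: int = 2048,
    overlap: int = 128,  # assumption: arbitrary overlap to preserve some context
) -> List[List[int]]:
    """Split token IDs into windows of at most max_tokens with some overlap."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_tokens])
        # Stop once the current window already reaches the end of the input
        if start + max_tokens >= len(token_ids):
            break
    return chunks
```

Any prompt template or instruction text also counts toward the window, so in practice `max_tokens` should be set somewhat below 2048. How per-window labels are combined (for example, flagging the document if any window is not `SAFE`) is left to the caller.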

## Citation

If you use this model in your research, please cite:
```
@misc{vision-1-mini,
  author = {Max Sonderby},
  title = {Vision-1-Mini: Optimized Brand Safety Classification Model},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/maxsonderby/vision-1-mini}}
}
```