---
license: apache-2.0
---

# Sentiment Analysis Model using DistilBERT

This repository hosts a sentiment analysis model fine-tuned on the IMDb movie reviews dataset using the DistilBERT architecture. It is designed to classify text inputs as either positive or negative in sentiment.
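
For a quick start, the model can be tried through the `transformers` `pipeline` API. A minimal sketch, assuming the hosted checkpoint is pipeline-compatible (if the config does not define `id2label`, the output labels will appear as `LABEL_0`/`LABEL_1`):

```python
from transformers import pipeline

# Load the model from the Hub into a sentiment-analysis pipeline
classifier = pipeline("sentiment-analysis", model="Pranav-10/Sentimental_Analysis")
print(classifier("I loved this movie. The performances were fantastic!"))
```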

## Model Description

The model is based on the DistilBERT architecture, a smaller, faster, cheaper, and lighter version of BERT. It has been fine-tuned on the IMDb dataset, which consists of 50,000 movie reviews labeled as positive or negative.

DistilBERT retains most of BERT's performance while being considerably more efficient, which makes it a strong choice for sentiment analysis tasks where model size and inference speed matter.
13 |
+
|
14 |
+
## How to Use
|
15 |
+
|
16 |
+
To use the model, you will need to install the `transformers` library from Hugging Face. You can install it using pip:
|
17 |
+
|
18 |
+
pip install transformers
|
19 |
+
|
20 |
+
Once installed, you can use the following code to classify text using this model:

```python
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = DistilBertTokenizer.from_pretrained("Pranav-10/Sentimental_Analysis")
model = DistilBertForSequenceClassification.from_pretrained("Pranav-10/Sentimental_Analysis")

# Example text
text = "I loved this movie. The performances were fantastic!"

# Tokenize the text and convert it to input tensors
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

# Predict sentiment (no gradients needed for inference)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities using softmax
probabilities = torch.softmax(logits, dim=-1)

# Output the result
print(probabilities)
```
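
To turn the probabilities into a discrete label, take the argmax over the class dimension. A minimal sketch, assuming index 0 corresponds to negative and index 1 to positive (check `model.config.id2label` to confirm the mapping for this checkpoint):

```python
# Hypothetical label order; verify against model.config.id2label
labels = ["negative", "positive"]
predicted = probabilities.argmax(dim=-1).item()
print(f"{labels[predicted]} ({probabilities[0, predicted].item():.2%} confidence)")
```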

## Evaluation Results

The model achieved the following performance on the IMDb dataset:

- Accuracy: 90%
- Precision: 89%
- Recall: 91%
- F1 Score: 90%

These results indicate that the model is highly effective at classifying sentiment as positive or negative.
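
For reference, metrics like these can be computed from model predictions with `scikit-learn`. This is an illustrative sketch with toy data, not the script that produced the numbers above:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_true: gold labels, y_pred: model predictions (0 = negative, 1 = positive)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1 Score:  {f1_score(y_true, y_pred):.2f}")
```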

## Training Procedure

The model was trained using the following procedure (a code sketch follows the list):

- Pre-processing: The dataset was pre-processed by converting all reviews to lowercase and tokenizing them with the DistilBERT tokenizer.
- Optimization: The Adam optimizer with a learning rate of 2e-5 and a batch size of 16, trained for 3 epochs.
- Hardware: Training was performed on a single NVIDIA GTX 1650 GPU.
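
The sketch below reconstructs this setup with the Hugging Face `Trainer` and the hyperparameters listed above. It is an illustration under stated assumptions, not the original training script; note that `Trainer` defaults to AdamW rather than plain Adam, and the base checkpoint name (`distilbert-base-uncased`) is assumed:

```python
from datasets import load_dataset
from transformers import (
    DistilBertTokenizer,
    DistilBertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Load the IMDb dataset (50,000 labeled movie reviews)
dataset = load_dataset("imdb")

# The uncased tokenizer lowercases text internally
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

tokenized = dataset.map(preprocess, batched=True)

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Hyperparameters from the procedure above: lr 2e-5, batch size 16, 3 epochs
args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
```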