# Sentiment Analysis Model using DistilBERT
This repository hosts a sentiment analysis model fine-tuned on the IMDb movie reviews dataset using the DistilBERT architecture. It classifies text inputs into positive or negative sentiment categories.
## Model Description
The model is based on the DistilBERT architecture, a smaller, faster, cheaper, and lighter version of BERT. It has been fine-tuned on the IMDb dataset, which consists of 50,000 movie reviews labeled as positive or negative.
DistilBERT retains most of BERT's performance while being significantly more efficient, which makes it a good choice for sentiment analysis tasks where model size and inference speed matter.
## How to Use
To use the model, you will need the `transformers` library from Hugging Face, along with PyTorch. You can install both using pip:

```bash
pip install transformers torch
```
Once installed, you can use the following code to classify text using this model:
```python
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = DistilBertTokenizer.from_pretrained("Pranav-10/Sentimental_Analysis")
model = DistilBertForSequenceClassification.from_pretrained("Pranav-10/Sentimental_Analysis")

# Example text
text = "I loved this movie. The performances were fantastic!"

# Tokenize the text and convert it to tensors
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

# Predict sentiment
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities using softmax
probabilities = torch.softmax(logits, dim=-1)

# Output the result
print(probabilities)
```
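The printed tensor contains one probability per class. Continuing from the snippet above, here is a minimal sketch that turns those probabilities into a human-readable label. The 0 = negative / 1 = positive ordering is an assumption based on common convention, so verify it against `model.config.id2label` for this checkpoint.

```python
# Continuing from the snippet above: map the highest-probability index to a label.
# Assumption: index 0 is "negative" and index 1 is "positive"; check model.config.id2label.
labels = ["negative", "positive"]
predicted_class = torch.argmax(probabilities, dim=-1).item()
confidence = probabilities[0, predicted_class].item()
print(f"Sentiment: {labels[predicted_class]} (confidence: {confidence:.3f})")
```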
## Evaluation Results

The model achieved the following performance on the IMDb dataset:
- Accuracy: 90%
- Precision: 89%
- Recall: 91%
- F1 Score: 90%

These results indicate that the model is effective at classifying reviews as positive or negative.
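As a rough guide, figures like these could be reproduced with the sketch below. It assumes the IMDb test split from the Hugging Face `datasets` library and uses `scikit-learn` for the metrics; batching, device placement, and the evaluation sample size are simplified, so treat it as an illustration rather than the original evaluation script.

```python
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizer.from_pretrained("Pranav-10/Sentimental_Analysis")
model = DistilBertForSequenceClassification.from_pretrained("Pranav-10/Sentimental_Analysis")
model.eval()

# Small random sample of the IMDb test split for illustration (the full split has 25,000 reviews).
test_data = load_dataset("imdb", split="test").shuffle(seed=42).select(range(256))

predictions = []
for review in test_data["text"]:
    inputs = tokenizer(review, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    predictions.append(int(torch.argmax(logits, dim=-1)))

# IMDb labels: 0 = negative, 1 = positive (assumed to match the model's label mapping).
accuracy = accuracy_score(test_data["label"], predictions)
precision, recall, f1, _ = precision_recall_fscore_support(
    test_data["label"], predictions, average="binary"
)
print(f"Accuracy: {accuracy:.2%}  Precision: {precision:.2%}  Recall: {recall:.2%}  F1: {f1:.2%}")
```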
## Training Procedure

The model was trained using the following procedure:
- Pre-processing: The dataset was pre-processed by converting all reviews to lowercase and tokenizing them with the DistilBERT tokenizer.
- Optimization: The model was trained with the Adam optimizer, a learning rate of 2e-5, a batch size of 16, and 3 epochs.
- Hardware: Training was performed on a single NVIDIA GTX 1650 GPU.
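For reference, a fine-tuning run with these hyperparameters could look roughly like the sketch below, using the Hugging Face `Trainer` API. The exact training script for this model is not included in the repository, so the output directory name and tokenization details here are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (
    DistilBertTokenizer,
    DistilBertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Start from the uncased DistilBERT base model; the tokenizer handles lowercasing.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# Tokenize the IMDb reviews, truncating long reviews to 512 tokens.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

dataset = dataset.map(tokenize, batched=True)

# Hyperparameters from the training procedure above.
training_args = TrainingArguments(
    output_dir="./sentiment-distilbert",  # illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```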