# Sentiment Analysis Model using DistilBERT
This repository hosts a sentiment analysis model fine-tuned on the IMDb movie reviews dataset using the DistilBERT architecture. It classifies text inputs into positive or negative sentiment categories.
## Model Description
The model is based on the DistilBERT architecture, a smaller, faster, cheaper, and lighter version of BERT. It has been fine-tuned on the IMDb dataset, which consists of 50,000 movie reviews labeled as positive or negative.
DistilBERT retains most of BERT's performance while being significantly more efficient, which makes it a good choice for sentiment analysis tasks where model size and inference speed matter.
## How to Use
To use the model, you will need the `transformers` library from Hugging Face, along with PyTorch. You can install both using pip:

```bash
pip install transformers torch
```
Once installed, you can use the following code to classify text using this model:
```python
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = DistilBertTokenizer.from_pretrained("Pranav-10/Sentimental_Analysis")
model = DistilBertForSequenceClassification.from_pretrained("Pranav-10/Sentimental_Analysis")

# Example text
text = "I loved this movie. The performances were fantastic!"

# Tokenize the text and convert it to tensors
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

# Predict sentiment
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities using softmax
probabilities = torch.softmax(logits, dim=-1)

# Output the result
print(probabilities)
```
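The printed tensor contains one probability per class. Continuing from the snippet above, here is a minimal sketch that turns those probabilities into a human-readable label. The 0 = negative / 1 = positive ordering is an assumption based on common convention, so verify it against `model.config.id2label` for this checkpoint.

```python
# Continuing from the snippet above: map the highest-probability index to a label.
# Assumption: index 0 is "negative" and index 1 is "positive"; check model.config.id2label.
labels = ["negative", "positive"]
predicted_class = torch.argmax(probabilities, dim=-1).item()
confidence = probabilities[0, predicted_class].item()
print(f"Sentiment: {labels[predicted_class]} (confidence: {confidence:.3f})")
```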
## Evaluation Results

The model achieved the following performance on the IMDb dataset:
- Accuracy: 90%
- Precision: 89%
- Recall: 91%
- F1 Score: 90%

These results indicate that the model is effective at classifying reviews as positive or negative.
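As a rough guide, figures like these could be reproduced with the sketch below. It assumes the IMDb test split from the Hugging Face `datasets` library and uses `scikit-learn` for the metrics; batching, device placement, and the evaluation sample size are simplified, so treat it as an illustration rather than the original evaluation script.

```python
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizer.from_pretrained("Pranav-10/Sentimental_Analysis")
model = DistilBertForSequenceClassification.from_pretrained("Pranav-10/Sentimental_Analysis")
model.eval()

# Small random sample of the IMDb test split for illustration (the full split has 25,000 reviews).
test_data = load_dataset("imdb", split="test").shuffle(seed=42).select(range(256))

predictions = []
for review in test_data["text"]:
    inputs = tokenizer(review, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    predictions.append(int(torch.argmax(logits, dim=-1)))

# IMDb labels: 0 = negative, 1 = positive (assumed to match the model's label mapping).
accuracy = accuracy_score(test_data["label"], predictions)
precision, recall, f1, _ = precision_recall_fscore_support(
    test_data["label"], predictions, average="binary"
)
print(f"Accuracy: {accuracy:.2%}  Precision: {precision:.2%}  Recall: {recall:.2%}  F1: {f1:.2%}")
```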
## Training Procedure

The model was trained using the following procedure:
- Pre-processing: The dataset was pre-processed by converting all reviews to lowercase and tokenizing them with the DistilBERT tokenizer.
- Optimization: The model was trained with the Adam optimizer, a learning rate of 2e-5, a batch size of 16, and 3 epochs.
- Hardware: Training was performed on a single NVIDIA GTX 1650 GPU.
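For reference, a fine-tuning run with these hyperparameters could look roughly like the sketch below, using the Hugging Face `Trainer` API. The exact training script for this model is not included in the repository, so the output directory name and tokenization details here are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (
    DistilBertTokenizer,
    DistilBertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Start from the uncased DistilBERT base model; the tokenizer handles lowercasing.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# Tokenize the IMDb reviews, truncating long reviews to 512 tokens.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

dataset = dataset.map(tokenize, batched=True)

# Hyperparameters from the training procedure above.
training_args = TrainingArguments(
    output_dir="./sentiment-distilbert",  # illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```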