Canstralian committed on
Commit 39dbdf0 · verified · 1 Parent(s): b33c880

Upload 6 files

Files changed (6)
  1. README.md +8 -64
  2. app.py +24 -26
  3. fine_tuner.py +22 -0
  4. model_selector.py +9 -0
  5. requirements.txt +4 -7
  6. utils.py +13 -0
README.md CHANGED
@@ -1,71 +1,15 @@
----
-title: Transformers Fine Tuner
-emoji: 🔥
-colorFrom: indigo
-colorTo: blue
-sdk: gradio
-sdk_version: 5.14.0
-app_file: app.py
-pinned: false
-license: apache-2.0
-short_description: A Gradio interface
----
-
-![Python Version](https://img.shields.io/badge/Python-3.10%2B-blue)
-![License](https://img.shields.io/badge/License-Apache%202.0-blue)
-![Last Commit](https://img.shields.io/github/last-commit/Canstralian/transformers-fine-tuner)
-![Issues](https://img.shields.io/github/issues/Canstralian/transformers-fine-tuner)
-![Pull Requests](https://img.shields.io/github/issues-pr/Canstralian/transformers-fine-tuner)
-![Contributors](https://img.shields.io/github/contributors/Canstralian/transformers-fine-tuner)
-
 # Transformers Fine Tuner
 
-🔥 **Transformers Fine Tuner** is a user-friendly Gradio interface that enables seamless fine-tuning of pre-trained transformer models on custom datasets. This tool facilitates efficient model adaptation for various NLP tasks, making it accessible for both beginners and experienced practitioners.
+Transformers Fine Tuner is a user-friendly Gradio interface that enables seamless fine-tuning of pre-trained transformer models on custom datasets.
 
 ## Features
 
-- **Easy Dataset Integration**: Load datasets via URLs or direct file uploads.
-- **Model Selection**: Choose from a variety of pre-trained transformer models.
-- **Customizable Training Parameters**: Adjust epochs, batch size, and learning rate to suit your needs.
-- **Real-time Monitoring**: Track training progress and performance metrics.
-
-## Getting Started
-
-1. **Clone the Repository**:
-   ```bash
-   git clone https://huggingface.co/spaces/your-username/transformers-fine-tuner
-   cd transformers-fine-tuner
-   ```
-
-2. **Install Dependencies**:
-   Ensure you have Python 3.10 or higher. Install the required packages:
-   ```bash
-   pip install -r requirements.txt
-   ```
-
-3. **Run the Application**:
-   ```bash
-   python app.py
-   ```
-   Access the interface at `http://localhost:7860/`.
-
-## Usage
-
-- **Model Name**: Enter the name of the pre-trained model you wish to fine-tune (e.g., `bert-base-uncased`).
-- **Dataset URL**: Provide a URL to your dataset.
-- **Upload Dataset**: Alternatively, upload a dataset file directly.
-- **Number of Epochs**: Set the number of training epochs.
-- **Learning Rate**: Specify the learning rate for training.
-- **Batch Size**: Define the batch size for training.
-
-After configuring the parameters, click **Submit** to start the fine-tuning process. Monitor the training progress and performance metrics in real-time.
-
-## License
-
-This project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for more details.
-
-## Acknowledgments
-
-- [Hugging Face Transformers](https://huggingface.co/transformers/)
-- [Gradio](https://gradio.app/)
-- [Datasets](https://huggingface.co/docs/datasets/)
+- **Easy Dataset Integration:** Load datasets via URLs or direct file uploads.
+- **Model Selection:** Choose from a variety of pre-trained transformer models.
+- **Customizable Training Parameters:** Adjust epochs, batch size, and learning rate to suit your needs.
+- **Real-time Monitoring:** Track training progress and performance metrics.
+
+## Setup
+
+1. Clone the repository:
+
app.py CHANGED
@@ -1,31 +1,29 @@
-import os
-import sys
 import gradio as gr
-from model.model import fine_tune
-from data.preprocess import load_data, preprocess_data, save_processed_data
+from fine_tuner import fine_tune_model
+from model_selector import get_model_list
+from utils import load_dataset
 
-def prepare_and_train(model_name, dataset_path, epochs, batch_size, learning_rate):
-    # Load and preprocess the dataset
-    data = load_data(dataset_path)
-    cleaned_data = preprocess_data(data)
-    processed_data_path = 'data/processed/processed_dataset.csv'
-    save_processed_data(cleaned_data, processed_data_path)
+def train_model(dataset_url, model_name, epochs, batch_size, learning_rate):
+    dataset = load_dataset(dataset_url)
+    metrics = fine_tune_model(dataset, model_name, epochs, batch_size, learning_rate)
+    return metrics
 
-    # Proceed with model fine-tuning
-    return fine_tune(model_name, dataset_url=None, file=processed_data_path, epochs=epochs, batch_size=batch_size, learning_rate=learning_rate)
-
-iface = gr.Interface(
-    fn=prepare_and_train,
-    inputs=[
-        gr.Textbox(label="Model Name", placeholder="e.g., bert-base-uncased"),
-        gr.File(label="Upload Dataset"),
-        gr.Number(label="Epochs", value=3),
-        gr.Number(label="Batch Size", value=8),
-        gr.Number(label="Learning Rate", value=5e-5),
-    ],
-    outputs="text",
-    live=True,
-)
+def main():
+    model_options = get_model_list()
+    interface = gr.Interface(
+        fn=train_model,
+        inputs=[
+            gr.Textbox(label="Dataset URL"),
+            gr.Dropdown(choices=model_options, label="Select Model"),
+            gr.Slider(minimum=1, maximum=10, value=3, step=1, label="Epochs"),
+            gr.Slider(minimum=1, maximum=64, value=16, step=1, label="Batch Size"),
+            gr.Slider(minimum=1e-5, maximum=1e-1, step=1e-5, value=1e-4, label="Learning Rate"),
+        ],
+        outputs="json",
+        title="Transformers Fine Tuner",
+        description="Fine-tune pre-trained transformer models on custom datasets.",
+    )
+    interface.launch()
 
 if __name__ == "__main__":
-    iface.launch()
+    main()
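For a quick smoke test outside the browser, the function Gradio wraps can be called directly. A minimal sketch, assuming a placeholder CSV URL and a dataset already preprocessed into the shape `fine_tune_model` expects:

```python
from app import train_model

# Placeholder URL; any HTTP-reachable CSV would do. One epoch keeps
# the smoke test short.
metrics = train_model(
    "https://example.com/reviews.csv",
    "distilbert-base-uncased",
    epochs=1,
    batch_size=8,
    learning_rate=5e-5,
)
print(metrics)
```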
fine_tuner.py ADDED
@@ -0,0 +1,22 @@
+from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
+
+def fine_tune_model(dataset, model_name, epochs, batch_size, learning_rate):
+    # Expects a DatasetDict with tokenized 'train' and 'validation' splits.
+    # Load the checkpoint with a fresh binary classification head.
+    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
+    training_args = TrainingArguments(
+        output_dir='./results',
+        num_train_epochs=epochs,
+        per_device_train_batch_size=batch_size,
+        learning_rate=learning_rate,
+        logging_dir='./logs',
+        logging_steps=10,
+    )
+    trainer = Trainer(
+        model=model,
+        args=training_args,
+        train_dataset=dataset['train'],
+        eval_dataset=dataset['validation'],
+    )
+    trainer.train()
+    return {"status": "Training complete"}
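Note that `Trainer` needs tokenized inputs, while `utils.load_dataset` returns raw CSV rows in a single `train` split, so a bridging step would sit between the two. A minimal sketch, assuming the CSV has `text` and `label` columns (the column names and helper are assumptions, not fixed by this commit):

```python
from transformers import AutoTokenizer

def prepare_dataset(dataset, model_name, test_size=0.1):
    # Tokenize the raw text and carve a validation split out of 'train',
    # matching the DatasetDict layout fine_tune_model expects.
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    splits = dataset["train"].train_test_split(test_size=test_size)
    tokenized = splits.map(tokenize, batched=True)
    return {"train": tokenized["train"], "validation": tokenized["test"]}
```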
model_selector.py ADDED
@@ -0,0 +1,9 @@
+# Candidate checkpoints from the Hugging Face Hub.
+
+def get_model_list():
+    return [
+        "bert-base-uncased",
+        "distilbert-base-uncased",
+        "roberta-base",
+        "gpt2",
+    ]
requirements.txt CHANGED
@@ -1,7 +1,4 @@
-transformers
-torch
-datasets
-gradio
-accelerate
-bitsandbytes
-peft
+transformers==4.30.0
+gradio==3.1.0
+torch==1.12.0
+datasets==2.2.0
utils.py ADDED
@@ -0,0 +1,13 @@
+import requests
+from datasets import load_dataset as hf_load_dataset
+
+def load_dataset(dataset_url):
+    # Aliased import avoids shadowing datasets.load_dataset (infinite recursion).
+    if dataset_url.startswith("http"):
+        response = requests.get(dataset_url)
+        with open("temp_dataset.csv", "wb") as f:
+            f.write(response.content)
+        dataset = hf_load_dataset("csv", data_files="temp_dataset.csv")
+    else:
+        dataset = hf_load_dataset("csv", data_files=dataset_url)
+    return dataset
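A usage sketch (URL and path are placeholders); both branches return a `DatasetDict` whose only split is `train`:

```python
from utils import load_dataset

remote = load_dataset("https://example.com/data.csv")  # placeholder URL
local = load_dataset("data/train.csv")                 # placeholder path
print(remote["train"].column_names)
```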