
BPE Tokenizer for Nepali LLM

This repository contains a Byte Pair Encoding (BPE) tokenizer trained on the Nepali LLM dataset using the Hugging Face transformers package. It is optimized for Nepali text and intended for language modeling and other natural language processing tasks.

Overview

  • Tokenizer Type: Byte Pair Encoding (BPE)
  • Vocabulary Size: 50,000
  • Dataset Used: Nepali LLM Datasets
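To make the "BPE" part of the overview concrete, here is a minimal, self-contained sketch of how BPE training works in general: start from characters and repeatedly merge the most frequent adjacent pair. This is an illustration only, not the code used to train this tokenizer (which was produced with the Hugging Face tooling on the Nepali LLM dataset).

```python
# Minimal sketch of the BPE training loop (illustration only; the tokenizer
# in this repo was trained with the Hugging Face tooling, not this code).
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

def learn_bpe(corpus, num_merges):
    """Start from characters; repeatedly merge the most frequent pair."""
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        words = merge_pair(words, best)
        merges.append(best)
    return merges

print(learn_bpe("low lower lowest low low", 3))
```

The real tokenizer learns roughly 50,000 such merges from the Nepali corpus instead of three from a toy string.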

Installation

To use the tokenizer, you need to install the transformers library. You can install it via pip:

pip install transformers

Usage

You can load the tokenizer with the following code:

from transformers import PreTrainedTokenizerFast

# Load the tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/NepaliBPE")

# Example usage
text = "तपाईंलाई कस्तो छ?"
tokens = tokenizer.encode(text)
print("Tokens:", tokens)
print("Decoded:", tokenizer.decode(tokens))