The "使用 Usage" example does not work

#2
by candog - opened

When I run the "使用 Usage" example on Colab, I get the following error:

ModuleNotFoundError Traceback (most recent call last)
in
6 # 2. cd Fengshenbang-LM/fengshen/examples/pegasus/
7 # and then you will see the tokenizers_pegasus.py and data_utils.py which are needed by pegasus model
----> 8 from tokenizers_pegasus import PegasusTokenizer
9
10 model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")

/content/Fengshenbang-LM/fengshen/examples/pegasus/Fengshenbang-LM/fengshen/examples/pegasus/Fengshenbang-LM/fengshen/examples/pegasus/tokenizers_pegasus.py in
----> 1 from fengshen.examples.pegasus.data_utils import (
2 _is_control,
3 _is_punctuation,
4 _is_whitespace,
5 _is_chinese_char)

ModuleNotFoundError: No module named 'fengshen'

Here is my code:
!pip install transformers
!git clone https://github.com/IDEA-CCNL/Fengshenbang-LM
%cd Fengshenbang-LM/fengshen/examples/pegasus/
%ls -l

from transformers import PegasusForConditionalGeneration

# Need to download tokenizers_pegasus.py and other Python script from Fengshenbang-LM github repo in advance,
# or you can download tokenizers_pegasus.py and data_utils.py in https://huggingface.co/IDEA-CCNL/Randeng_Pegasus_523M/tree/main
# Strongly recommend you git clone the Fengshenbang-LM repo:
# 1. git clone https://github.com/IDEA-CCNL/Fengshenbang-LM
# 2. cd Fengshenbang-LM/fengshen/examples/pegasus/
# and then you will see the tokenizers_pegasus.py and data_utils.py which are needed by pegasus model

from tokenizers_pegasus import PegasusTokenizer

model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")
tokenizer = PegasusTokenizer.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")

text = "据微信公众号“界面”报道,4日上午10点左右,中国发改委反垄断调查小组突击查访奔驰上海办事处,调取数据材料,并对多名奔驰高管进行了约谈。截止昨日晚9点,包括北京梅赛德斯-奔驰销售服务有限公司东区总经理在内的多名管理人员仍留在上海办公室内"
inputs = tokenizer(text, max_length=1024, return_tensors="pt")

# Generate Summary

summary_ids = model.generate(inputs["input_ids"])
tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

# model Output: 反垄断调查小组突击查访奔驰上海办事处,对多名奔驰高管进行约谈
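
A possible workaround, sketched under assumptions rather than confirmed by the maintainers: the traceback shows tokenizers_pegasus.py importing from fengshen.examples.pegasus.data_utils, so the directory that contains the fengshen package (the cloned repo root) presumably has to be on sys.path, not only the pegasus example directory. The helper name add_repo_root_to_path below is hypothetical and not part of Fengshenbang-LM; it just walks upward from a starting directory until it finds a fengshen folder.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(start=".", package="fengshen"):
    """Find the nearest ancestor of `start` that contains `package/`
    and put it at the front of sys.path. Returns that directory as a string.

    Hypothetical helper for illustration; not part of Fengshenbang-LM.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in [start, *start.parents]:
        if (candidate / package).is_dir():
            root = str(candidate)
            if root in sys.path:
                sys.path.remove(root)
            sys.path.insert(0, root)
            return root
    raise FileNotFoundError(
        f"no directory containing '{package}/' found above {start}"
    )


# Intended use in the notebook, after the %cd into examples/pegasus/:
#   add_repo_root_to_path(".")
#   from tokenizers_pegasus import PegasusTokenizer
```

The nested path in the traceback (Fengshenbang-LM/fengshen/examples/pegasus/ repeated three times) suggests the clone and %cd cell was run more than once; the search above picks the nearest enclosing clone, so it should behave the same either way. Any other packages that the fengshen modules import would still need to be installed separately.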
