The "使用 Usage" example does not work
When I run the "使用 Usage" example on Colab, I get the following error:
ModuleNotFoundError Traceback (most recent call last)
in
6 # 2. cd Fengshenbang-LM/fengshen/examples/pegasus/
7 # and then you will see the tokenizers_pegasus.py and data_utils.py which are needed by pegasus model
----> 8 from tokenizers_pegasus import PegasusTokenizer
9
10 model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")
/content/Fengshenbang-LM/fengshen/examples/pegasus/Fengshenbang-LM/fengshen/examples/pegasus/Fengshenbang-LM/fengshen/examples/pegasus/tokenizers_pegasus.py in
----> 1 from fengshen.examples.pegasus.data_utils import (
2 _is_control,
3 _is_punctuation,
4 _is_whitespace,
5 _is_chinese_char)
ModuleNotFoundError: No module named 'fengshen'
Here is my code:
!pip install transformers
!git clone https://github.com/IDEA-CCNL/Fengshenbang-LM
%cd Fengshenbang-LM/fengshen/examples/pegasus/
%ls -l
from transformers import PegasusForConditionalGeneration
# Need to download tokenizers_pegasus.py and other Python script from Fengshenbang-LM github repo in advance,
# or you can download tokenizers_pegasus.py and data_utils.py in https://huggingface.co/IDEA-CCNL/Randeng_Pegasus_523M/tree/main
# Strongly recommend you git clone the Fengshenbang-LM repo:
# 1. git clone https://github.com/IDEA-CCNL/Fengshenbang-LM
# 2. cd Fengshenbang-LM/fengshen/examples/pegasus/
# and then you will see the tokenizers_pegasus.py and data_utils.py which are needed by pegasus model
from tokenizers_pegasus import PegasusTokenizer
model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")
tokenizer = PegasusTokenizer.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")
text = "据微信公众号“界面”报道,4日上午10点左右,中国发改委反垄断调查小组突击查访奔驰上海办事处,调取数据材料,并对多名奔驰高管进行了约谈。截止昨日晚9点,包括北京梅赛德斯-奔驰销售服务有限公司东区总经理在内的多名管理人员仍留在上海办公室内"
inputs = tokenizer(text, max_length=1024, return_tensors="pt")
# Generate Summary
summary_ids = model.generate(inputs["input_ids"])
tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
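A possible workaround, offered as a sketch and not a confirmed fix: the traceback shows tokenizers_pegasus.py importing data_utils through the absolute path fengshen.examples.pegasus.data_utils, so Python needs the directory that contains the fengshen/ folder on sys.path, not only the pegasus/ folder the notebook has changed into. The helpers below (find_repo_root and add_repo_to_path are hypothetical names, not part of Fengshenbang-LM) walk upward from the current directory and register the first parent that contains fengshen/. This assumes fengshen is importable as a package from the repo root, and importing it may still surface other missing dependencies.

```python
import sys
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, package: str = "fengshen") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `package`/."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / package).is_dir():
            return candidate
    return None


def add_repo_to_path(start: Path, package: str = "fengshen") -> Optional[Path]:
    """Put the detected repo root at the front of sys.path (hypothetical helper)."""
    root = find_repo_root(start, package)
    if root is not None and str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# In the notebook, run this after the %cd into fengshen/examples/pegasus/
# and before `from tokenizers_pegasus import PegasusTokenizer`.
repo_root = add_repo_to_path(Path.cwd())
print("repo root added to sys.path:", repo_root)
```

The path in the traceback also repeats Fengshenbang-LM/fengshen/examples/pegasus three times, which suggests the clone and %cd cell ran more than once; restarting the runtime before retrying would rule that out as a separate source of confusion.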