Cannot initialize a model from a Hugging Face model repo in spaCy

Posted 2024-09-27 00:11:32


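I have an NER project, and I want to use spaCy's NER pipeline component together with word vectors produced by a pretrained model from transformers. I used spacy-transformers and followed their guide, but it does not work.
I am using spacy 2.3.5 and spacy-transformers 0.6.2, and I am trying to run this in Colab (the traceback paths show Python 3.6).
spacy-transformers on GitHub: Link to GitHub
The model on the Hugging Face hub: Link to vinai/phobert-base
Model name in transformers: vinai/phobert-base
I have one question: can any pretrained model from transformers be used through spacy-transformers, or only certain kinds of models?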
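
One way to narrow this question down is to look at the architecture a checkpoint declares. Hugging Face checkpoints normally ship a config.json with a "model_type" field, and the sketch below reads that field from a local copy of the model files. The helper name read_model_type and the directory argument are hypothetical, and whether a given spacy-transformers version supports that type still has to be checked against its own documentation.

```python
import json
from pathlib import Path


def read_model_type(model_dir):
    """Return the "model_type" declared in model_dir/config.json, or None.

    Hypothetical helper for illustration; it reads a local file only and
    does not call spaCy or transformers.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # Older or hand-written configs may not declare a model type at all.
    return config.get("model_type")
```

Comparing the returned value with the model families that the installed spacy-transformers release lists as supported is only a first check; it says nothing about whether the tokenizer files match.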

According to their guide, a pretrained model has to be initialized before it can be loaded in spaCy (here is their guide):

! export CUDA_PATH="/opt/nvidia/cuda"
! pip install -U spacy[cuda101]
! pip install spacy-transformers
! git clone -b v0.6.x https://github.com/explosion/spacy-transformers
! python /content/spacy-transformers/examples/init_model.py -n "vinai/phobert-base" \
-l vi vi_vinai_phobert_base

I get the following log:

2020-12-25 03:46:18.163785: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
ℹ Creating model for 'vinai/phobert-base' (vi)
Downloading: 100% 557/557 [00:00<00:00, 689kB/s]
⠼ Setting up the pipeline...
Traceback (most recent call last):
  File "/content/spacy-transformers/examples/init_model.py", line 32, in <module>
    plac.call(main)
  File "/usr/local/lib/python3.6/dist-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.6/dist-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/content/spacy-transformers/examples/init_model.py", line 19, in main
    nlp.add_pipe(TransformersWordPiecer.from_pretrained(nlp.vocab, name))
  File "/usr/local/lib/python3.6/dist-packages/spacy_transformers/pipeline/wordpiecer.py", line 26, in from_pretrained
    model = get_tokenizer(trf_name).from_pretrained(trf_name)
  File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py", line 393, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py", line 496, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'vinai/phobert-base' was not found in tokenizers model name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). We assumed 'vinai/phobert-base' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.
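
Reading the last line of the traceback, the tokenizer class that was selected looks for files named vocab.json and merges.txt. A minimal sketch for checking whether a local copy of a model directory contains those files is below. The helper name missing_vocab_files and the directory argument are hypothetical, and the expected file names are copied from the error message, not taken from any library API.

```python
from pathlib import Path

# File names copied from the OSError message in the log above.
EXPECTED_VOCAB_FILES = ("vocab.json", "merges.txt")


def missing_vocab_files(model_dir, expected=EXPECTED_VOCAB_FILES):
    """Return the expected vocabulary files that are absent from model_dir.

    Hypothetical helper for illustration; it only inspects the local
    filesystem and does not call spaCy or transformers.
    """
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list means both files are present. A non-empty list would be consistent with the error in the log, and would suggest that the checkpoint stores its vocabulary under different file names than the selected tokenizer class expects.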

Can anyone help me or give me some advice? Thank you very much.

