Training with finetune_ner.py runs on the CPU

whale · January 22, 2024, 08:03

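System: Windows 10, CUDA 12.3, Python 3.11. I created the virtual environment using the setup.py in the repository.
When I run finetune_ner.py, the console output shows that training is running on the CPU.
How can I train on the GPU? Which packages, and which versions of them, do I need to install?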
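
In case it helps with diagnosis, here is a minimal sketch of how I would check whether PyTorch can see the GPU at all. It assumes this script runs on HanLP's PyTorch backend and that `torch` is the package installed into the environment; `describe_device` is just a helper name I made up. If the reported CUDA build is `None`, the installed wheel would be a CPU-only build.

```python
import importlib.util


def describe_device() -> str:
    """Report which device PyTorch would use, or why no GPU is available."""
    # Avoid an ImportError if torch is missing from the environment.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    # torch.version.cuda is None for CPU-only builds of PyTorch.
    build = f"torch {torch.__version__}, CUDA build {torch.version.cuda}"
    if torch.cuda.is_available():
        return f"cuda: {torch.cuda.get_device_name(0)} ({build})"
    return f"cpu only ({build})"


print(describe_device())
```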

Below is the code of finetune_ner.py:

import os

import hanlp
from hanlp.components.ner.transformer_ner import TransformerNamedEntityRecognizer
from tests import cdroot

cdroot()

your_training_corpus = 'finetune_corpus/word_level.train.tsv'
your_development_corpus = 'finetune_corpus/word_level.dev.tsv'  # Use a different one in reality
save_dir = 'data/ner/finetune/model'

if not os.path.exists(your_training_corpus):
    os.makedirs(os.path.dirname(your_training_corpus), exist_ok=True)
    # Write as UTF-8 explicitly; the default encoding on Windows is not UTF-8.
    with open(your_training_corpus, 'w', encoding='utf-8') as out:
        out.write(
'''训练\tB-NLP
语料\tE-NLP
为\tO
IOBES\tO
格式\tO
'''
        )
    

ner = TransformerNamedEntityRecognizer()
ner.fit(
    trn_data=your_training_corpus,
    dev_data=your_development_corpus,
    save_dir=save_dir,
    epochs=50,  # Since the corpus is small, overfit it
    finetune=hanlp.pretrained.ner.MSRA_NER_ELECTRA_SMALL_ZH,
    # You MUST set the same parameters as the model being fine-tuned:
    average_subwords=True,
    transformer='hfl/chinese-electra-180g-small-discriminator',
)

HanLP = hanlp.pipeline()\
    .append(hanlp.load(hanlp.pretrained.tok.FINE_ELECTRA_SMALL_ZH), output_key='tok')\
    .append(ner, output_key='ner')
HanLP(['训练语料为IOBES格式', '晓美焰来到北京立方庭参观自然语义科技公司。']).pretty_print()

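I want HanLP's existing NER model to recognize a new entity type, ALLUSION, in addition to the ones it already handles. In the training corpus, do I only need to annotate this one entity type, or do the other entity types (e.g. PERSON, LOCATION, DATE…) have to be annotated as well?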
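
To make the question concrete, this is roughly how I understand the IOBES tagging used in the corpus file. It is only a sketch: `iobes_tags` is a hypothetical helper of mine, not part of HanLP, and the spans are token indices with an exclusive end.

```python
def iobes_tags(tokens, spans):
    """Tag tokens in IOBES; spans are (start, end_exclusive, label) over token indices."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if end - start == 1:
            # A single-token entity gets the S- prefix.
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


tokens = ["训练", "语料", "为", "IOBES", "格式"]
for token, tag in zip(tokens, iobes_tags(tokens, [(0, 2, "NLP")])):
    print(f"{token}\t{tag}")
```

An ALLUSION entity would be tagged the same way, with `ALLUSION` in place of `NLP`.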

This is my first time working with any of this and I know very little about it, so any guidance would be appreciated.