The HanLP module has no load method

zuoshou1 · April 8, 2024 06:17

Running the code produces the following output:

C:\Users\Administrator\PycharmProjects\pythonProject\main.py
Traceback (most recent call last):
  File "C:\Users\Administrator\PycharmProjects\pythonProject\main.py", line 16, in <module>
    semantic_model = HanLP.load(decoded_key)
  File "C:\ProgramData\Miniconda3\lib\site-packages\pyhanlp\__init__.py", line 170, in __getattr__
    return getattr(self._proxy, attr)
  File "C:\ProgramData\Miniconda3\lib\site-packages\jpype\_jclass.py", line 143, in __getattribute__
    attr = type.__getattribute__(self, name)
AttributeError: type object 'com.hankcs.hanlp.HanLP' has no attribute 'load'
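
The traceback suggests that pyhanlp forwards attribute lookups on `HanLP` to the underlying Java class `com.hankcs.hanlp.HanLP`, so any name that class does not define ends up as an `AttributeError`. Below is a minimal pure-Python sketch of that forwarding pattern. The names `FakeJavaHanLP` and `LazyProxy` are hypothetical stand-ins for illustration and are not pyhanlp's actual implementation.

```python
class FakeJavaHanLP:
    """Hypothetical stand-in for the Java class behind pyhanlp's HanLP."""

    @staticmethod
    def parseDependency(text):
        # Placeholder result; the real method returns a dependency parse.
        return f"parsed({text})"


class LazyProxy:
    """Forwards attribute lookups it cannot resolve to a wrapped object."""

    def __init__(self, target):
        self._proxy = target

    def __getattr__(self, attr):
        # Called only when normal lookup fails, so self._proxy resolves normally.
        return getattr(self._proxy, attr)


HanLP = LazyProxy(FakeJavaHanLP)

# A name the wrapped class defines is forwarded successfully.
parse_result = HanLP.parseDependency("text")
print(parse_result)

# A name the wrapped class lacks raises AttributeError, as in the traceback.
try:
    HanLP.load("key")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

Under this reading, the error would mean only that the wrapped class has no member called `load`, not that the installation is broken.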

The complete code is:

import pandas as pd
from pyhanlp import HanLP
from sklearn.feature_extraction.text import TfidfVectorizer
import re
import base64

# Read the Excel file

file_path = r'C:\Users\Administrator\Desktop\中国金融市场栏目.xlsx'
df = pd.read_excel(file_path)

# Decode the key

# (omitted)

# Load the HanLP abstract meaning representation model

HanLP.Config.CoreModelName = decoded_key

# Define the text-cleaning and semantic role labeling function

def clean_and_semantic_role_labeling(text):
    if pd.isnull(text):  # Return an empty string for missing values
        return ''
    # Remove special characters and punctuation
    text = re.sub(r'[^\w\s]', '', str(text))  # Convert text to a string first
    # Remove digits
    text = re.sub(r'\d', '', text)
    # Convert to lowercase
    text = text.lower()
    # Semantic role labeling
    result = HanLP.parseDependency(text).toString()
    return result
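
To check the cleaning steps separately from the HanLP call, they can be run on their own. This is a minimal sketch under some assumptions: `clean_text` is a hypothetical helper name, and it tests for `None` instead of calling `pd.isnull` so that it needs only the standard library.

```python
import re


def clean_text(text):
    """Strip punctuation and digits, then lowercase (no HanLP call)."""
    if text is None:  # Simplified stand-in for the pd.isnull check
        return ''
    # Drop everything that is neither a word character nor whitespace
    text = re.sub(r'[^\w\s]', '', str(text))
    # Drop digits; whitespace around them is left in place
    text = re.sub(r'\d', '', text)
    return text.lower()


sample = clean_text("Hello, World 123!")
print(repr(sample))
```

Because whitespace is preserved, removing digits can leave extra or trailing spaces in the cleaned string.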

# Clean the text and apply semantic role labeling to the fourth column

df['文本清洗和语义角色标注结果'] = df['正文内容'].apply(clean_and_semantic_role_labeling)

# Build text features with TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=1000)
X_tfidf = vectorizer.fit_transform(df['文本清洗和语义角色标注结果'])

# Convert the sparse matrix to a DataFrame

X_tfidf_df = pd.DataFrame(X_tfidf.toarray(), columns=vectorizer.get_feature_names_out())

# Merge the text features with the original data

df = pd.concat([df, X_tfidf_df], axis=1)

# Save the modified DataFrame back to the original Excel file

df.to_excel(file_path, index=False)

print(f"Text cleaning, semantic role labeling results, and TF-IDF text features saved to the original file {file_path}")

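The problem: the HanLP module has no load method, so the abstract meaning representation model is never loaded. I am calling it through vpi, and after running the code I get the message: has no attribute 'load'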