Author: zhouwenjun | Source: Internet | 2022-12-10 13:12
I am trying to create custom chunk tags and extract relations from them. The following code gets me as far as the cascaded chunk tree.
import nltk

grammar = r"""
  NPH: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
  PPH: {<IN><NP>}               # Chunk prepositions followed by NP
  VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
  CLAUSE: {<NP><VP>}            # Chunk NP, VP
  """
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = cp.parse(sentence)
Output:
(S (NPH Mary/NN) saw/VBD (NPH the/DT cat/NN) sit/VB on/IN (NPH the/DT mat/NN))
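Now I am trying to use the nltk.sem.extract_rels function to extract relations between the NPH-tagged chunks and the text between them, but it seems to work only on named entities produced by the ne_chunk function.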
import re

IN = re.compile(r'.*\bon\b')
for rel in nltk.sem.extract_rels('NPH', 'NPH', chunked, corpus='ieer', pattern=IN):
    print(nltk.sem.rtuple(rel))
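This raises the following error:
ValueError: your value for the subject type has not been recognized: NPH
Is there a simple way to build relations from chunk tags alone? I really do not want to retrain an NER model just so that my chunk tags are detected as the corresponding named entities.
Thanks!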
1> 小智..:
extract_rels (see its documentation) checks that the subjclass and objclass arguments are known NE tags, which is why NPH triggers the error.
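To make that failure mode concrete, here is a minimal stand-alone sketch of this kind of label check. It does not call NLTK: KNOWN_NE_CLASSES and check_label are hypothetical names, and the class set is an illustrative assumption, not a copy of NLTK's own list.

```python
# Illustrative stand-in for the label validation described above.
# KNOWN_NE_CLASSES is an assumed, partial set; it is NOT taken from NLTK.
KNOWN_NE_CLASSES = {"ieer": {"PERSON", "ORGANIZATION", "LOCATION"}}


def check_label(label, corpus="ieer"):
    """Return the label if it is known for the corpus, else raise ValueError."""
    if label not in KNOWN_NE_CLASSES[corpus]:
        raise ValueError(f"subject type has not been recognized: {label}")
    return label


check_label("PERSON")      # accepted
try:
    check_label("NPH")     # a custom chunk label is rejected
except ValueError as err:
    print(err)
```

Any label coming from a custom RegexpParser grammar fails a check of this shape, no matter how well the chunks themselves were built.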
The simple, ad hoc approach is to write your own extract_rels function (example below).
import nltk
import re

grammar = r"""
  NPH: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
  PPH: {<IN><NP>}               # Chunk prepositions followed by NP
  VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
  CLAUSE: {<NP><VP>}            # Chunk NP, VP
  """
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = cp.parse(sentence)

IN = re.compile(r'.*\bon\b')

def extract_rels(subjclass, objclass, chunked, pattern):
    # padding because this function checks right context
    pairs = nltk.sem.relextract.tree2semi_rel(chunked) + [[[]]]
    reldicts = nltk.sem.relextract.semi_rel2reldict(pairs)
    # keep only relations whose chunk labels and filler text match
    relfilter = lambda x: (x['subjclass'] == subjclass and
                           pattern.match(x['filler']) and
                           x['objclass'] == objclass)
    return list(filter(relfilter, reldicts))

for e in extract_rels('NPH', 'NPH', chunked, pattern=IN):
    print(nltk.sem.rtuple(e))
Output:
[NPH: 'the/DT cat/NN'] 'sit/VB on/IN' [NPH: 'the/DT mat/NN']
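For readers who want to see the pairing logic without NLTK's helpers, here is a rough pure-Python sketch of the same idea: walk the chunked sentence, treat each chunk as a potential subject, collect the tokens up to the next chunk as the filler, and keep the triple when the labels and the pattern match. The nested-list representation and the names is_chunk, join and relations are assumptions made for illustration; this is not how NLTK implements it.

```python
import re

# A chunk is (label, [(word, tag), ...]); a bare token is (word, tag).
chunked = [
    ("NPH", [("Mary", "NN")]),
    ("saw", "VBD"),
    ("NPH", [("the", "DT"), ("cat", "NN")]),
    ("sit", "VB"),
    ("on", "IN"),
    ("NPH", [("the", "DT"), ("mat", "NN")]),
]


def is_chunk(item):
    """Chunks carry a list of tokens in second position; tokens carry a tag string."""
    return isinstance(item[1], list)


def join(tokens):
    """Render tokens as 'word/TAG word/TAG'."""
    return " ".join(f"{word}/{tag}" for word, tag in tokens)


def relations(items, subjclass, objclass, pattern):
    """Collect (subject, filler, object) triples for consecutive chunks."""
    results = []
    subj = None   # most recent chunk seen so far
    filler = []   # bare tokens seen since that chunk
    for item in items:
        if is_chunk(item):
            label, tokens = item
            if (subj is not None and subj[0] == subjclass
                    and label == objclass
                    and pattern.match(join(filler))):
                results.append((join(subj[1]), join(filler), join(tokens)))
            # the current chunk becomes the subject of the next candidate
            subj, filler = item, []
        else:
            filler.append(item)
    return results


IN = re.compile(r".*\bon\b")
rels = relations(chunked, "NPH", "NPH", IN)
print(rels)
```

Because each chunk serves as the object of one candidate relation and the subject of the next, the Mary/cat pair is also examined; it is dropped only because its filler, saw/VBD, does not match the pattern.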