Stanford CoreNLP: the 23 phrase categories of the Penn Chinese Treebank

Posted by 一休Q_Q

Phrase tags (17)

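| Tag | Description | Notes (translated from the Chinese gloss) |
|------|-------------|-------------------------------------------|
| ADJP | Adjective phrase | Projected from JJ |
| ADVP | Adverbial phrase headed by AD | Adverbial phrase headed by an adverb; adverbial |
| CLP | Classifier phrase | Measure-word phrase |
| CP | Clause headed by C (complementizer) | Clause introduced by a complementizer, including relative clauses |
| DNP | Phrase formed by "XP+DEG" | Phrase with the structure XP+DEG |
| DP | Determiner phrase | |
| DVP | Phrase formed by "XP+DEV" | Phrase with the structure XP+DEV |
| FRAG | Fragment | |
| IP | Simple clause headed by I (INFL) | Inflectional phrase, headed by INFL or another inflectional element |
| LCP | Phrase formed by "XP+LC" | Phrase headed by a localizer |
| LST | List marker | List-marker phrase used in explanatory enumerations |
| NP | Noun phrase | |
| PP | Prepositional phrase | |
| PRN | Parenthetical | |
| QP | Quantifier phrase | Numeral phrase built from numerals and measure words |
| UCP | Unidentical coordination phrase | Coordination of unlike categories |
| VP | Verb phrase | |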
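
When parser output has to be glossed programmatically, the 17 phrase tags can be kept in a small lookup table. The following is a minimal sketch, not part of CoreNLP: the class name `PhraseTags` and the method `describe` are hypothetical, and the descriptions are simply the ones listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PhraseTags {
    // Tag -> description, in the same order as the table of 17 phrase tags.
    static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();
    static {
        DESCRIPTIONS.put("ADJP", "Adjective phrase");
        DESCRIPTIONS.put("ADVP", "Adverbial phrase headed by AD");
        DESCRIPTIONS.put("CLP", "Classifier phrase");
        DESCRIPTIONS.put("CP", "Clause headed by C (complementizer)");
        DESCRIPTIONS.put("DNP", "Phrase formed by XP+DEG");
        DESCRIPTIONS.put("DP", "Determiner phrase");
        DESCRIPTIONS.put("DVP", "Phrase formed by XP+DEV");
        DESCRIPTIONS.put("FRAG", "Fragment");
        DESCRIPTIONS.put("IP", "Simple clause headed by I (INFL)");
        DESCRIPTIONS.put("LCP", "Phrase formed by XP+LC");
        DESCRIPTIONS.put("LST", "List marker");
        DESCRIPTIONS.put("NP", "Noun phrase");
        DESCRIPTIONS.put("PP", "Prepositional phrase");
        DESCRIPTIONS.put("PRN", "Parenthetical");
        DESCRIPTIONS.put("QP", "Quantifier phrase");
        DESCRIPTIONS.put("UCP", "Unidentical coordination phrase");
        DESCRIPTIONS.put("VP", "Verb phrase");
    }

    /** Returns the description of a phrase tag, or "unknown tag" if it is not in the table. */
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "unknown tag");
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : DESCRIPTIONS.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

A `LinkedHashMap` is used only so that iteration follows the table order; any `Map` would do for lookups.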



Verb compound tags (6)

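VCD Coordinated verb compound: (VCD (VV 投资) (VV 办厂))
VCP VV+VC, a verb followed by the copula 是
VNV A-not-A or A-one-A (A不A, A一A): (VNV (VV 能) (AD 不) (VV 能))
VPT Potential form, V-得-R or V-不-R: (VPT (VV 得) (AD 不) (VV 到))
VRD Verb-resultative compound, where the second component is the result of the first: (VRD (VV 呈现) (VV 出)); (VP (VRD (VV 联合) (VV 起来)))
VSB Modifier + head compound, where the first component is an intransitive verb and no adjunct or aspect marker comes between the two components: (VSB (VV 加速) (VV 建设)); (VP (VSB (VV 仰头) (VV 望去)))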
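
The bracketed examples above follow the usual treebank notation: each node is written as "(LABEL child child ...)", and a preterminal as "(POS word)". The following is a minimal sketch of a reader for that notation, not CoreNLP's own tree reader; the class `BracketTree` and its methods are hypothetical, and pinyin stands in for the Chinese words only to keep the snippet ASCII.

```java
import java.util.ArrayList;
import java.util.List;

class BracketTree {
    final String label;   // phrase or POS tag, e.g. "VRD" or "VV"
    final String word;    // non-null only for preterminals such as (VV lianhe)
    final List<BracketTree> children = new ArrayList<>();

    BracketTree(String label, String word) {
        this.label = label;
        this.word = word;
    }

    /** Parses one bracketed tree such as "(VP (VRD (VV lianhe) (VV qilai)))". */
    static BracketTree parse(String s) {
        return parseNode(s, new int[] {0});
    }

    /** Collects the words under this node, left to right. */
    List<String> words() {
        List<String> out = new ArrayList<>();
        collect(this, out);
        return out;
    }

    private static void collect(BracketTree t, List<String> out) {
        if (t.word != null) out.add(t.word);
        for (BracketTree c : t.children) collect(c, out);
    }

    private static void skipSpaces(String s, int[] pos) {
        while (pos[0] < s.length() && Character.isWhitespace(s.charAt(pos[0]))) pos[0]++;
    }

    // Reads characters up to the next parenthesis or whitespace.
    private static String readToken(String s, int[] pos) {
        int start = pos[0];
        while (pos[0] < s.length()) {
            char c = s.charAt(pos[0]);
            if (c == '(' || c == ')' || Character.isWhitespace(c)) break;
            pos[0]++;
        }
        return s.substring(start, pos[0]);
    }

    private static void expect(String s, int[] pos, char c) {
        if (pos[0] >= s.length() || s.charAt(pos[0]) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at offset " + pos[0]);
        }
        pos[0]++;
    }

    private static BracketTree parseNode(String s, int[] pos) {
        skipSpaces(s, pos);
        expect(s, pos, '(');
        skipSpaces(s, pos);
        String label = readToken(s, pos);
        skipSpaces(s, pos);
        // A bare token after the label means this node is a preterminal.
        if (pos[0] < s.length() && s.charAt(pos[0]) != '(' && s.charAt(pos[0]) != ')') {
            String word = readToken(s, pos);
            skipSpaces(s, pos);
            expect(s, pos, ')');
            return new BracketTree(label, word);
        }
        BracketTree node = new BracketTree(label, null);
        while (true) {
            skipSpaces(s, pos);
            if (pos[0] >= s.length()) throw new IllegalArgumentException("unbalanced brackets");
            if (s.charAt(pos[0]) == ')') { pos[0]++; break; }
            node.children.add(parseNode(s, pos));
        }
        return node;
    }

    public static void main(String[] args) {
        BracketTree t = parse("(VP (VRD (VV lianhe) (VV qilai)))");
        System.out.println(t.label + " -> " + t.children.get(0).label + " " + t.words());
    }
}
```

The reader tolerates the irregular spacing seen in some of the examples, such as a space before a closing parenthesis.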

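NP

A phrase whose head is a noun. Grammatically, the term covers two notions: (1) a phrase defined by its syntactic role, such as a chunk that serves as subject or object of a sentence, which can carry an auxiliary function tag (NP-SBJ, NP-OBJ); (2) an entity or attribute in a knowledge base; chunks of this kind are called baseNP.

VP

A semantic chunk centered on a verb, together with the elements that modify, restrict, or are coordinated with it.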
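
Labels that carry a function tag, such as NP-SBJ or IP-OBJ, combine a phrase category with a grammatical role. The sketch below shows one way to separate the two parts. It assumes that the category is everything before the first hyphen; the class `LabelParts` is hypothetical and is not a CoreNLP API.

```java
class LabelParts {
    final String category;  // e.g. "NP"
    final String function;  // e.g. "SBJ"; empty when the label has no function tag

    LabelParts(String category, String function) {
        this.category = category;
        this.function = function;
    }

    /** Splits "NP-SBJ" into ("NP", "SBJ"); labels without an inner hyphen are returned whole. */
    static LabelParts split(String label) {
        int dash = label.indexOf('-');
        // dash <= 0 covers plain labels ("VP") and labels that start with a hyphen ("-NONE-").
        if (dash <= 0) {
            return new LabelParts(label, "");
        }
        return new LabelParts(label.substring(0, dash), label.substring(dash + 1));
    }

    public static void main(String[] args) {
        LabelParts p = split("NP-SBJ");
        System.out.println(p.category + " / " + p.function);
    }
}
```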

 

Head rules in the CoreNLP source code

nonTerminalInfo.put("ROOT", new String[][]{{left, "IP"}});
nonTerminalInfo.put("PAIR", new String[][]{{left, "IP"}});

// Major syntactic categories
nonTerminalInfo.put("ADJP", new String[][]{{left, "JJ", "ADJP"}}); // there is one ADJP unary rewrite to AD but otherwise all have JJ or ADJP
nonTerminalInfo.put("ADVP", new String[][]{{left, "AD", "CS", "ADVP", "JJ"}}); // CS is a subordinating conjunctor, and there are a couple of ADVP->JJ unary rewrites
nonTerminalInfo.put("CLP", new String[][]{{right, "M", "CLP"}});
// nonTerminalInfo.put("CP", new String[][]{{left, "WHNP", "IP", "CP", "VP"}}); // this is complicated; see bracketing guide p. 34. Actually, all WHNP are empty. IP/CP seems to be the best semantic head; syntax would dictate DEC/ADVP. Using IP/CP/VP/M is INCREDIBLY bad for Dep parser - lose 3% absolute.
nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "WHNP", "WHPP"}, rightExceptPunct}); // the (syntax-oriented) right-first head rule
// nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "ADVP", "CP", "IP", "VP", "M"}}); // the (syntax-oriented) right-first head rule
nonTerminalInfo.put("DNP", new String[][]{{right, "DEG", "DEC"}, rightExceptPunct}); // according to tgrep2, first preparation, all DNPs have a DEG daughter
nonTerminalInfo.put("DP", new String[][]{{left, "DT", "DP"}}); // there's one instance of DP adjunction
nonTerminalInfo.put("DVP", new String[][]{{right, "DEV", "DEC"}}); // DVP always has DEV under it
nonTerminalInfo.put("FRAG", new String[][]{{right, "VV", "NN"}, rightExceptPunct}); // FRAG seems only to be used for bits at the beginnings of articles: "Xinwenshe<DATE>" and "(wan)"
nonTerminalInfo.put("INTJ", new String[][]{{right, "INTJ", "IJ", "SP"}});
nonTerminalInfo.put("IP", new String[][]{{left, "VP", "IP"}, rightExceptPunct}); // CDM July 2010 following email from Pi-Chuan changed preference to VP over IP: IP can be -SBJ, -OBJ, or -ADV, and shouldn't be head
nonTerminalInfo.put("LCP", new String[][]{{right, "LC", "LCP"}}); // there's a bit of LCP adjunction
nonTerminalInfo.put("LST", new String[][]{{right, "CD", "PU"}}); // covers all examples
nonTerminalInfo.put("NP", new String[][]{{right, "NN", "NR", "NT", "NP", "PN", "CP"}}); // Basic heads are NN/NR/NT/NP; PN is pronoun. Some NPs are nominalized relative clauses without overt nominal material; these are NP->CP unary rewrites. Finally, note that this doesn't give any special treatment of coordination.
nonTerminalInfo.put("PP", new String[][]{{left, "P", "PP"}}); // in the manual there's an example of VV heading PP but I couldn't find such an example with tgrep2
// cdm 2006: PRN changed to not choose punctuation. Helped parsing (if not significantly)
// nonTerminalInfo.put("PRN", new String[][]{{left, "PU"}}); // presumably left/right doesn't matter
nonTerminalInfo.put("PRN", new String[][]{{left, "NP", "VP", "IP", "QP", "PP", "ADJP", "CLP", "LCP"}, {rightdis, "NN", "NR", "NT", "FW"}});
// cdm 2006: QP: add OD -- occurs some; occasionally NP, NT, M; parsing performance no-op
nonTerminalInfo.put("QP", new String[][]{{right, "QP", "CLP", "CD", "OD", "NP", "NT", "M"}}); // there's some QP adjunction
// add OD?
nonTerminalInfo.put("UCP", new String[][]{{left}}); // an alternative would be "PU", "CC"
nonTerminalInfo.put("VP", new String[][]{{left, "VP", "VCD", "VPT", "VV", "VCP", "VA", "VC", "VE", "IP", "VSB", "VCP", "VRD", "VNV"}, leftExceptPunct}); // note that ba and long bei introduce IP-OBJ small clauses; short bei introduces VP
// add BA, LB, as needed

// verb compounds
nonTerminalInfo.put("VCD", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // could easily be right instead
nonTerminalInfo.put("VCP", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // not much info from documentation
nonTerminalInfo.put("VRD", new String[][]{{left, "VCD", "VRD", "VV", "VA", "VC", "VE"}}); // definitely left
nonTerminalInfo.put("VSB", new String[][]{{right, "VCD", "VSB", "VV", "VA", "VC", "VE"}}); // definitely right, though some examples look questionably classified (na2lai2 zhi1fu4)
nonTerminalInfo.put("VNV", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // left/right doesn't matter
nonTerminalInfo.put("VPT", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // activity verb is to the left

// some POS tags apparently sit where phrases are supposed to be
nonTerminalInfo.put("CD", new String[][]{{right, "CD"}});
nonTerminalInfo.put("NN", new String[][]{{right, "NN"}});
nonTerminalInfo.put("NR", new String[][]{{right, "NR"}});

// I'm adding these POS tags to do primitive morphology for character-level
// parsing. It shouldn't affect anything else because heads of preterminals are not
// generally queried - GMA
nonTerminalInfo.put("VV", new String[][]{{left}});
nonTerminalInfo.put("VA", new String[][]{{left}});
nonTerminalInfo.put("VC", new String[][]{{left}});
nonTerminalInfo.put("VE", new String[][]{{left}});

// new for ctb6.
nonTerminalInfo.put("FLR", new String[][]{rightExceptPunct});

// new for CTB9
nonTerminalInfo.put("DFL", new String[][]{rightExceptPunct});
nonTerminalInfo.put("EMO", new String[][]{leftExceptPunct}); // left/right doesn't matter
nonTerminalInfo.put("INC", new String[][]{leftExceptPunct});
nonTerminalInfo.put("INTJ", new String[][]{leftExceptPunct});
nonTerminalInfo.put("OTH", new String[][]{leftExceptPunct});
nonTerminalInfo.put("SKIP", new String[][]{leftExceptPunct});
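
Each entry in the listing maps a parent category to one or more rules: a search direction followed by daughter categories in priority order. To illustrate how such a table could be used to pick the head daughter of a node, here is a minimal, self-contained sketch in the style of Collins head rules. It is a simplified reading and not the CoreNLP implementation: the class `HeadRuleSketch` and its methods are hypothetical, the direction keywords are plain strings, and `rightExceptPunct` is assumed to be the rule {"rightexcept", "PU"}.

```java
import java.util.Arrays;
import java.util.List;

class HeadRuleSketch {

    /** Applies one rule; returns the index of the head daughter, or -1 if nothing matches. */
    static int applyRule(String[] rule, List<String> daughters) {
        String direction = rule[0];
        List<String> cats = Arrays.asList(rule).subList(1, rule.length);
        int n = daughters.size();
        switch (direction) {
            case "left":        // per category in priority order, scan left to right
                for (String cat : cats)
                    for (int i = 0; i < n; i++)
                        if (daughters.get(i).equals(cat)) return i;
                return -1;
            case "right":       // per category in priority order, scan right to left
                for (String cat : cats)
                    for (int i = n - 1; i >= 0; i--)
                        if (daughters.get(i).equals(cat)) return i;
                return -1;
            case "leftdis":     // first daughter from the left matching any listed category
                for (int i = 0; i < n; i++)
                    if (cats.contains(daughters.get(i))) return i;
                return -1;
            case "rightdis":    // first daughter from the right matching any listed category
                for (int i = n - 1; i >= 0; i--)
                    if (cats.contains(daughters.get(i))) return i;
                return -1;
            case "leftexcept":  // first daughter from the left NOT in the list
                for (int i = 0; i < n; i++)
                    if (!cats.contains(daughters.get(i))) return i;
                return -1;
            case "rightexcept": // first daughter from the right NOT in the list
                for (int i = n - 1; i >= 0; i--)
                    if (!cats.contains(daughters.get(i))) return i;
                return -1;
            default:
                throw new IllegalArgumentException("unknown direction: " + direction);
        }
    }

    /** Tries the rules in order; falls back to the leftmost or rightmost daughter. */
    static int findHead(String[][] rules, List<String> daughters) {
        if (daughters.isEmpty()) throw new IllegalArgumentException("node has no daughters");
        for (String[] rule : rules) {
            int head = applyRule(rule, daughters);
            if (head >= 0) return head;
        }
        // Assumed fallback: follow the direction of the last rule.
        String lastDirection = rules[rules.length - 1][0];
        return lastDirection.startsWith("left") ? 0 : daughters.size() - 1;
    }

    public static void main(String[] args) {
        // Mirrors the IP entry: prefer VP, then IP, else the rightmost non-punctuation daughter.
        String[][] ipRules = {{"left", "VP", "IP"}, {"rightexcept", "PU"}};
        System.out.println(findHead(ipRules, Arrays.asList("NP", "VP", "PU"))); // prints 1
    }
}
```

Under this reading, priority order matters more than position: with the NP rule, a node with daughters NN NR selects NN even though NR is further to the right, because NN is listed first.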

