Stanford CoreNLP: the 23 phrase categories of the Penn Chinese Treebank
Posted by 一休Q_Q
17 phrase tags
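Tag | English description | Notes |
ADJP | Adjective phrase | Adjective phrase, projected from JJ |
ADVP | Adverbial phrase headed by AD | Adverbial phrase headed by an adverb (AD); functions as an adverbial |
CLP | Classifier phrase | Measure-word (classifier) phrase |
CP | Clause headed by C (complementizer) | Clause introduced by a complementizer, including relative clauses |
DNP | Phrase formed by "XP+DEG" | Phrase formed by XP followed by DEG |
DP | Determiner phrase | Determiner phrase |
DVP | Phrase formed by "XP+DEV" | Phrase formed by XP followed by DEV |
FRAG | Fragment | Fragment |
IP | Inflectional phrase | Simple clause headed by I (INFL or another inflectional element) |
LCP | Phrase formed by "XP+LC" | Phrase headed by a localizer (LC) |
LST | List marker | List-marker phrase used in enumerations and explanatory lists |
NP | Noun phrase | Noun phrase |
PP | Preposition phrase | Prepositional phrase |
PRN | Parenthetical | Parenthetical |
QP | Quantifier phrase | Numeral phrase, built from numerals and measure words |
UCP | Unidentical coordination phrase | Coordination of unlike categories |
VP | Verb phrase | Verb phrase |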
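When post-processing parser output it is often handy to have the 17 phrase tags as a lookup table. The following is a minimal sketch in plain Java. The class name `CtbPhraseTags` and its methods are made up for illustration and are not part of CoreNLP. It also assumes that functional suffixes such as `-SBJ` or `-OBJ` should be stripped before the lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative lookup table for the 17 phrase tags (hypothetical helper, not CoreNLP API). */
final class CtbPhraseTags {
    private static final Map<String, String> TAGS = new LinkedHashMap<>();
    static {
        TAGS.put("ADJP", "Adjective phrase");
        TAGS.put("ADVP", "Adverbial phrase headed by AD");
        TAGS.put("CLP", "Classifier phrase");
        TAGS.put("CP", "Clause headed by C (complementizer)");
        TAGS.put("DNP", "Phrase formed by XP+DEG");
        TAGS.put("DP", "Determiner phrase");
        TAGS.put("DVP", "Phrase formed by XP+DEV");
        TAGS.put("FRAG", "Fragment");
        TAGS.put("IP", "Inflectional phrase");
        TAGS.put("LCP", "Phrase formed by XP+LC");
        TAGS.put("LST", "List marker");
        TAGS.put("NP", "Noun phrase");
        TAGS.put("PP", "Preposition phrase");
        TAGS.put("PRN", "Parenthetical");
        TAGS.put("QP", "Quantifier phrase");
        TAGS.put("UCP", "Unidentical coordination phrase");
        TAGS.put("VP", "Verb phrase");
    }

    /** Number of phrase tags in the table. */
    static int size() {
        return TAGS.size();
    }

    /**
     * Returns the description for a node label, or null if the label is not a phrase tag.
     * Assumption: anything after the first '-' is a functional suffix (e.g. NP-SBJ) and is ignored.
     */
    static String describe(String label) {
        int dash = label.indexOf('-');
        String base = dash > 0 ? label.substring(0, dash) : label;
        return TAGS.get(base);
    }

    public static void main(String[] args) {
        System.out.println(describe("NP-SBJ"));
        System.out.println(describe("LCP"));
    }
}
```

A `LinkedHashMap` is used only so that iteration follows the order of the table; a plain `HashMap` would do for lookups.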
6 verb compound tags
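VCD Coordinated verb compound: (VCD (VV 投资) (VV 办厂))
VCP Verb + copula (VV + VC), i.e. a verb followed by 是
VNV A-不-A or A-一-A form: (VNV (VV 能) (AD 不) (VV 能))
VPT V-得-R or V-不-R form: (VPT (VV 得) (AD 不) (VV 到))
VRD Verb-resultative compound, where the second element is the result of the first: (VRD (VV 呈现) (VV 出)); (VP (VRD (VV 联合) (VV 起来)))
VSB Modifier + head compound; the first element is an intransitive verb, and no adjunct or aspect marker stands between the two elements: (VSB (VV 加速) (VV 建设)); (VP (VSB (VV 仰头) (VV 望去)))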
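The examples above use the treebank's bracketed notation, where each `(LABEL ...)` is a node and the innermost tokens are words. To make the structure concrete, here is a small sketch of a reader for that notation in plain Java. `BracketTree` is a hypothetical class written for this post, not the CoreNLP `Tree` API, and it assumes well-formed input where labels and words contain no parentheses or whitespace.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal bracketed-tree reader (illustrative only; CoreNLP ships its own Tree classes). */
final class BracketTree {
    final String label;                                    // node label, or the word itself for a leaf
    final List<BracketTree> children = new ArrayList<>();

    private BracketTree(String label) {
        this.label = label;
    }

    static BracketTree parse(String text) {
        // Pad parentheses with spaces so that a whitespace split yields clean tokens.
        String[] tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        int[] pos = {0};
        BracketTree tree = parseNode(tokens, pos);
        if (pos[0] != tokens.length) {
            throw new IllegalArgumentException("unexpected tokens after the tree");
        }
        return tree;
    }

    private static BracketTree parseNode(String[] tokens, int[] pos) {
        if (pos[0] >= tokens.length || !tokens[pos[0]].equals("(")) {
            throw new IllegalArgumentException("expected '('");
        }
        pos[0]++;                                          // consume '('
        if (pos[0] >= tokens.length || tokens[pos[0]].equals("(") || tokens[pos[0]].equals(")")) {
            throw new IllegalArgumentException("expected a label");
        }
        BracketTree node = new BracketTree(tokens[pos[0]++]);
        while (pos[0] < tokens.length && !tokens[pos[0]].equals(")")) {
            if (tokens[pos[0]].equals("(")) {
                node.children.add(parseNode(tokens, pos)); // nested constituent
            } else {
                node.children.add(new BracketTree(tokens[pos[0]++])); // a word
            }
        }
        if (pos[0] >= tokens.length) {
            throw new IllegalArgumentException("missing ')'");
        }
        pos[0]++;                                          // consume ')'
        return node;
    }

    /** Labels of the immediate children, left to right. */
    List<String> childLabels() {
        List<String> labels = new ArrayList<>();
        for (BracketTree child : children) {
            labels.add(child.label);
        }
        return labels;
    }

    /** Concatenation of all words under this node, left to right. */
    String words() {
        if (children.isEmpty()) {
            return label;
        }
        StringBuilder sb = new StringBuilder();
        for (BracketTree child : children) {
            sb.append(child.words());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BracketTree vnv = BracketTree.parse("(VNV (VV 能) (AD 不) (VV 能))");
        System.out.println(vnv.label + " " + vnv.childLabels() + " " + vnv.words());
    }
}
```

For the VNV example, the root label is `VNV`, its children are `VV`, `AD`, `VV`, and the words join back into the A-不-A string.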
NP
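A phrase headed by a noun. Grammatically it has two readings: (1) a phrase defined by its syntactic role, for example a chunk that serves as the subject or object of a sentence, which can carry an auxiliary functional tag such as NP-SBJ or NP-OBJ; (2) an entity or attribute in a knowledge base, a chunk known as a baseNP.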
VP
A semantic chunk centered on a verb, together with the elements that modify, restrict, or are coordinated with it.
Source code in CoreNLP (head rules, from ChineseHeadFinder)
nonTerminalInfo.put("ROOT", new String[][]{{left, "IP"}});
nonTerminalInfo.put("PAIR", new String[][]{{left, "IP"}});
// Major syntactic categories
nonTerminalInfo.put("ADJP", new String[][]{{left, "JJ", "ADJP"}}); // there is one ADJP unary rewrite to AD but otherwise all have JJ or ADJP
nonTerminalInfo.put("ADVP", new String[][]{{left, "AD", "CS", "ADVP", "JJ"}}); // CS is a subordinating conjunctor, and there are a couple of ADVP->JJ unary rewrites
nonTerminalInfo.put("CLP", new String[][]{{right, "M", "CLP"}});
// nonTerminalInfo.put("CP", new String[][]{{left, "WHNP", "IP", "CP", "VP"}}); // this is complicated; see bracketing guide p. 34. Actually, all WHNP are empty. IP/CP seems to be the best semantic head; syntax would dictate DEC/ADVP. Using IP/CP/VP/M is INCREDIBLY bad for Dep parser - lose 3% absolute.
nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "WHNP", "WHPP"}, rightExceptPunct}); // the (syntax-oriented) right-first head rule
// nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "ADVP", "CP", "IP", "VP", "M"}}); // the (syntax-oriented) right-first head rule
nonTerminalInfo.put("DNP", new String[][]{{right, "DEG", "DEC"}, rightExceptPunct}); // according to tgrep2, first preparation, all DNPs have a DEG daughter
nonTerminalInfo.put("DP", new String[][]{{left, "DT", "DP"}}); // there's one instance of DP adjunction
nonTerminalInfo.put("DVP", new String[][]{{right, "DEV", "DEC"}}); // DVP always has DEV under it
nonTerminalInfo.put("FRAG", new String[][]{{right, "VV", "NN"}, rightExceptPunct}); // FRAG seems only to be used for bits at the beginnings of articles: "Xinwenshe<DATE>" and "(wan)"
nonTerminalInfo.put("INTJ", new String[][]{{right, "INTJ", "IJ", "SP"}});
nonTerminalInfo.put("IP", new String[][]{{left, "VP", "IP"}, rightExceptPunct}); // CDM July 2010 following email from Pi-Chuan changed preference to VP over IP: IP can be -SBJ, -OBJ, or -ADV, and shouldn't be head
nonTerminalInfo.put("LCP", new String[][]{{right, "LC", "LCP"}}); // there's a bit of LCP adjunction
nonTerminalInfo.put("LST", new String[][]{{right, "CD", "PU"}}); // covers all examples
nonTerminalInfo.put("NP", new String[][]{{right, "NN", "NR", "NT", "NP", "PN", "CP"}}); // Basic heads are NN/NR/NT/NP; PN is pronoun. Some NPs are nominalized relative clauses without overt nominal material; these are NP->CP unary rewrites. Finally, note that this doesn't give any special treatment of coordination.
nonTerminalInfo.put("PP", new String[][]{{left, "P", "PP"}}); // in the manual there's an example of VV heading PP but I couldn't find such an example with tgrep2
// cdm 2006: PRN changed to not choose punctuation. Helped parsing (if not significantly)
// nonTerminalInfo.put("PRN", new String[][]{{left, "PU"}}); // presumably left/right doesn't matter
nonTerminalInfo.put("PRN", new String[][]{{left, "NP", "VP", "IP", "QP", "PP", "ADJP", "CLP", "LCP"}, {rightdis, "NN", "NR", "NT", "FW"}});
// cdm 2006: QP: add OD -- occurs some; occasionally NP, NT, M; parsing performance no-op
nonTerminalInfo.put("QP", new String[][]{{right, "QP", "CLP", "CD", "OD", "NP", "NT", "M"}}); // there's some QP adjunction
// add OD?
nonTerminalInfo.put("UCP", new String[][]{{left, }}); // an alternative would be "PU","CC"
nonTerminalInfo.put("VP", new String[][]{{left, "VP", "VCD", "VPT", "VV", "VCP", "VA", "VC", "VE", "IP", "VSB", "VCP", "VRD", "VNV"}, leftExceptPunct}); // note that ba and long bei introduce IP-OBJ small clauses; short bei introduces VP
// add BA, LB, as needed
// verb compounds
nonTerminalInfo.put("VCD", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // could easily be right instead
nonTerminalInfo.put("VCP", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // not much info from documentation
nonTerminalInfo.put("VRD", new String[][]{{left, "VCD", "VRD", "VV", "VA", "VC", "VE"}}); // definitely left
nonTerminalInfo.put("VSB", new String[][]{{right, "VCD", "VSB", "VV", "VA", "VC", "VE"}}); // definitely right, though some examples look questionably classified (na2lai2 zhi1fu4)
nonTerminalInfo.put("VNV", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // left/right doesn't matter
nonTerminalInfo.put("VPT", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // activity verb is to the left
// some POS tags apparently sit where phrases are supposed to be
nonTerminalInfo.put("CD", new String[][]{{right, "CD"}});
nonTerminalInfo.put("NN", new String[][]{{right, "NN"}});
nonTerminalInfo.put("NR", new String[][]{{right, "NR"}});
// I'm adding these POS tags to do primitive morphology for character-level
// parsing. It shouldn't affect anything else because heads of preterminals are not
// generally queried - GMA
nonTerminalInfo.put("VV", new String[][]{{left}});
nonTerminalInfo.put("VA", new String[][]{{left}});
nonTerminalInfo.put("VC", new String[][]{{left}});
nonTerminalInfo.put("VE", new String[][]{{left}});
// new for ctb6.
nonTerminalInfo.put("FLR", new String[][]{rightExceptPunct});
// new for CTB9
nonTerminalInfo.put("DFL", new String[][]{rightExceptPunct});
nonTerminalInfo.put("EMO", new String[][]{leftExceptPunct}); // left/right doesn't matter
nonTerminalInfo.put("INC", new String[][]{leftExceptPunct});
nonTerminalInfo.put("INTJ", new String[][]{leftExceptPunct});
nonTerminalInfo.put("OTH", new String[][]{leftExceptPunct});
nonTerminalInfo.put("SKIP", new String[][]{leftExceptPunct});
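Each entry in the listing maps a parent category to an ordered list of rules: the first element of a rule is a search direction and the rest are candidate head categories. The sketch below shows one plausible way such rules pick a head child. It is a simplified illustration, not CoreNLP's implementation. The class `HeadRuleSketch`, the string values `"rightexcept"` and `"leftexcept"` standing in for `rightExceptPunct` and `leftExceptPunct`, and the fallback when no rule matches are all assumptions. The direction semantics follow my reading of Collins-style head finders.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified head-rule matcher (illustrative sketch, not the CoreNLP implementation). */
final class HeadRuleSketch {

    /** Returns the index of the head daughter chosen by one rule, or -1 if the rule does not match. */
    static int applyRule(String[] rule, List<String> daughters) {
        String direction = rule[0];
        List<String> cats = Arrays.asList(rule).subList(1, rule.length);
        int n = daughters.size();
        switch (direction) {
            case "left":        // categories in priority order, daughters scanned left to right
                for (String cat : cats)
                    for (int i = 0; i < n; i++)
                        if (daughters.get(i).equals(cat)) return i;
                return -1;
            case "right":       // categories in priority order, daughters scanned right to left
                for (String cat : cats)
                    for (int i = n - 1; i >= 0; i--)
                        if (daughters.get(i).equals(cat)) return i;
                return -1;
            case "rightdis":    // daughters scanned right to left, any listed category matches
                for (int i = n - 1; i >= 0; i--)
                    if (cats.contains(daughters.get(i))) return i;
                return -1;
            case "leftexcept":  // first daughter from the left that is NOT in the list
                for (int i = 0; i < n; i++)
                    if (!cats.contains(daughters.get(i))) return i;
                return -1;
            case "rightexcept": // first daughter from the right that is NOT in the list
                for (int i = n - 1; i >= 0; i--)
                    if (!cats.contains(daughters.get(i))) return i;
                return -1;
            default:
                throw new IllegalArgumentException("unknown direction: " + direction);
        }
    }

    /** Tries the rules in order; daughters must be non-empty. */
    static int findHead(String[][] rules, List<String> daughters) {
        for (String[] rule : rules) {
            int idx = applyRule(rule, daughters);
            if (idx >= 0) return idx;
        }
        // Fallback assumed for this sketch: the edge daughter on the side the last rule starts from.
        String last = rules[rules.length - 1][0];
        return last.startsWith("left") ? 0 : daughters.size() - 1;
    }

    public static void main(String[] args) {
        String[][] ipRules = {{"left", "VP", "IP"}, {"rightexcept", "PU"}};
        List<String> daughters = Arrays.asList("NP", "VP", "PU");
        System.out.println(findHead(ipRules, daughters));
    }
}
```

With the IP rule from the listing and daughters `NP VP PU`, the first rule already finds `VP`, so the head index is 1. The `rightexcept` rule only comes into play when neither `VP` nor `IP` is present, in which case the rightmost non-punctuation daughter is taken.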