Measuring Text Difficulty Using Parse-Tree Frequency


Paper: https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf

 

In previous work, we conducted a preliminary corpus study of grammar frequency which showed that difficult texts use a wider variety of high-level grammatical structures (Kauchak et al., 2012). However, because of the large number of structural variations possible, no clear indication was found showing specific structures predominantly appearing in either easy or difficult documents.

In this work, we propose a much more fine-grained analysis: a measure of text difficulty based on grammatical frequency, and we show how it can be used to identify sentences with difficult syntactic structures. In particular, the grammatical difficulty of a sentence is measured based on the frequency of occurrence of the top-level parse tree structure of the sentence in a large corpus.

 

Analogous to term familiarity, the authors introduce grammar familiarity:

Grammar familiarity is measured as the frequency of the 3rd level sentence parse tree
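The level-k structure can be read off a constituency parse by truncating the tree. A minimal sketch (not the authors' code): trees are hand-built nested tuples rather than parser output, and only nonterminal labels are kept.

```python
# A tree is a nested tuple (label, child, ...); leaf words are plain strings.

def level_structure(tree, depth):
    """Truncate `tree` at `depth` levels, keeping only nonterminal labels."""
    label, *children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]  # drop words
    if depth == 1 or not subtrees:
        return label
    return "(%s %s)" % (label, " ".join(level_structure(c, depth - 1)
                                        for c in subtrees))

# Parse of "The cat sat." — (S (NP (DT The) (NN cat)) (VP (VBD sat)) (. .))
tree = ("S",
        ("NP", ("DT", "The"), ("NN", "cat")),
        ("VP", ("VBD", "sat")),
        (".", "."))

print(level_structure(tree, 3))  # (S (NP DT NN) (VP VBD) .)
```

The resulting string is what gets counted in the corpus: two sentences share a grammar-familiarity score exactly when their truncated structures are identical.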

Method overview:

From Wikipedia sentences, the 3rd-level parse tree structures were ranked by frequency and divided into 11 bins, and sentences were sampled from each bin.

Each bin contributed 20 sentences, controlled for sentence length and term familiarity (5 sentences per length/familiarity combination).

Participants' responses show that the frequency of a sentence's 3rd-level parse tree structure relates to both its perceived and actual difficulty.

Goal:

Examine how grammatical frequency impacts the difficulty of a sentence, and introduce a new measure of sentence-level text difficulty based on the grammatical structure of the sentence.

 

Two dependent variables: perceived difficulty and actual difficulty.

Our work here makes a step towards better simplification tools by 1) introducing a sentence-level, data-driven approach for measuring the grammatical difficulty of a sentence and 2) specifically measuring the impact of this measure using both how difficult a sentence looks (perceived difficulty) as well as how difficult a sentence is to understand (actual difficulty). 

 

Prior findings on what makes text lexically simpler:

Simple texts use simpler words, fewer overall words, and words that are more general (Coster & Kauchak, 2011; Napoles & Dredze, 2010; Zhu, Bernhard, & Gurevych, 2010). Certain types of words have also been found to be more prevalent in simpler texts, including function words and verbs (Kauchak, Leroy, & Coster, 2012; Leroy & Endicott, 2011).

 

The Role of Syntax in Simplification

The syntax or grammar of a language dictates how words and phrases interact to form sentences.

Splitting long sentences has been shown to improve Cloze scores (Kandula, Curtis, & Zeng-Treitler, 2010), and additive and causal connectors were found to be easier to fill in than adversative or sequential connectors (Goldman & Murray, 1992). It has been suggested that grammatical difficulty is particularly important for L2 learners since they are still trying to learn appropriate grammatical structures for the language (Callan & Eskenazi, 2007; Clahsen & Felser, 2006).

Reference on logical connectors: https://staff.washington.edu/marynell/grammar/logicalconnectors.html

Note: grammatical difficulty involves more than function words; structural choices such as word order (e.g., subject-verb-object versus object-subject-verb ordering) also affect how hard a sentence is.

Some initial success has been achieved by automated simplification systems that perform syntactic transformations.

 

Why the 3rd-level parse tree structure?

We chose to focus on the 3rd level since it represents a compromise between generality and specificity.

45% of sentences in the corpus (2.47M) have unique 4th level parse tree structures, often because the 4th level regularly includes lexical components; the 4th level is therefore too specific to generalize across sentences.

To remove anomalous data and likely misparses, we ignored any structure that had only been seen once among the 5.4 million sentences. After filtering, this results in 139,969 unique 3rd level structures.
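The singleton filter is straightforward to sketch; the structure strings below are hypothetical stand-ins for real 3rd-level structures, not data from the paper:

```python
from collections import Counter

# Hypothetical per-sentence 3rd-level structure strings.
structures = ["(S NP VP .)"] * 3 + ["(S VP .)"] * 2 + ["(SBARQ WHNP SQ .)"]

counts = Counter(structures)
# Ignore any structure seen only once: likely an anomaly or a misparse.
kept = {s: c for s, c in counts.items() if c > 1}
```

Here the structure seen only once is dropped, leaving the two recurring structures with their corpus counts.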

 

Figure 1 shows:

Two sentences that have the same 3rd level structure but varying frequency, ordered from most frequent to least. Because we focus on the high-level structure, the length of sentences with the same structure can also vary widely.

Figure 2 shows:

Grammatical frequency follows a Zipf-like distribution, with the most common structures occurring very frequently and many structures occurring infrequently.

 

This approach for measuring the grammatical difficulty of text represents a generalized and data-driven approach that goes beyond specific, theory-based grammatical components of text difficulty (e.g. active vs. passive voice, self-embedded clauses, etc. (Meyer & Rice, 1984)) and provides a generic framework for measuring grammatical difficulty.

 

Study design:

To minimize confounding factors that might influence sentence difficulty, we control for sentence length and term familiarity.

1. We ranked the 139,969 unique 3rd level structures and divided them into 11 frequency bins (roughly, one bin for the top 1% most frequent structures and ten bins of about 10% each).

2. Each of the 5.4 million Wikipedia sentences can be mapped to one of the 11 frequency bins, and we selected a subset of these for our study.

3. Within each bin, candidate sentences were grouped by sentence length and term familiarity so these factors could be controlled.

4. From each bin we selected 20 sentences, varying grammar frequency across bins while controlling for sentence length and term familiarity.

Sentence length: within each bin, 10 long and 10 short sentences were selected.

Term familiarity: word frequencies from the Google Web corpus were averaged over the words of each sentence as a familiarity score; within each bin, 10 high-familiarity and 10 low-familiarity sentences were selected (20 sentences per bin).


This process resulted in a sample of 220 sentences in 11 frequency bins, with each bin containing 5 long sentences with high familiarity, 5 long with low familiarity, 5 short with high familiarity, and 5 short with low familiarity.
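Assuming equal-size rank bins (the paper's exact bin boundaries may differ), the binning step can be sketched as follows; the corpus here is a toy stand-in:

```python
from collections import Counter

def frequency_bins(structures, n_bins=11):
    """Rank unique structures by corpus frequency (most frequent first)
    and assign each a bin index 0..n_bins-1 by rank. Equal-size rank
    bins are an assumption; the paper's boundaries may differ."""
    ranked = [s for s, _ in Counter(structures).most_common()]
    size = -(-len(ranked) // n_bins)  # ceiling division
    return {s: i // size for i, s in enumerate(ranked)}

# Toy corpus: "a" is the most frequent structure, "d" the rarest.
corpus = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
bins = frequency_bins(corpus, n_bins=2)
```

Every sentence can then be mapped to the bin of its 3rd-level structure, which is the basis for the sampling described above.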

 

For each of the 220 sentences, we recruited 30 participants for a total of N=6,600 samples. To ensure the quality and accuracy of the data, participants were restricted to be within the United States and to have a previous approval rating of 95%.

Note: MTurk (Amazon Mechanical Turk) is a crowdsourcing tool where requesters can upload tasks to be accomplished by a set of workers for a fee.

 

Results:

A paired-samples t–test showed our two control variables to be effective, with length significantly different between short and long sentences (t(10) = -60.47, p < 0.001) and word frequency significantly different between the high and low group (t(10) = -38.47, p < 0.001).

1. To measure actual difficulty (first dependent variable), we used a Cloze test. The basic Cloze test involves replacing every nth word in a text with a blank. Participants are then asked to fill in the blanks and are scored based on how many of their answers match the original text (Taylor, 1953).

We employed a multiple-choice Cloze test. For each sentence, four nouns were randomly selected and replaced with blanks. We then created five multiple-choice options containing the four removed words in different random orders, one of which is the correct ordering.
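A rough sketch of constructing such a multiple-choice Cloze item; the sentence, noun positions, and function name are illustrative, not taken from the paper:

```python
import random
from itertools import permutations

def make_cloze(tokens, noun_positions, n_choices=5, seed=0):
    """Blank out the chosen nouns and build multiple-choice word orderings,
    exactly one of which restores the original order."""
    rng = random.Random(seed)
    answer = tuple(tokens[i] for i in noun_positions)
    blanked = ["____" if i in noun_positions else w
               for i, w in enumerate(tokens)]
    # All other orderings of the removed words serve as distractors.
    distractors = [p for p in permutations(answer) if p != answer]
    choices = [answer] + rng.sample(distractors, n_choices - 1)
    rng.shuffle(choices)
    return " ".join(blanked), choices, answer

sent = "The dog chased the cat past the barn into the field".split()
blanked, choices, answer = make_cloze(sent, [1, 4, 7, 10])
```

The participant sees `blanked` plus the five orderings and must pick the one that restores the sentence.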

2. To measure perceived difficulty (second dependent variable), participants were asked to rate the sentences on a 5-point Likert scale, with higher numbers representing more difficult sentences.

 

Each condition (11 x 2 x 2) had 5 sentences, and for each sentence we gathered 30 responses, resulting in a dataset of N=6,600.

 

An ANOVA shows these differences to be significant: F(10,6556) = 3.453, p < 0.001 for grammar frequency and sentence length, and F(10,6556) = 1.870, p = 0.044 for grammar frequency and term familiarity. In addition, the interaction between all three variables is also significant (F(10,6556) = 4.650, p < 0.001).

Note: analysis of variance (ANOVA) tests whether the means of several groups differ.

1. A t-test compares the means of two groups; with more than two groups, running many pairwise t-tests inflates the chance of a false positive.

2. ANOVA handles multiple groups (and multiple factors) in a single test: it partitions the variance of the dependent variable y into the portion explained by the factors x and the residual, and tests whether the explained portion is larger than expected by chance.

 

Correlation between structure frequency and actual difficulty, measured with a one-tailed Pearson correlation coefficient:

To complete this analysis and understand the strength of the effect on actual difficulty, we calculated a one-tailed Pearson correlation coefficient between the grammar frequency and the actual difficulty (percentage correct) for both the raw scores and scores aggregated by frequency bin. There was a negative correlation between grammar frequency and the actual difficulty of the sentence (raw scores: N = 6,600, r = -0.053, p < 0.01; bin averages: N = 11, r = -0.596, p < 0.05) indicating that sentences that used less frequent structures were harder to understand.
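For reference, the Pearson coefficient itself is simple to compute; the sample data below is made up to illustrate the sign convention, not taken from the study:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up example: as the frequency-rank bin rises (rarer structures),
# Cloze accuracy falls, giving a negative correlation.
freq_rank = [1, 2, 3, 4, 5]
accuracy = [0.95, 0.93, 0.94, 0.90, 0.88]
r = pearson_r(freq_rank, accuracy)  # ≈ -0.922
```

A negative r here mirrors the paper's finding that rarer structures go with lower comprehension scores.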

That is, structure frequency and actual difficulty are negatively related: the rarer the structure, the harder the sentence is to understand.

In contrast to actual difficulty, we also find a main effect of sentence length on perceived difficulty, with longer sentences seen as more difficult (average 2.2) than shorter sentences (average 2.0). Surprisingly, there was no effect of average term frequency on perceived difficulty.

 

The effect of grammar frequency on perceived difficulty is smaller in shorter sentences and in those with lower term frequency.

Both high and low frequency sentences show a jump in difficulty, though it occurs earlier (bin 7) for low frequency sentences than for high frequency sentences (bin 8).

 

We found a significant correlation between how well readers performed on the Cloze test and how difficult they thought a sentence was: lower accuracy correlated with higher difficulty ratings (N = 11, r = -0.574, p < 0.05; N = 6,600, r = -0.203, p < 0.01).

 

Actual and perceived difficulty as measured in our user study for the 220 sentences binned by the Flesch-Kincaid grade level:

Note: when the 220 sentences are binned by Flesch-Kincaid grade level, the grade level tracks perceived difficulty more closely than actual difficulty.
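For reference, the standard Flesch-Kincaid grade-level formula; the counts passed in below are illustrative, not from the study:

```python
def fk_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from total word/sentence/syllable counts."""
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59

# 0.39*(100/5) + 11.8*(150/100) - 15.59 ≈ 9.91
grade = fk_grade(words=100, sentences=5, syllables=150)
```

Because the formula depends only on sentence length and syllable counts, it is largely a surface measure, which is consistent with it tracking perceived rather than actual difficulty.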

 

 

GRAMMAR FAMILIARITY AS AN ANALYSIS TOOL

Corpus:

Each of the texts was tokenized and split into sentences using the Stanford CoreNLP toolkit and then parsed using the Berkeley Parser (the same preprocessing as for the frequency bins).

 

Conclusions:

1. Actual difficulty and perceived difficulty are related but distinct measures.

2. The frequency of the 3rd-level parse tree structure affects both actual and perceived difficulty.

3. For actual difficulty, the effect of grammar shows up mainly in longer sentences: shorter sentences are easy to understand and any effect of grammar is difficult to detect (ceiling effect).

Similarly, in sentences with low term familiarity (i.e., more difficult words), grammar familiarity doesn't impact text difficulty, since users are struggling with the lexical difficulty.

However, in sentences with very familiar terms, which are easier to understand, grammar frequency does have an impact on actual difficulty; only in sentences where the words are more familiar does the grammatical frequency have a strong effect. Interestingly, there was very little impact overall of term frequency on actual difficulty.

Based on these observations, we hypothesize that there is a relation between grammatical frequency and term frequency. Future studies are required to fully validate these hypotheses. Our study has limitations: text comprehension was measured with individual sentences.

 
