Paper Reading - Fundamentals Series - Common Evaluation Metrics: ROC, PR, mAP


Further reading:

https://zhuanlan.zhihu.com/p/484973551

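After writing so many posts, I suddenly realized I had left out the most important part: how to evaluate a model. This post walks through the most common evaluation metrics.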

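Confusion Matrix

Going by the figure, accuracy is the sum of the diagonal of the confusion matrix divided by the total number of images. The larger the numbers on the diagonal, the better. Take binary cat/dog classification as an example, with cat = 1 (positive sample) and dog = 0 (negative sample).

True positives (TP): positive samples correctly identified as positive, cat image -> cat.

True negatives (TN): negative samples correctly identified as negative, dog image -> dog.

False positives (FP): negative samples wrongly identified as positive, dog image -> cat.

False negatives (FN): positive samples wrongly identified as negative, cat image -> dog.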

True positive rate (TPR), also called sensitivity or recall:

TPR = TP / (TP + FN)

True negative rate (TNR), also called specificity:

TNR = TN / (TN + FP)

False positive rate (FPR):

FPR = FP / (FP + TN)

False negative rate (FNR):

FNR = FN / (TP + FN)
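The four counts and the four rates above can be sketched in a few lines of NumPy. This is only an illustration: the function name `confusion_counts` and the sample labels are made up for the example, with 1 = cat (positive) and 0 = dog (negative) as in the text.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

# Hypothetical labels: 1 = cat, 0 = dog
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
tpr = tp / (tp + fn)   # sensitivity / recall
tnr = tn / (tn + fp)   # specificity
fpr = fp / (fp + tn)
fnr = fn / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)  # diagonal sum over total
```

Note that TPR + FNR = 1 and TNR + FPR = 1, since each pair shares a denominator.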

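Precision

Definition: of the samples predicted positive, how many are truly positive (defined over the predictions). Better to let some slip through than to kill the wrong person: whenever the blade is drawn, the one struck is sure to be a villain.

Precision = TP / (TP + FP)

Recall (also called detection rate)

Definition: of the positive samples in the data, how many are predicted correctly (defined over the original samples). Better to wrongly kill a thousand than to let one escape. Precision and recall trade off against each other.

Recall = TP / (TP + FN)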
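To see the trade-off between the two concretely, here is a small sketch that evaluates the same scores at a strict and a loose threshold. The helper `precision_recall` and the data are hypothetical, and the sketch assumes there is at least one positive sample; returning precision 1.0 when nothing is predicted positive is just one possible convention.

```python
import numpy as np

def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    # Convention (an assumption): precision is 1.0 if nothing is predicted positive
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn)
    return float(precision), float(recall)

# Hypothetical labels and scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

strict = precision_recall(y_true, scores, 0.75)  # few, confident positives
loose = precision_recall(y_true, scores, 0.2)    # many positives
```

With the strict threshold every predicted positive is correct but one positive is missed; with the loose threshold no positive is missed but two negatives are flagged.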

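Over-kill Rate

Definition: a sample that is actually normal (= 0) but is detected as abnormal (= 1) is over-killed. The over-kill rate is the share of over-killed samples among the normal samples in the test set, which is the same as FPR.

FP / (FP + TN)

Miss Rate (Miss-Kill Rate)

Definition: a sample that is actually abnormal (= 1) but is detected as normal (= 0) is missed. The miss rate is the share of missed samples among the abnormal samples in the test set, which is the same as FNR.

FN / (TP + FN)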

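PR Curve and mAP (Mean Average Precision)

Like the ROC curve, the PR curve is obtained by varying the decision threshold and plotting the resulting (recall, precision) points.

The PR curve focuses on the positive class. In class-imbalanced problems the positive class is usually what matters, so the PR curve is widely considered preferable to the ROC curve in that setting.

AP is the area under the precision-recall curve (analogous to AUC for the ROC curve); in general, the better the classifier, the higher the AP. mAP is the mean of the AP values over all classes: compute AP for each class, then average. mAP always lies in [0, 1], and larger is better. It is the most important metric for object detection algorithms.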
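As a rough sketch of AP and mAP, the code below ranks samples by descending score and sums precision weighted by each increase in recall, which is one common way to take the area under the PR curve. Everything here is an assumption for illustration: the function name `average_precision`, the two classes, and their scores are made up, the scores are assumed to have no ties, and each class is assumed to have at least one positive. Detection benchmarks typically add box matching by IoU and precision interpolation, which are omitted.

```python
import numpy as np

def average_precision(y_true, scores):
    """AP as the step-wise area under the PR curve: sum of (R_n - R_{n-1}) * P_n."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")  # rank by descending score
    hits = y_true[order]
    tp_cum = np.cumsum(hits)                          # TP count after each ranked item
    precision = tp_cum / np.arange(1, len(hits) + 1)  # precision at each rank
    recall = tp_cum / hits.sum()                      # recall at each rank
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))

# Hypothetical per-class (labels, scores)
per_class = {
    "cat": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "dog": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]),
}
ap = {name: average_precision(y, s) for name, (y, s) in per_class.items()}
mean_ap = sum(ap.values()) / len(ap)  # mAP: plain average of the per-class APs
```

For the "dog" class both positives are ranked first, so its AP is 1; for "cat" a negative sits between the two positives, which lowers the AP.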

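ROC Curve (Receiver Operating Characteristic Curve) and AUC (Area Under the Curve)

To draw the ROC curve, take each Score value in turn from high to low (or a preset series of values in [0, 1]) as the threshold. A test sample whose probability of being positive is greater than or equal to the threshold is classified as positive; otherwise it is classified as negative.

Each threshold yields one (FPR, TPR) pair, i.e. one point on the ROC curve. Connecting these points gives the ROC curve. The more thresholds are used, the smoother the curve. The closer the curve is to the top-left corner, the better the detection model.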
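The threshold sweep can be sketched as follows. The function name `roc_points` and the data are hypothetical; each distinct score is used as a threshold from high to low, and an extra starting point (0, 0) is added, which corresponds to a threshold above every score.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr, thresholds), using each distinct score as a threshold, high to low."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    thresholds = np.unique(scores)[::-1]   # distinct scores, descending
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]                # (0, 0): nothing is predicted positive
    for t in thresholds:
        pred = scores >= t                 # positive if score >= threshold
        tpr.append(np.sum(pred & (y_true == 1)) / n_pos)
        fpr.append(np.sum(pred & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr), thresholds

# Hypothetical labels and scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_points(y_true, scores)
```

Lowering the threshold can only add predicted positives, so both FPR and TPR are non-decreasing along the sweep and the curve runs from (0, 0) to (1, 1).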

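The ROC curve can stay unchanged when the ratio of positive to negative samples in the test set changes, because TPR is computed only from positives and FPR only from negatives. This makes ROC a balanced, overall evaluation method.

AUC is the area under the curve (shaded blue in the figure) and can be computed with NumPy's trapezoidal integration. AUC is essentially a probability: if we randomly pick one positive sample and one negative sample, the probability that the algorithm gives the positive sample a higher Score than the negative one is the AUC. The larger the AUC, the more likely the classifier is to rank positives ahead of negatives, i.e. the better it separates the classes.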
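The two views of AUC, area under the curve and ranking probability, can be checked against each other on a toy example. This is a sketch with made-up data and hypothetical function names; the trapezoid rule is written out by hand, and tied scores are counted as half a win, which is a common convention rather than something stated in the text.

```python
import numpy as np

def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]     # every positive vs. every negative
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule; fpr must be ascending."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Hypothetical labels and scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# ROC points for this data, obtained by sweeping the threshold from high to low
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]

auc_a = auc_pairwise(y_true, scores)
auc_b = auc_trapezoid(fpr, tpr)
```

Three of the four positive/negative pairs are ranked correctly, and the area under the staircase curve comes out to the same value.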

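The Youden index can also be used. The idea is to find the point on the curve where the gap between the vertical coordinate (TPR) and the horizontal coordinate (FPR) is largest; the corresponding threshold is taken as the optimal threshold.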

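import numpy as np

def find_best_threshold(TPR, FPR, threshold):
    # Youden index J = TPR - FPR; the threshold with the largest J is taken as optimal
    y = TPR - FPR
    youden_index = np.argmax(y)
    optimal_threshold = threshold[youden_index]
    point = [FPR[youden_index], TPR[youden_index]]
    return optimal_threshold, point

# For every threshold, compute from the confusion-matrix counts
#   fpr = FP / (FP + TN)
#   recall = TP / (TP + FN)
# and collect the values, in threshold order, into the NumPy arrays fprs_col and recall_col.
thresholds = np.arange(0, 1, 0.1)

# With ascending thresholds FPR decreases, so the integral comes out negative; hence the minus sign
auc = -np.trapz(recall_col, fprs_col)

optimal_threshold, point = find_best_threshold(recall_col, fprs_col, thresholds)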
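For an end-to-end run of the Youden-index idea, here is a self-contained sketch that builds the TPR and FPR arrays from raw scores before picking the threshold. The function name `youden_threshold`, the data, and the threshold grid are all made up for illustration.

```python
import numpy as np

def youden_threshold(y_true, scores, thresholds):
    """Return (best_threshold, (fpr, tpr)) maximizing J = TPR - FPR over the given thresholds."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    tpr = np.array([np.sum((scores >= t) & (y_true == 1)) / n_pos for t in thresholds])
    fpr = np.array([np.sum((scores >= t) & (y_true == 0)) / n_neg for t in thresholds])
    best = int(np.argmax(tpr - fpr))       # first maximum if several thresholds tie
    return float(thresholds[best]), (float(fpr[best]), float(tpr[best]))

# Hypothetical labels, scores, and threshold grid
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.2, 0.25, 0.4, 0.55, 0.6, 0.8, 0.95]
thresholds = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

best_t, (best_fpr, best_tpr) = youden_threshold(y_true, scores, thresholds)
```

On this data the threshold 0.3 keeps every positive while letting through only one of the four negatives, so it has the largest TPR - FPR gap.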

Hopefully that has not left your head spinning~

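Here is an example in code.

y = np.array([0, 1, 0, 1])  # ground truth

scores = np.array([0.1, 0.6, 0.6, 0.5])  # predicted scores

Take 0.5 as the threshold: a score greater than or equal to 0.5 is predicted as 1.

The confusion matrix is then as shown in the figure (TP = 2, FP = 1, TN = 1, FN = 0).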
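Since the figure may not be visible, the same example can be worked through in NumPy. The matrix layout used here (rows = actual class, columns = predicted class) is an assumption and may differ from the original figure.

```python
import numpy as np

y = np.array([0, 1, 0, 1])               # ground truth
scores = np.array([0.1, 0.6, 0.6, 0.5])  # predicted scores

pred = (scores >= 0.5).astype(int)       # threshold at 0.5

tp = int(np.sum((y == 1) & (pred == 1)))
tn = int(np.sum((y == 0) & (pred == 0)))
fp = int(np.sum((y == 0) & (pred == 1)))
fn = int(np.sum((y == 1) & (pred == 0)))

# Assumed layout: rows are the actual class (0, 1), columns the predicted class (0, 1)
confusion = np.array([[tn, fp],
                      [fn, tp]])

precision = tp / (tp + fp)
recall = tp / (tp + fn)
fpr = fp / (fp + tn)
```

Both positives are caught, so recall is 1, while the negative with score 0.6 is a false positive and pulls precision down to 2/3.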

Reading Comprehension: Must-Read Papers

This list is reposted from thunlp/RCPapers, and the blogger will keep updating it on top of that~

Must-read papers on Machine Reading Comprehension.

Contributed by Yankai Lin, Deming Ye and Haozhe Ji.

Model Architecture

  1. Memory networks. Jason Weston, Sumit Chopra, and Antoine Bordes. arXiv preprint arXiv:1410.3916 (2014). paper ★
  2. Teaching Machines to Read and Comprehend. Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. NIPS 2015. paper
  3. Text Understanding with the Attention Sum Reader Network. Rudolf Kadlec, Martin Schmid, Ondrej Bajgar, and Jan Kleindienst. ACL 2016. paper
  4. A Thorough Examination of the Cnn/Daily Mail Reading Comprehension Task. Danqi Chen, Jason Bolton, and Christopher D. Manning. ACL 2016. paper
  5. Long Short-Term Memory-Networks for Machine Reading. Jianpeng Cheng, Li Dong, and Mirella Lapata. EMNLP 2016. paper
  6. Key-value Memory Networks for Directly Reading Documents. Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. EMNLP 2016. paper
  7. Modeling Human Reading with Neural Attention. Michael Hahn and Frank Keller. EMNLP 2016. paper
  8. Learning Recurrent Span Representations for Extractive Question Answering. Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, and Jonathan Berant. arXiv preprint arXiv:1611.01436 (2016). paper
  9. Multi-Perspective Context Matching for Machine Comprehension. Zhiguo Wang, Haitao Mi, Wael Hamza, and Radu Florian. arXiv preprint arXiv:1612.04211. paper
  10. Natural Language Comprehension with the Epireader. Adam Trischler, Zheng Ye, Xingdi Yuan, and Kaheer Suleman. EMNLP 2016. paper
  11. Iterative Alternating Neural Attention for Machine Reading. Alessandro Sordoni, Philip Bachman, Adam Trischler, and Yoshua Bengio. arXiv preprint arXiv:1606.02245 (2016). paper
  12. Bidirectional Attention Flow for Machine Comprehension. Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. ICLR 2017. paper Reading Over
  13. Machine Comprehension Using Match-lstm and Answer Pointer. Shuohang Wang and Jing Jiang. arXiv preprint arXiv:1608.07905 (2016). paper Reading Over
  14. Gated Self-matching Networks for Reading Comprehension and Question Answering. Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, and Ming Zhou. ACL 2017. paper
  15. Attention-over-attention Neural Networks for Reading Comprehension. Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, and Guoping Hu. ACL 2017. paper ★
  16. Gated-attention Readers for Text Comprehension. Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. ACL 2017. paper
  17. A Constituent-Centric Neural Architecture for Reading Comprehension. Pengtao Xie and Eric Xing. ACL 2017. paper
  18. Structural Embedding of Syntactic Trees for Machine Comprehension. Rui Liu, Junjie Hu, Wei Wei, Zi Yang, and Eric Nyberg. EMNLP 2017. paper
  19. Accurate Supervised and Semi-Supervised Machine Reading for Long Documents. Izzeddin Gur, Daniel Hewlett, Alexandre Lacoste, and Llion Jones. EMNLP 2017. paper
  20. MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension. Boyuan Pan, Hao Li, Zhou Zhao, Bin Cao, Deng Cai, and Xiaofei He. arXiv preprint arXiv:1707.09098 (2017). paper
  21. Dynamic Coattention Networks For Question Answering. Caiming Xiong, Victor Zhong, and Richard Socher. ICLR 2017. paper ★
  22. R-NET: Machine Reading Comprehension with Self-matching Networks. Natural Language Computing Group, Microsoft Research Asia. paper Reading Over
  23. Reasonet: Learning to Stop Reading in Machine Comprehension. Yelong Shen, Po-Sen Huang, Jianfeng Gao, and Weizhu Chen. KDD 2017. paper
  24. FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension. Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, and Weizhu Chen. ICLR 2018. paper
  25. Making Neural QA as Simple as Possible but not Simpler. Dirk Weissenborn, Georg Wiese, and Laura Seiffe. CoNLL 2017. paper
  26. Efficient and Robust Question Answering from Minimal Context over Documents. Sewon Min, Victor Zhong, Richard Socher, and Caiming Xiong. ACL 2018. paper
  27. Simple and Effective Multi-Paragraph Reading Comprehension. Christopher Clark and Matt Gardner. ACL 2018. paper
  28. Neural Speed Reading via Skim-RNN. Minjoon Seo, Sewon Min, Ali Farhadi, and Hannaneh Hajishirzi. ICLR2018. paper
  29. Hierarchical Attention Flow for Multiple-Choice Reading Comprehension. Haichao Zhu, Furu Wei, Bing Qin, and Ting Liu. AAAI 2018. paper
  30. Towards Reading Comprehension for Long Documents. Yuanxing Zhang, Yangbin Zhang, Kaigui Bian, and Xiaoming Li. IJCAI 2018. paper
  31. Joint Training of Candidate Extraction and Answer Selection for Reading Comprehension. Zhen Wang, Jiachen Liu, Xinyan Xiao, Yajuan Lyu, and Tian Wu. ACL 2018. paper
  32. Multi-Passage Machine Reading Comprehension with Cross-Passage Answer Verification. Yizhong Wang, Kai Liu, Jing Liu, Wei He, Yajuan Lyu, Hua Wu, Sujian Li, and Haifeng Wang. ACL 2018. paper
  33. Reinforced Mnemonic Reader for Machine Reading Comprehension. Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, and Ming Zhou. IJCAI 2018. paper ★
  34. Stochastic Answer Networks for Machine Reading Comprehension. Xiaodong Liu, Yelong Shen, Kevin Duh, and Jianfeng Gao. ACL 2018. paper ★
  35. Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering. Wei Wang, Ming Yan, and Chen Wu. ACL 2018. paper
  36. A Multi-Stage Memory Augmented Neural Network for Machine Reading Comprehension. Seunghak Yu, Sathish Indurthi, Seohyun Back, and Haejun Lee. ACL 2018 workshop. paper
  37. S-NET: From Answer Extraction to Answer Generation for Machine Reading Comprehension. Chuanqi Tan, Furu Wei, Nan Yang, Bowen Du, Weifeng Lv, and Ming Zhou. AAAI2018. paper Reading Over
  38. Ask the Right Questions: Active Question Reformulation with Reinforcement Learning. Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, and Wei Wang. ICLR2018. paper
  39. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. ICLR2018. paper Reading Over
  40. Read + Verify: Machine Reading Comprehension with Unanswerable Questions. Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, and Ming Zhou. AAAI2019. paper ★
  41. Attention-Guided Answer Distillation for Machine Reading Comprehension. Minghao Hu, Yuxing Peng, Furu Wei, Zhen Huang, Dongsheng Li, Nan Yang, and Ming Zhou. paper (newly added) ★
  42. Zero-shot relation extraction via reading comprehension.
  43. A nil-aware answer extraction framework for question answering.
  44. Simple and effective multiparagraph reading comprehension.
  45. The Natural Language Decathlon: Multitask Learning as Question Answering.

Utilizing External Knowledge

  1. Leveraging Knowledge Bases in LSTMs for Improving Machine Reading. Bishan Yang and Tom Mitchell. ACL 2017. paper
  2. Learned in Translation: Contextualized Word Vectors. Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. arXiv preprint arXiv:1708.00107 (2017). paper
  3. Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge. Todor Mihaylov and Anette Frank. ACL 2018. paper
  4. A Comparative Study of Word Embeddings for Reading Comprehension. Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, and William W. Cohen. arXiv preprint arXiv:1703.00993 (2017). paper
  5. Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. NAACL 2018. paper
  6. Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. OpenAI. paper
  7. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. arXiv preprint arXiv:1810.04805 (2018). paper

Exploration

  1. Adversarial Examples for Evaluating Reading Comprehension Systems. Robin Jia, and Percy Liang. EMNLP 2017. paper
  2. Did the Model Understand the Question? Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere. ACL 2018. paper

Open Domain Question Answering

  1. Reading Wikipedia to Answer Open-Domain Questions. Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. ACL 2017. paper
  2. R^3: Reinforced Reader-Ranker for Open-Domain Question Answering. Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, and Jing Jiang. AAAI 2018. paper
  3. Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering. Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, and Murray Campbell. ICLR 2018. paper
  4. Denoising Distantly Supervised Open-Domain Question Answering. Yankai Lin, Haozhe Ji, Zhiyuan Liu, and Maosong Sun. ACL 2018. paper

Datasets

  1. (SQuAD 1.0) SQuAD: 100,000+ Questions for Machine Comprehension of Text. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. EMNLP 2016. paper
  2. (SQuAD 2.0) Know What You Don’t Know: Unanswerable Questions for SQuAD. Pranav Rajpurkar, Robin Jia, and Percy Liang. ACL 2018. paper
  3. (MS MARCO) MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. arXiv preprint arXiv:1611.09268 (2016). paper
  4. (Quasar) Quasar: Datasets for Question Answering by Search and Reading. Bhuwan Dhingra, Kathryn Mazaitis, and William W. Cohen. arXiv preprint arXiv:1707.03904 (2017). paper
  5. (TriviaQA) TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer. arXiv preprint arXiv:1705.03551 (2017). paper
  6. (SearchQA) SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, and Kyunghyun Cho. arXiv preprint arXiv:1704.05179 (2017). paper
  7. (QuAC) QuAC : Question Answering in Context. Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. arXiv preprint arXiv:1808.07036 (2018). paper
  8. (CoQA) CoQA: A Conversational Question Answering Challenge. Siva Reddy, Danqi Chen, and Christopher D. Manning. arXiv preprint arXiv:1808.07042 (2018). paper
  9. (MCTest) MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. Matthew Richardson, Christopher J.C. Burges, and Erin Renshaw. EMNLP 2013. paper.
  10. (CNN/Daily Mail) Teaching Machines to Read and Comprehend. Hermann, Karl Moritz, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. NIPS 2015. paper
  11. (CBT) The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations. Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. arXiv preprint arXiv:1511.02301 (2015). paper
  12. (bAbi) Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, and Tomas Mikolov. arXiv preprint arXiv:1502.05698 (2015). paper
  13. (LAMBADA) The LAMBADA Dataset: Word Prediction Requiring a Broad Discourse Context. Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, and Raquel Fernández. ACL 2016. paper
  14. (SCT) LSDSem 2017 Shared Task: The Story Cloze Test. Nasrin Mostafazadeh, Michael Roth, Annie Louis, Nathanael Chambers, and James F. Allen. ACL 2017 workshop. paper
  15. (Who did What) Who did What: A Large-Scale Person-Centered Cloze Dataset. Takeshi Onishi, Hai Wang, Mohit Bansal, Kevin Gimpel, and David McAllester. EMNLP 2016. paper
  16. (NewsQA) NewsQA: A Machine Comprehension Dataset. Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. arXiv preprint arXiv:1611.09830 (2016). paper
  17. (RACE) RACE: Large-scale ReAding Comprehension Dataset From Examinations. Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. EMNLP 2017. paper
  18. (ARC) Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. arXiv preprint arXiv:1803.05457 (2018). paper
  19. (MCScript) MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge. Simon Ostermann, Ashutosh Modi, Michael Roth, Stefan Thater, and Manfred Pinkal. arXiv preprint arXiv:1803.05223. paper
  20. (NarrativeQA) The NarrativeQA Reading Comprehension Challenge. Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. TACL 2018. paper
  21. (DuoRC) DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension. Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, and Karthik Sankaranarayanan. ACL 2018. paper
  22. (CLOTH) Large-scale Cloze Test Dataset Created by Teachers. Qizhe Xie, Guokun Lai, Zihang Dai, and Eduard Hovy. EMNLP 2018. paper
  23. (DuReader) DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications. Wei He, Kai Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu, and Haifeng Wang. ACL 2018 Workshop. paper Reading Over
  24. (CliCR) CliCR: a Dataset of Clinical Case Reports for Machine Reading Comprehension. Simon Suster and Walter Daelemans. NAACL 2018. paper

Other Must-Reads

  1. Semi-supervised Sequence Learning. Andrew M. Dai and Quoc V. Le. @google NIPS 2015. Semi-supervised pre-training.
  2. Sequence to Sequence Learning with Neural Networks. Ilya Sutskever. @google. NIPS 2014. The frequently cited seq2seq.
  3. [A Neural Probabilistic Language Model] Neural language model.
  4. [Neural Reading Comprehension and Beyond] Overview of reading comprehension by Danqi Chen (Stanford).
  5. MRQA 2018: Machine Reading for Question Answering
  6. https://mrqa2018.github.io/slides/phil.pdf
