Question: prep_reads.info vs. align_summary.txt

Posted by 风中之铃


### Reference: https://www.biostars.org/p/163356/

I used TopHat to map my reads against their respective reference genome.


When I look inside prep_reads.info, I see:

  • left_min_read_len=90
  • left_max_read_len=90
  • left_reads_in =24995053
  • left_reads_out=24994132
  • right_min_read_len=90
  • right_max_read_len=90
  • right_reads_in =24995053
  • right_reads_out=24994422

Then when I open align_summary.txt, I see:

Left reads:
               Input:  24995053
             Mapped:  22715900 (90.9% of input)
            of these:   2106892 ( 9.3%) have multiple alignments (89 have >20)
Right reads:
               Input:  24995053
              Mapped:  22310498 (89.3% of input)
            of these:   2088630 ( 9.4%) have multiple alignments (148 have >20)
90.1% overall read alignment rate.

Aligned pairs:  21074559
     of these:   1469415 ( 7.0%) have multiple alignments
          and:    107380 ( 0.5%) are discordant alignments
83.9% concordant pair alignment rate.


In align_summary.txt, I understand that the gap between the "Input" and "Mapped" numbers arises because some reads did not map to the reference genome. That part is fine.
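As a sanity check, the percentages printed in align_summary.txt can be reproduced from the raw counts. The sketch below is only arithmetic on the numbers quoted above. The helper name `pct` is made up for illustration, and the formula used for the concordant pair rate (aligned pairs minus discordant pairs, divided by input pairs) is inferred from the numbers rather than taken from the TopHat documentation.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

# Counts copied from the align_summary.txt shown above.
input_reads = 24995053      # same input count for left and right reads
left_mapped = 22715900
right_mapped = 22310498
aligned_pairs = 21074559
discordant_pairs = 107380

left_rate = pct(left_mapped, input_reads)
right_rate = pct(right_mapped, input_reads)
overall_rate = pct(left_mapped + right_mapped, 2 * input_reads)
# Assumption: concordant rate = (aligned - discordant) / input pairs.
concordant_rate = pct(aligned_pairs - discordant_pairs, input_reads)

left_unmapped = input_reads - left_mapped
right_unmapped = input_reads - right_mapped

print(left_rate, right_rate, overall_rate, concordant_rate)
print(left_unmapped, right_unmapped)
```

With these inputs the rates agree with the 90.9%, 89.3%, 90.1% and 83.9% figures reported in the summary.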

But in prep_reads.info, I do not know why the "_reads_out" numbers differ from the "_reads_in" numbers. And if this difference is due to unmapped reads, why is it not equal to the difference between the Input and Mapped numbers in align_summary.txt?

Differences:

         prep_reads.info               align_summary.txt
left     24995053-24994132=921         24995053-22715900=2279153
right    24995053-24994422=631         24995053-22310498=2684555
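The prep_reads.info column of the table can be reproduced by parsing the key=value lines of the file. This is a minimal sketch: the function names are hypothetical, and it assumes the file contains only lines of the form shown in the question, with the contents pasted in as a string instead of being read from disk.

```python
def parse_prep_reads_info(text):
    """Parse 'key=value' lines into a dict, keeping only integer values."""
    stats = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        try:
            # Keys such as 'left_reads_in ' carry a trailing space, so strip them.
            stats[key.strip()] = int(value.strip())
        except ValueError:
            continue  # skip anything that is not a plain integer
    return stats


def reads_dropped(stats, side):
    """Reads removed before mapping for side 'left' or 'right'."""
    return stats[side + "_reads_in"] - stats[side + "_reads_out"]


# Contents of prep_reads.info as quoted in the question.
PREP_READS_INFO = """\
left_min_read_len=90
left_max_read_len=90
left_reads_in =24995053
left_reads_out=24994132
right_min_read_len=90
right_max_read_len=90
right_reads_in =24995053
right_reads_out=24994422
"""

stats = parse_prep_reads_info(PREP_READS_INFO)
for side in ("left", "right"):
    print(side, reads_dropped(stats, side))
```

For a real run, the string could be replaced with the contents of the prep_reads.info file from the TopHat output directory.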



The difference is due to filtering on things such as read length. Some reads are too short, so they're excluded. This happens before any mapping takes place.

        I see, I did not know that. I thought short reads could only be removed with Trimmomatic (MINLEN). I did not know that mapping tools also discard some reads.

 

Well, "things such as read length": it's filtering for other things too. In your case, one of these "other things" must be what's causing reads to get dropped, since all of your input reads are 90 bases long.
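The reasoning here can be written as a small check. If the minimum and maximum read lengths are identical, a length cutoff would remove either every read or none, so a partial loss has to come from some other filter. The function below is a hypothetical illustration of that logic only. It does not reproduce TopHat's actual filters, and it assumes the min/max values describe the input reads.

```python
def length_filter_can_explain_drops(min_len, max_len, reads_in, reads_out):
    """Return False when all reads share one length yet only some were dropped."""
    dropped = reads_in - reads_out
    only_some_dropped = 0 < dropped < reads_in
    uniform_length = min_len == max_len
    return not (uniform_length and only_some_dropped)


# Left-read values from the prep_reads.info quoted in the question.
left_explained = length_filter_can_explain_drops(90, 90, 24995053, 24994132)
print(left_explained)
```

For the left reads in this thread the check prints False, which is the point made above: with uniform 90-base reads, the 921 dropped reads cannot be explained by a length cutoff.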
