Crack4 - A detailed guide to calculating bacterial Genome Coverage from the genome sequencing report
2021-05-23
RashidinAbdu
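Background:
- Before uploading sequenced bacterial genome data to NCBI, you need to calculate the genome coverage, which can be worked out from the genome sequencing report.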
1. Concept:
To calculate the genome coverage, divide the number of bases sequenced by the estimated genome size, then multiply by the fraction of reads placed in contigs. For example:
1,514,603,088 / 2,100,000 x (96% of reads placed) = 692x
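The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the variable names (bases_sequenced, genome_size, fraction_placed, raw_depth) are made up for illustration, and the 96% is entered as the fraction 0.96.

```python
# Worked example from the text: 1,514,603,088 bases sequenced,
# a 2,100,000 bp estimated genome, and 96% of reads placed in contigs.
bases_sequenced = 1_514_603_088
genome_size = 2_100_000
fraction_placed = 0.96  # 96% written as a fraction, not a percentage

# Depth before correcting for reads that were not placed in contigs
raw_depth = bases_sequenced / genome_size
# Coverage counting only the reads placed in contigs
coverage = raw_depth * fraction_placed

print(f"raw depth: {raw_depth:.2f}x")
print(f"genome coverage: {coverage:.0f}x")
```

Rounded to a whole number, the result agrees with the 692x quoted above.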
2. How to calculate:
- For this I wrote a small Python program. From now on, just plug in the values of these three variables, number_of_bases_sequenced, estimated_genome_size, and reads_placed_in_contigs, then run it to get the genome coverage!
number_of_bases_sequenced = 1377024000
estimated_genome_size = 4109798
# fraction of reads placed in contigs (a value between 0 and 1)
reads_placed_in_contigs = 1317332892 / 1377024000
# coverage = (bases sequenced / genome size) * fraction placed, formatted to 2 decimal places
genome_coverage = "{:.2f}".format((number_of_bases_sequenced / estimated_genome_size) * reads_placed_in_contigs)
# print the fraction as a percentage with 2 decimal places
print("%reads_placed_in_contigs=", "{:.2f}".format(reads_placed_in_contigs * 100), "%")
# the final genome coverage
print("genome_coverage=", genome_coverage)
This gives:
image.png
3. So the question is: how do you find the corresponding values in the genome report?
They are found as follows:
image.png image.png
image.png
So the calculation based on these values is:
# To calculate the genome coverage, divide the number of bases sequenced by the estimated genome size,
# then multiply by the fraction of reads placed in contigs
# e.g.: 1,514,603,088 / 2,100,000 x (96% of reads placed) = 692x
number_of_bases_sequenced = 1377024000
estimated_genome_size = 4109798
# fraction of reads placed in contigs (a value between 0 and 1)
reads_placed_in_contigs = 1317332892 / 1377024000
# coverage formatted to 2 decimal places
genome_coverage = "{:.2f}".format((number_of_bases_sequenced / estimated_genome_size) * reads_placed_in_contigs)
# print the fraction as a percentage with 2 decimal places
print("%reads_placed_in_contigs=", "{:.2f}".format(reads_placed_in_contigs * 100), "%")
print("genome_coverage=", genome_coverage, "x")
So the final Genome Coverage is 320.53x.
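If you need this for more than one genome, the same calculation could be wrapped in a small function. The following is only a sketch and not part of the original script: the function name genome_coverage and its input checks are my own assumptions, and reads_placed_in_contigs is expected as a fraction between 0 and 1, as in the script above.

```python
def genome_coverage(number_of_bases_sequenced, estimated_genome_size, reads_placed_in_contigs):
    """Return the genome coverage as a float.

    Hypothetical helper: reads_placed_in_contigs is a fraction (0 to 1),
    not a percentage.
    """
    if estimated_genome_size <= 0:
        raise ValueError("estimated_genome_size must be positive")
    if not 0 <= reads_placed_in_contigs <= 1:
        raise ValueError("reads_placed_in_contigs must be a fraction between 0 and 1")
    return number_of_bases_sequenced / estimated_genome_size * reads_placed_in_contigs


# Same values as in the script above
coverage = genome_coverage(1377024000, 4109798, 1317332892 / 1377024000)
print(f"genome_coverage= {coverage:.2f}x")
```

Returning a number instead of a formatted string keeps the value usable for further calculation, with the rounding applied only when printing.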