2018-04-25-(Meaning of each column in a GTF file)

2018-04-25  天秤座的机器狗

Reposted from 马疾香幽's blog

Fields must be tab-separated. Also, all but the final field in each feature line must contain a value; "empty" columns should be denoted with a '.'

seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seqname must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly. See the example GFF output below.

source - name of the program that generated this feature, or the data source (database or project name)

feature - feature type name, e.g. Gene, Variation, Similarity

start - Start position of the feature, with sequence numbering starting at 1.

end - End position of the feature, with sequence numbering starting at 1.

score - A floating point value.

strand - defined as + (forward) or - (reverse).

frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.

attribute - A semicolon-separated list of tag-value pairs, providing additional information about each feature.
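As a rough illustration of the nine fields described above, the following Python sketch splits one feature line on tabs and converts the numeric columns. The names GtfRecord and parse_gtf_line, as well as the sample line, are made up for this example and do not come from any particular library; treating a missing ninth column as an empty string is an assumption based on the rule that only the final field may be left without a value.

```python
from typing import NamedTuple, Optional


class GtfRecord(NamedTuple):
    """One feature line, split into the nine columns (hypothetical helper type)."""
    seqname: str
    source: str
    feature: str
    start: int               # sequence numbering starts at 1
    end: int                 # sequence numbering starts at 1
    score: Optional[float]   # None when the column is "."
    strand: str
    frame: Optional[int]     # 0, 1 or 2; None when the column is "."
    attribute: str


def parse_gtf_line(line: str) -> GtfRecord:
    """Split a single tab-separated feature line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 8:
        # Assumption: only the final (attribute) field may be absent.
        fields.append("")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = fields
    return GtfRecord(
        seqname=seqname,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        frame=None if frame == "." else int(frame),
        attribute=attribute,
    )


# Made-up sample line, for illustration only.
sample = 'chr1\texample_source\tCDS\t100\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T1";'
record = parse_gtf_line(sample)
print(record.seqname, record.feature, record.start, record.end, record.strand)
```

Empty columns written as "." are mapped to None here so that later code can tell "no score" apart from a score of 0.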

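1. Chromosome name.

2. Source of the annotation, e.g. "Genescan" or "Genbank"; it may be empty, in which case a "." is used instead.

3. Type of the annotation, e.g. Gene, cDNA, mRNA, or the corresponding SO (Sequence Ontology) identifier.

4, 5. Start and end positions.

6. Score: a number indicating how likely the annotation is, e.g. the E-value from a sequence-similarity alignment or the P-value from a gene prediction. "." means empty.

7. Strand of the sequence: + for the sense (forward) strand, - for the antisense (reverse) strand, ? for unknown.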

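8. Reading frame: one of 0, 1 or 2. 0 means the first base of the feature is the first base of a codon, 1 means the second base of the feature is the first base of a codon, and 2 means the third base is.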

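9. Annotation details as a list of key-value pairs. A key and its value are joined by "=", different pairs are separated by ";", and a key may have several values separated by ",". If the description itself contains a tab or any of ",=;", these must be escaped using URL escaping rules, e.g. a tab is written as %09. Keys are case-sensitive; keys that begin with an uppercase letter are predefined and may be referenced by other annotation entries later on. Note that this "key=value" form is the GFF3 convention; in GTF each pair is written as a key, a space and a quoted value, terminated by ";".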
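The key=value convention described in item 9 can be sketched in a few lines of Python. The function name parse_attributes and the sample string are made up for this example. The sketch splits on ";" and "," before undoing the URL escaping, so escaped separators inside a value are not mistaken for real ones.

```python
from typing import Dict, List
from urllib.parse import unquote


def parse_attributes(column: str) -> Dict[str, List[str]]:
    """Parse a 'key=value;key=v1,v2' attribute column (hypothetical helper)."""
    attributes: Dict[str, List[str]] = {}
    for pair in column.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';'
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {pair!r}")
        # Split multi-valued entries first, then undo URL escaping,
        # so that an escaped comma (%2C) stays part of a single value.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attributes


# Made-up sample column, for illustration only.
print(parse_attributes("ID=gene1;Name=abc%09def;Alias=a,b"))
```

Keys are stored exactly as written, since they are case-sensitive.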
