Determining which Illumina sequencing platform produced a set of next-generation sequencing data

2020-12-04  wo_monic

https://raw.githubusercontent.com/10XGenomics/supernova/master/tenkit/lib/python/tenkit/illumina_instrument.py
For the latest version of these classifications, see the link above.

Instrument ID (regex)    Platform
HWI-M[0-9]{4}$           MiSeq
HWUSI                    Genome Analyzer IIx
M[0-9]{5}$               MiSeq
HWI-C[0-9]{5}$           HiSeq 1500
C[0-9]{5}$               HiSeq 1500
HWI-D[0-9]{5}$           HiSeq 2500
D[0-9]{5}$               HiSeq 2500
J[0-9]{5}$               HiSeq 3000
K[0-9]{5}$               HiSeq 3000 (now rarely used), HiSeq 4000
E[0-9]{5}$               HiSeq X
NB[0-9]{6}$              NextSeq
NS[0-9]{6}$              NextSeq
MN[0-9]{5}$              MiniSeq
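A minimal Python sketch of applying this table to an instrument ID: each regex is tried with re.match, which anchors at the start of the ID, while the trailing $ anchors the end. The patterns are copied from the table above; INSTRUMENT_PATTERNS and guess_instrument are hypothetical names, not necessarily those used in illumina_instrument.py.

```python
import re
from typing import List

# Instrument ID regex -> candidate platforms, copied from the table above.
INSTRUMENT_PATTERNS = [
    (r"HWI-M[0-9]{4}$", ["MiSeq"]),
    (r"HWUSI", ["Genome Analyzer IIx"]),
    (r"M[0-9]{5}$", ["MiSeq"]),
    (r"HWI-C[0-9]{5}$", ["HiSeq 1500"]),
    (r"C[0-9]{5}$", ["HiSeq 1500"]),
    (r"HWI-D[0-9]{5}$", ["HiSeq 2500"]),
    (r"D[0-9]{5}$", ["HiSeq 2500"]),
    (r"J[0-9]{5}$", ["HiSeq 3000"]),
    (r"K[0-9]{5}$", ["HiSeq 3000", "HiSeq 4000"]),
    (r"E[0-9]{5}$", ["HiSeq X"]),
    (r"NB[0-9]{6}$", ["NextSeq"]),
    (r"NS[0-9]{6}$", ["NextSeq"]),
    (r"MN[0-9]{5}$", ["MiniSeq"]),
]


def guess_instrument(instrument_id: str) -> List[str]:
    """Return the candidate platforms for an instrument ID, or [] if none match."""
    for pattern, platforms in INSTRUMENT_PATTERNS:
        # re.match anchors at the start and "$" anchors the end, so each
        # pattern must cover the whole ID (HWUSI, which has no "$", is a prefix).
        if re.match(pattern, instrument_id):
            return list(platforms)
    return []
```

For example, guess_instrument("E00552") returns ["HiSeq X"], and an ID that matches no pattern returns an empty list rather than a guess.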
Flow cell ID patterns, each mapping a flow cell ID regex to ([compatible instruments], flow cell type):
{
         "C[A-Z,0-9]{4}ANXX$" : (["HiSeq 1500", "HiSeq 2000", "HiSeq 2500"], "High Output (8-lane) v4 flow cell"),
         "C[A-Z,0-9]{4}ACXX$" : (["HiSeq 1000", "HiSeq 1500", "HiSeq 2000", "HiSeq 2500"], "High Output (8-lane) v3 flow cell"),
         "H[A-Z,0-9]{4}ADXX$" : (["HiSeq 1500", "HiSeq 2500"], "Rapid Run (2-lane) v1 flow cell"),
         "H[A-Z,0-9]{4}BCXX$" : (["HiSeq 1500", "HiSeq 2500"], "Rapid Run (2-lane) v2 flow cell"),
         "H[A-Z,0-9]{4}BCXY$" : (["HiSeq 1500", "HiSeq 2500"], "Rapid Run (2-lane) v2 flow cell"),
         "H[A-Z,0-9]{4}BBXX$" : (["HiSeq 4000"], "(8-lane) v1 flow cell"),
         "H[A-Z,0-9]{4}BBXY$" : (["HiSeq 4000"], "(8-lane) v1 flow cell"),
         "H[A-Z,0-9]{4}CCXX$" : (["HiSeq X"], "(8-lane) flow cell"),
         "H[A-Z,0-9]{4}CCXY$" : (["HiSeq X"], "(8-lane) flow cell"),
         "H[A-Z,0-9]{4}ALXX$" : (["HiSeq X"], "(8-lane) flow cell"),
         "H[A-Z,0-9]{4}BGXX$" : (["NextSeq"], "High output flow cell"),
         "H[A-Z,0-9]{4}BGXY$" : (["NextSeq"], "High output flow cell"),
         "H[A-Z,0-9]{4}BGX2$" : (["NextSeq"], "High output flow cell"),
         "H[A-Z,0-9]{4}AFXX$" : (["NextSeq"], "Mid output flow cell"),
         "A[A-Z,0-9]{4}$" : (["MiSeq"], "MiSeq flow cell"),
         "B[A-Z,0-9]{4}$" : (["MiSeq"], "MiSeq flow cell"),
         "D[A-Z,0-9]{4}$" : (["MiSeq"], "MiSeq nano flow cell"),
         "G[A-Z,0-9]{4}$" : (["MiSeq"], "MiSeq micro flow cell"),
         "H[A-Z,0-9]{4}DMXX$" : (["NovaSeq"], "S2 flow cell")}

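The flow cell table can be applied in the same way to the flow cell ID. A minimal sketch, assuming only a few rows copied from the table above (FLOWCELL_PATTERNS and guess_flowcell are hypothetical names; add the remaining rows as needed). It deliberately uses re.match rather than re.search: with an unanchored search, a HiSeq X ID such as H23NGCCXY would also hit the MiSeq micro pattern G[A-Z,0-9]{4}$. The MiSeq rows are left out because MiSeq flow cell IDs typically carry a 000000000- prefix in read headers, which a start-anchored match would not handle without stripping it first.

```python
import re
from typing import List, Optional, Tuple

# Excerpt of the flow cell table above: regex -> (platforms, flow cell type)
FLOWCELL_PATTERNS = {
    r"C[A-Z,0-9]{4}ANXX$": (["HiSeq 1500", "HiSeq 2000", "HiSeq 2500"],
                            "High Output (8-lane) v4 flow cell"),
    r"H[A-Z,0-9]{4}BBXX$": (["HiSeq 4000"], "(8-lane) v1 flow cell"),
    r"H[A-Z,0-9]{4}CCXX$": (["HiSeq X"], "(8-lane) flow cell"),
    r"H[A-Z,0-9]{4}CCXY$": (["HiSeq X"], "(8-lane) flow cell"),
    r"H[A-Z,0-9]{4}BGXX$": (["NextSeq"], "High output flow cell"),
    r"H[A-Z,0-9]{4}DMXX$": (["NovaSeq"], "S2 flow cell"),
}


def guess_flowcell(flowcell_id: str) -> Optional[Tuple[List[str], str]]:
    """Return (platforms, flow cell type) for the first matching pattern, or None."""
    for pattern, info in FLOWCELL_PATTERNS.items():
        # Start-anchored match: the whole ID must fit the pattern.
        if re.match(pattern, flowcell_id):
            return info
    return None
```

With this excerpt, guess_flowcell("H23NGCCXY") returns (["HiSeq X"], "(8-lane) flow cell"), while HTG2NDSXX from Example 2 returns None.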
Example 1: view the first few lines of the raw FASTQ file with zless.
zless sample.fastq.gz | head -5

@E00552:40:H23NGCCXY:5:1101:1154:1520 1:N:0:NCAGTG
NTTTGCTAAACGGAAGGACTAAAGTAGGAACTGATTGGCTTTAGTCTCTAGTCTCTCACATGGGTGCTAAAAGGGACTAGAGGGTAACATTTACTCCAATTGCCTTTGCCTAGAGTTGGAATATAATATAAGTGAATTGTCCACCTTCTT
+
#AAFAFJAJJ-FFFJJJ7JJJFJJJJJFJJJJ<FFFAJJJJFJJJJJJJJJJFAJ<AJJFJJJJ-FF7FJJJJJJJJF<FJJJJAFAJFFFJJJJJJJFJ-FJJJJFJ<J-FJFF-7AF7FJF7FJJ7FAFJ-<<7<-AAJJJ<JA-F<-
@E00552:40:H23NGCCXY:5:1101:2777:1520 1:N:0:NCAGTG

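The instrument ID, i.e. the first colon-separated field of the header, is E00552: it starts with E and matches E[0-9]{5}$, so the data came from a HiSeq X. The flow cell ID in the third field, H23NGCCXY, ends in CCXY and agrees: a HiSeq X (8-lane) flow cell.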
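The reasoning above relies on the layout of Illumina read headers written by CASAVA 1.8 and later (including bcl2fastq): @<instrument>:<run>:<flow cell ID>:<lane>:<tile>:<x>:<y> <read>:<is filtered>:<control number>:<index>. A minimal sketch that splits such a header into named fields; parse_illumina_header and its key names are hypothetical, and older pre-1.8 headers, which use a different layout, are rejected rather than parsed.

```python
def parse_illumina_header(header: str) -> dict:
    """Split a CASAVA 1.8+ style FASTQ header into named fields."""
    header = header.lstrip("@")
    # The read ID and the comment are separated by the first space.
    read_id, _, comment = header.partition(" ")
    fields = read_id.split(":")
    if len(fields) != 7:
        raise ValueError(f"unexpected read ID layout: {read_id!r}")
    keys = ["instrument", "run", "flowcell", "lane", "tile", "x", "y"]
    parsed = dict(zip(keys, fields))
    if comment:
        # e.g. "1:N:0:NCAGTG" -> read number, filtered flag, control bits, index
        read, filtered, control, index = comment.split(":", 3)
        parsed.update(read=read, filtered=filtered, control=control, index=index)
    return parsed
```

For the Example 1 header, the instrument field is "E00552" and the flow cell field is "H23NGCCXY", the two values looked up in the tables above.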

Example 2: zless sample2.fastq.gz | head -5

@A00262:358:HTG2NDSXX:2:1101:1127:1031 1:N:0:GTTATA+GTTATAC
GNCTACATTTACCTAGCATTTTTCTTCTATCTTACATAGTTTTTGGGTAAACATACTATCCTTATGAGCATTGGGTGTAATGTTTGTTGTTTTATGTTGATTGCTTATTTGGGTAGAAATGACTAACCTATGCTTCATTCCTGCGGATGG
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFF,FFFF,F:FFF:FFF,FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
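@A00262:358:HTG2NDSXX:2:1101:1181:1031 1:N:0:GTTATA+GTTATAC

Here the instrument ID A00262 matches none of the instrument patterns listed above, and the flow cell ID HTG2NDSXX ends in DSXX, which is not in the flow cell table either (its only NovaSeq entry is DMXX, the S2 flow cell). A-prefixed instrument IDs and the DSXX suffix are generally associated with the NovaSeq 6000 and its S4 flow cell; newer IDs like these are why the latest version of the linked file should be checked.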
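Both examples only need the first header line of the file. Inside a script, that line can be read without zless using Python's standard gzip module; a minimal sketch (first_header is a hypothetical helper name):

```python
import gzip


def first_header(path: str) -> str:
    """Return the first FASTQ header line of a gzipped file, without the leading '@'."""
    # "rt" opens the file in text mode, decompressing transparently.
    with gzip.open(path, "rt") as handle:
        line = handle.readline().rstrip("\r\n")
    if not line.startswith("@"):
        raise ValueError(f"{path} does not look like FASTQ: {line[:20]!r}")
    return line[1:]
```

Applied to sample.fastq.gz from Example 1, this would return "E00552:40:H23NGCCXY:5:1101:1154:1520 1:N:0:NCAGTG", the same line zless shows first.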