Extracting sequences from FASTQ files (Seqtk)
2019-02-02
nitrostarch
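Metagenomic data files are very large and hard to analyze on a personal computer. Seqtk can draw a small random subsample of reads for analysis, which gives a glimpse of the whole dataset from a small part of it.
Installation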
cd /home/llt/software
git clone https://github.com/lh3/seqtk.git
cd seqtk
make
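Usage
Subsample 10 million reads. The same seed (-s 100) is used for both files so that the read pairs stay matched.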
mkdir -p /home/llt/experiment/data/clean/subsample_10m
cd /home/llt/experiment/data/clean/subsample_10m
/home/llt/software/seqtk/seqtk sample -s 100 /mnt/d/BaiduYunDownload/MJ_cleandata/SS_G1.fastp.1.fq 10000000 > ssg1_10m.1.fq
/home/llt/software/seqtk/seqtk sample -s 100 /mnt/d/BaiduYunDownload/MJ_cleandata/SS_G1.fastp.2.fq 10000000 > ssg1_10m.2.fq
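After subsampling, it can be worth checking that both output files hold the same number of reads and that the read IDs still line up. The following is a minimal sketch, not part of Seqtk itself: it assumes 4-line FASTQ records and read names that differ only by a trailing /1 or /2. The toy files and the helper functions count_reads and ids are hypothetical stand-ins; in practice the file names would be ssg1_10m.1.fq and ssg1_10m.2.fq.

```shell
# Toy paired FASTQ files stand in for the real subsampled files (hypothetical data).
tmpdir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n' > "$tmpdir/toy.1.fq"
printf '@r1/2\nTTAA\n+\nIIII\n@r2/2\nCCGG\n+\nIIII\n' > "$tmpdir/toy.2.fq"

# Assumes every FASTQ record spans exactly 4 lines, so reads = lines / 4.
count_reads() { awk 'END { print NR / 4 }' "$1"; }
n1=$(count_reads "$tmpdir/toy.1.fq")
n2=$(count_reads "$tmpdir/toy.2.fq")
echo "reads: $n1 $n2"

# Print the read ID from each header line, with the leading '@' kept and
# any trailing /1 or /2 removed, so the two mates can be compared directly.
ids() { awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }' "$1"; }
if [ "$(ids "$tmpdir/toy.1.fq")" = "$(ids "$tmpdir/toy.2.fq")" ]; then
    paired=yes
else
    paired=no
fi
echo "paired: $paired"
```

For files with 10 million reads, comparing the ID lists through shell variables uses a lot of memory; writing the two ID lists to temporary files and comparing them with cmp would be a lighter alternative.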
Extract FASTA sequences by sequence ID (the list file holds one sequence name per line)
seqtk subseq rep_set.fna 001name_list.txt > otu001.fasta
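The name list used above can be built from the FASTA headers. The sketch below is only an illustration under assumptions: the toy FASTA file, its IDs (otu001 to otu003), and the file names all_ids.txt and name_list.txt are hypothetical, and the ID is taken to be the header text up to the first space. The seqtk call at the end runs only if seqtk is found on the PATH.

```shell
# Toy FASTA file standing in for rep_set.fna (hypothetical data).
tmpdir=$(mktemp -d)
printf '>otu001 sample=A\nACGT\n>otu002 sample=B\nGGCC\n>otu003 sample=C\nTTAA\n' > "$tmpdir/toy.fna"

# List every sequence ID, one per line: drop the leading '>' and
# keep only the text before the first space.
grep '^>' "$tmpdir/toy.fna" | sed 's/^>//' | cut -d ' ' -f 1 > "$tmpdir/all_ids.txt"

# Keep only the IDs of interest (whole-line matches).
grep -x -e 'otu001' -e 'otu003' "$tmpdir/all_ids.txt" > "$tmpdir/name_list.txt"
n_ids=$(wc -l < "$tmpdir/name_list.txt" | tr -d ' ')
echo "selected IDs: $n_ids"

# Extract the selected sequences when seqtk is available.
if command -v seqtk >/dev/null 2>&1; then
    seqtk subseq "$tmpdir/toy.fna" "$tmpdir/name_list.txt" > "$tmpdir/subset.fasta"
fi
```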