
Extracting sequences from FASTQ files (Seqtk)

2019-02-02  nitrostarch

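Metagenomic data files are very large and hard to analyze on a personal computer. Using the Seqtk tool to extract a small number of sequences for analysis gives a glimpse of the whole dataset from a small part of it.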

Installation

cd /home/llt/software
git clone https://github.com/lh3/seqtk.git
cd seqtk
make
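
After make finishes, the seqtk binary sits in the cloned directory and is not on PATH, which is why the commands in this post call it by its full path. As a minimal sketch, assuming the install location used above, the following adds that directory to PATH for the current shell and checks whether seqtk can be found. The helper name have_cmd and the variable SEQTK_DIR are made up for illustration.

```shell
# Hypothetical helper: succeeds if the given command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: seqtk was cloned and built in the directory used in this post.
SEQTK_DIR=/home/llt/software/seqtk
PATH="$SEQTK_DIR:$PATH"
export PATH

if have_cmd seqtk; then
    echo "seqtk found: $(command -v seqtk)"
else
    echo "seqtk not found in $SEQTK_DIR, check that make succeeded" >&2
fi
```

This only affects the current shell session; adding the same PATH line to a shell startup file would make it permanent.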

Usage

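Sample 10 million reads from each of the two paired-end files. Both commands use the same random seed (-s 100) so that the sampled reads stay paired.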

mkdir -p /home/llt/experiment/data/clean/subsamble_10m
cd /home/llt/experiment/data/clean/subsamble_10m
/home/llt/software/seqtk/seqtk sample -s 100 /mnt/d/BaiduYunDownload/MJ_cleandata/SS_G1.fastp.1.fq 10000000 > ssg1_10m.1.fq
/home/llt/software/seqtk/seqtk sample -s 100 /mnt/d/BaiduYunDownload/MJ_cleandata/SS_G1.fastp.2.fq 10000000 > ssg1_10m.2.fq
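
It can be worth checking that the two output files really contain the expected number of reads and that the pairs are still in the same order. The sketch below is one way to do that with awk and cksum. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and read names whose mates differ at most by a trailing /1 or /2. The function names count_reads, read_ids and pairs_in_sync are hypothetical helpers, not part of seqtk.

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Print one read ID per record: first word of each header line,
# without the leading "@" and without a trailing /1 or /2.
read_ids() {
    awk 'NR % 4 == 1 {
        id = $1
        sub(/^@/, "", id)
        sub(/\/[12]$/, "", id)
        print id
    }' "$1"
}

# Succeed if both files list the same read IDs in the same order.
pairs_in_sync() {
    [ "$(read_ids "$1" | cksum)" = "$(read_ids "$2" | cksum)" ]
}

# Example usage with the files produced above:
#   count_reads ssg1_10m.1.fq
#   count_reads ssg1_10m.2.fq
#   pairs_in_sync ssg1_10m.1.fq ssg1_10m.2.fq && echo "pairs in sync"
```

If the counts differ or the IDs are out of sync, the most likely cause is that the two seqtk sample commands were run with different seeds.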

Extract FASTA sequences by sequence ID

seqtk subseq rep_set.fna 001name_list.txt > otu001.fasta
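
The name list passed to seqtk subseq is a plain text file with one sequence name per line. As a rough sketch of how such a list could be prepared, the helper below (fasta_ids, a hypothetical name) prints the ID of every record in a FASTA file, taking the ID to be the first word after ">". The filtering pattern in the usage comment is only a placeholder.

```shell
# Print the ID (first word after ">") of every record in a FASTA file.
fasta_ids() {
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1"
}

# Example usage (the grep pattern is a placeholder, not from the original post):
#   fasta_ids rep_set.fna | grep 'PATTERN' > 001name_list.txt
#   seqtk subseq rep_set.fna 001name_list.txt > otu001.fasta
```

Listing the IDs first also makes it easy to confirm that the names in the list match the FASTA headers exactly, since names that do not match are simply not extracted.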