Using USEARCH

2020-03-24  佳名

Downloading USEARCH

https://drive5.com/cgi-bin/upload3.py?license=2020032323051700172
After downloading, put the executable in a directory listed in your PATH environment variable

Downloading the RDP reference database

https://drive5.com/usearch/manual/sintax_downloads.html

Decompress the sequence files

gzip -d *.gz

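Rename the files (in the Windows Command Prompt) so that each pair ends in _R1 and _R2; when -reverse is omitted, fastq_mergepairs finds the reverse file by replacing R1 with R2 in the forward file name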

ren *_1.fastq *_R1.fq
ren *_2.fastq *_R2.fq
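
`ren` only exists in the Windows Command Prompt, while the loops later in this post need a bash-like shell (Git Bash, Cygwin, WSL). If you do everything in that shell, the rename can be sketched like this. `rename_pairs` is a made-up helper name, and it assumes the files are named <sample>_1.fastq and <sample>_2.fastq.

```shell
# Rename <name>_1.fastq / <name>_2.fastq to <name>_R1.fq / <name>_R2.fq
# in the directory given as $1 (default: current directory).
rename_pairs() {
  rp_dir=${1:-.}
  for rp_f in "$rp_dir"/*_1.fastq "$rp_dir"/*_2.fastq; do
    [ -e "$rp_f" ] || continue        # skip glob patterns that matched nothing
    rp_base=${rp_f%.fastq}            # strip the .fastq extension
    # ${rp_base%_?} drops the trailing _1 or _2; ${rp_base##*_} keeps the 1 or 2
    mv -- "$rp_f" "${rp_base%_?}_R${rp_base##*_}.fq"
  done
}

# Usage: rename_pairs /path/to/reads
```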

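Merge the paired-end reads in batch

for id in *_R1.fq;
do
usearch11.0.667_win32.exe -fastq_mergepairs "$id" -relabel @ -fastq_maxdiffs 10 -fastq_pctid 80 -fastqout "${id%%_*}.fq";
done

-fastq_maxdiffs: maximum number of mismatches allowed in the alignment
-fastq_pctid: minimum percent identity of the alignment
-relabel @: relabel the reads with the sample name taken from the file name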
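
Before running the loop on many files, it can help to print the commands first and check the file names. A small dry-run sketch: `merge_cmd` is a hypothetical helper, and it passes the reverse file explicitly with -reverse instead of relying on the R1/R2 naming.

```shell
# Print (do not run) the merge command for one forward-read file.
merge_cmd() {
  mc_fwd=$1
  mc_rev="${mc_fwd%_R1.fq}_R2.fq"     # reverse file: _R1.fq replaced by _R2.fq
  mc_sample="${mc_fwd%%_*}"           # everything before the first underscore
  printf '%s\n' "usearch11.0.667_win32.exe -fastq_mergepairs $mc_fwd -reverse $mc_rev -relabel @ -fastq_maxdiffs 10 -fastq_pctid 80 -fastqout $mc_sample.fq"
}

# Dry run over every forward file in the current directory
for f in *_R1.fq; do
  [ -e "$f" ] || continue             # nothing matched the glob
  merge_cmd "$f"
done
```

Note that ${id%%_*} cuts at the first underscore, so A_b_R1.fq is written to A.fq. Sample names that contain underscores will collide.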

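Quality filtering: discard reads with too many expected errors

for id in *.fq;
do
# skip the raw paired reads; only the merged files are filtered
case "$id" in *_R1.fq|*_R2.fq) continue;; esac
usearch11.0.667_win32.exe -fastq_filter "$id" -fastq_maxee 1.0 -fastaout "${id%%.*}.fa"
done

-fastq_maxee: maximum expected errors; reads whose total expected errors exceed this value are discarded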
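
The expected error count of a read is the sum of the per-base error probabilities, 10^(-Q/10) for each Phred score Q. A minimal sketch of that sum for one quality string, assuming Phred+33 encoding. `expected_errors` is an illustrative helper, not part of USEARCH.

```shell
# expected_errors QUAL: sum of 10^(-Q/10) over a Phred+33 quality string
expected_errors() {
  printf '%s\n' "$1" | awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    {
      ee = 0
      for (i = 1; i <= length($0); i++) {
        q = ord[substr($0, i, 1)] - 33     # Phred score of base i
        ee += 10 ^ (-q / 10)               # error probability of base i
      }
      printf "%.3f\n", ee
    }'
}

expected_errors '5555'          # four bases at Q20
expected_errors '++++++++++'    # ten bases at Q10
```

Four Q20 bases give 0.040 and ten Q10 bases give 1.000, so with -fastq_maxee 1.0 a read is dropped once its low-quality bases add up to more than one expected error.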

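Pool the filtered files
The *.fa glob matches every .fa file in the directory, so run this before any other FASTA file (for example rdp_16s_v16.fa) is placed there

cat *.fa > sample.fa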

Find the unique sequences (dereplication) and add size annotations

usearch11.0.667_win32.exe -fastx_uniques sample.fa -fastaout uniques.fa -sizeout -relabel Uniq

Output: uniques.fa
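
To make clear what dereplication with size annotation means, here is a rough imitation with awk and sort. It is only a sketch of the idea: `derep` is a made-up name, it compares sequences as exact strings, and it is not how USEARCH implements the command.

```shell
# derep: read FASTA on stdin, print each distinct sequence once,
# most abundant first, labelled >Uniq<N>;size=<count>;
derep() {
  awk '
    /^>/ { if (seq != "") count[seq]++; seq = ""; next }   # new record starts
    { seq = seq $0 }                                       # join wrapped lines
    END {
      if (seq != "") count[seq]++
      for (s in count) print count[s] "\t" s
    }' |
  sort -k1,1nr -k2,2 |
  awk -F '\t' '{ printf ">Uniq%d;size=%d;\n%s\n", NR, $1, $2 }'
}

# Usage: derep < sample.fa
```

For three reads ACGT, ACGT, TTTT this prints Uniq1 with size=2 (ACGT) and Uniq2 with size=1 (TTTT).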

Cluster into OTUs and build the OTU table

usearch11.0.667_win32.exe -cluster_otus uniques.fa -otus otus.fa -relabel Otu
usearch11.0.667_win32.exe -otutab sample.fa -otus otus.fa -otutabout otubab.txt

The second command above takes a while; its log output:

#00:00 5.6Mb   100.0% Reading otus.fa
#00:00 5.5Mb   100.0% Masking (fastnucleo)
#00:00 6.4Mb   100.0% Word stats
#00:00 6.4Mb   100.0% Alloc rows
#00:00 6.7Mb   100.0% Build index
#01:23 48Mb    100.0% Searching, 66.4% matched
#123497 / 185971 mapped to OTUs (66.4%)
#01:23 48Mb   Writing otubab.txt
#01:23 48Mb   Writing otubab.txt ...done.
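
A quick sanity check on the table is to sum the reads per sample. The sketch below assumes the -otutabout file is tab-separated with a header line (#OTU ID followed by the sample names) and one row per OTU. `sample_totals` is a hypothetical helper name.

```shell
# sample_totals FILE: print "sample<TAB>total reads" for each sample column
sample_totals() {
  awk -F '\t' '
    NR == 1 { n = NF; for (i = 2; i <= n; i++) name[i] = $i; next }  # header row
    { for (i = 2; i <= n; i++) total[i] += $i }                      # OTU rows
    END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
  ' "$1"
}

# Usage: sample_totals otubab.txt
```

The totals over all samples should add up to the number of mapped reads in the log (123497 in the run above).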

Taxonomy prediction

Convert rdp_16s_v16.fa to UDB format
usearch11.0.667_win32.exe -makeudb_usearch rdp_16s_v16.fa -output rdp_16s.udb

Use the SINTAX algorithm with the confidence cutoff set to 0.8

usearch11.0.667_win32.exe -sintax otus.fa -db rdp_16s.udb -tabbedout otu.sintax -strand both -sintax_cutoff 0.8
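
To pull one rank out of otu.sintax, something like the following may be enough. It assumes the -tabbedout file has four tab-separated columns (OTU label, prediction with confidence values, strand, prediction that passed -sintax_cutoff) and ranks written as d:, p:, ..., g:. `sintax_genus` is an illustrative name.

```shell
# sintax_genus FILE: print "OTU<TAB>genus", taking the genus from column 4
# (the ranks that passed the cutoff); "unassigned" if no g: rank is present.
sintax_genus() {
  awk -F '\t' '{
    genus = "unassigned"
    n = split($4, rank, ",")
    for (i = 1; i <= n; i++) {
      if (rank[i] ~ /^g:/) { genus = substr(rank[i], 3) }   # drop the "g:" prefix
    }
    print $1 "\t" genus
  }' "$1"
}

# Usage: sintax_genus otu.sintax
```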

Compute a distance matrix for the OTUs and build a tree by agglomerative clustering

usearch11.0.667_win32.exe -calc_distmx otus.fa -tabbedout mx.txt -maxdist 0.2 -termdist 0.3
usearch11.0.667_win32.exe -cluster_aggd mx.txt -treeout clusters.tree -clusterout clusters.txt -id 0.80 -linkage min