一起生信啦啦啦

charseq质控

2018-11-13  本文已影响4人  苏牧传媒

genome:

ChAR-seq1_R1.fq

ChAR-seq1_R2.fq

1.双端进行trimmomatic -> fq

trimmomatic PE -threads 20 -phred33 ChAR-seq1_R1.fq ChAR-seq1_R2.fq sample.R1.clean.fq sample.R1.unpaired.fq sample.R2.clean.fq sample.R2.unpaired.fq ILLUMINACLIP:/home/pc/miniconda3/share/trimmomatic-0.38-0/adapters/TruSeq3-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 AVGQUAL:20 MINLEN:50

[R2端转为R1]

seqkit seq sample.R2.clean.fq -r -p > sample.R2.fine.fq

2.单端进行split -> DNA and RNA fq

gzip sample.R1.clean.fq &&

gzip sample.R2.fine.fq &&

python /home/pc/biosoft/flypipe-master/char_bridge_trackall.py --FASTQGZ sample.R1.clean.fq.gz   --NAME sample.R1 --minRNA 18 --minDNA 18 &&

python /home/pc/biosoft/flypipe-master/char_bridge_trackall.py --FASTQGZ sample.R2.fine.fq.gz --NAME sample.R1 --minRNA 18 --minDNA 18

3.DNA and RNA 分别mapping -> bam 

gndx=/media/pc/disk1/sun/refdata/gencode_GRCh37/03_bowtie2_index/GRCh37

bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R1.dna.bridgePE.fastq.gz | samtools view -bS -o sample.R1.dna.bam &&

bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R1.rna.bridgePE.fastq.gz | samtools view -bS -o sample.R1.rna.bam &&

bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R2.dna.bridgePE.fastq.gz | samtools view -bS -o sample.R2.dna.bam &&

bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R2.rna.bridgePE.fastq.gz | samtools view -bS -o sample.R2.rna.bam

4.过滤掉MAPQ<20 -> filtered.bam

samtools view -b -q 20 sample.R1.dna.bam > sample.R1.dna.filtered.bam &&

samtools view -b -q 20 sample.R1.rna.bam > sample.R1.rna.filtered.bam &&

samtools view -b -q 20 sample.R2.dna.bam > sample.R2.dna.filtered.bam &&

samtools view -b -q 20 sample.R2.rna.bam > sample.R2.rna.filtered.bam

5.去重 -> clean.bam

java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R1.dna.filtered.bam O=sample.R1.dna.clean.bam M=picard.sample.R1.dna.txt &&

java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R1.rna.filtered.bam O=sample.R1.rna.clean.bam M=picard.sample.R1.rna.txt &&

java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R2.dna.filtered.bam O=sample.R2.dna.clean.bam M=picard.sample.R2.dna.txt &&

java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R2.rna.filtered.bam O=sample.R2.rna.clean.bam M=picard.sample.R2.rna.txt 

6.提取mapped的sam -> mapped.sam

samtools view -F 4 sample.R1.dna.clean.bam > sample.R1.dna.clean.sam &&

samtools view -F 4 sample.R1.rna.clean.bam > sample.R1.rna.clean.sam &&

samtools view -F 4 sample.R2.dna.clean.bam > sample.R2.dna.clean.sam &&

samtools view -F 4 sample.R2.rna.clean.bam > sample.R2.rna.clean.sam

7.转为simple txt -> txt

awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R1.dna.clean.sam > sample.R1.dna.txt &&

awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R1.rna.clean.sam > sample.R1.rna.txt &&

awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R2.dna.clean.sam > sample.R2.dna.txt &&

awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R2.rna.clean.sam > sample.R2.rna.txt

transcript:

1.RNA fq mapping -> bam

2.过滤掉MAPQ<20 -> filtered.bam

3.去重 -> clean.bam

4.提取mapped的sam -> sam

5.转为simple txt -> txt

上一篇 下一篇

猜你喜欢

热点阅读