MACS2 parameters

2021-12-04  本文已影响0人  余绕

MACS2 parameters

There are seven major functions available in MACS2 serving as sub-commands. We will only cover callpeak in this lesson, but you can use macs2 COMMAND -h to find out more, if you are interested.

callpeak is the main function in MACS2 and can be invoked by typing macs2 callpeak. If you type this command without parameters, you will see a full description of commandline options. Here is a shorter list of the commonly used ones:

Input file options

Output arguments

Shifting model arguments

Peak calling arguments

NOTE: Relaxing the q-value does not behave as expected in this case since it is partially tied to peak widths. Ideally, if you relaxed the thresholds, you would simply get more peaks but with MACS2 relaxing thresholds also results in wider peaks.

Now that we have a feel for the different ways we can tweak our command, let’s set up the command for our run on Nanog-rep1:

$ macs2 callpeak -t bowtie2/H1hesc_Nanog_Rep1_aln.bam \
    -c bowtie2/H1hesc_Input_Rep1_aln.bam \
    -f BAM -g 1.3e+8 \
    -n Nanog-rep1 \
    --outdir macs2

The tool is quite verbose so you should see lines of text being printed to the terminal, describing each step that is being carried out. If that runs successfully, go ahead and re-run the same command but this time let’s capture that information into a log file using 2> to re-direct the stadard error to file:

$ macs2 callpeak -t bowtie2/H1hesc_Nanog_Rep1_aln.bam \
    -c bowtie2/H1hesc_Input_Rep1_aln.bam \
    -f BAM -g 1.3e+8 \
    -n Nanog-rep1 \
    --outdir macs2 2> macs2/Nanog-rep1-macs2.log

Ok, now let’s do the same peak calling for the rest of our samples:

macs2 callpeak -t bowtie2/H1hesc_Nanog_Rep2_aln.bam -c bowtie2/H1hesc_Input_Rep2_aln.bam -f BAM -g 1.3e+8 --outdir macs2 -n Nanog-rep2 2> macs2/Nanog-rep2-macs2.log
 
macs2 callpeak -t bowtie2/H1hesc_Pou5f1_Rep1_aln.bam -c bowtie2/H1hesc_Input_Rep1_aln.bam -f BAM -g 1.3e+8 --outdir macs2 -n Pou5f1-rep1 2> macs2/Pou5f1-rep1-macs2.log
     
macs2 callpeak -t bowtie2/H1hesc_Pou5f1_Rep2_aln.bam -c bowtie2/H1hesc_Input_Rep2_aln.bam -f BAM -g 1.3e+8 --outdir macs2 -n Pou5f1-rep2 2> macs2/Pou5f1-rep2-macs2.log

MACS2 Output files

Now, there should be 6 files output to the results directory for each of the 4 samples, so a total of 24 files:

* _peaks.narrowPeak: BED6+4 format file which contains the peak locations together with peak summit, pvalue and qvalue
* _peaks.xls: a tabular file which contains information about called peaks. Additional information includes pileup and fold enrichment
* _summits.bed: peak summits locations for every peak. To find the motifs at the binding sites, this file is recommended

Let’s first obtain a summary of how many peaks were called in each sample. We can do this by counting the lines in the .narrowPeak files:

$ wc -l *.narrowPeak

We can also generate plots using the R script file that was output by MACS2. There is a _model.R script in the directory. Let’s load the R module and run the R script in the command line using the Rscript command as demonstrated below:

$ module load gcc/6.2.0 R/3.4.1

$ Rscript Nanog-rep1_model.r

NOTE: We need to load the gcc/6.2.0 before loading R. You can find out which modules need to be loaded first by using module spider R/3.4.1`

Now you should see a pdf file in your current directory by the same name. Create the plots for each of the samples and move them over to your laptop using Filezilla.

Open up the pdf file for Nanog-rep1. The first plot illustrates the distance between the modes from which the shift size was determined.

image

The second plot is the cross-correlation plot. This is a graphical representation of the Pearson correlation of positive- and negative- strand tag densities, shifting the strands relative to each other by increasing distance. We will talk about this in more detail in the next lesson.

NOTE: SPP is another very commonly used tool for narrow peak calling. While we will not be going through the steps for this peak caller in this workshop, we do have a lesson on SPP that we encourage you to browse through if you are interested in learning more.

Data source: Peak calling with MACS2 | Introduction to ChIP-Seq using high-performance computing (hbctraining.github.io)

上一篇下一篇

猜你喜欢

热点阅读