HOMER Usage Parameters

2023-07-11, by 佳名

wget -c http://homer.ucsd.edu/homer/configureHomer.pl
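Once configureHomer.pl has been downloaded, installation usually comes down to running it with `-install`. The following is a minimal sketch, not a definitive procedure. The install directory `$HOME/homer` and the `mm10` genome package are assumptions chosen for illustration, and the `run` helper is hypothetical (not part of HOMER): it only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumed install location; adjust as needed.
HOMER_DIR="${HOMER_DIR:-$HOME/homer}"

# Hypothetical helper (not part of HOMER): print commands by default,
# execute them only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_homer() {
  run mkdir -p "$HOMER_DIR" || return 1
  run cd "$HOMER_DIR" || return 1
  run wget -c http://homer.ucsd.edu/homer/configureHomer.pl || return 1
  # Install the core programs and basic data.
  run perl configureHomer.pl -install || return 1
  # List the packages (genomes etc.) that can be installed.
  run perl configureHomer.pl -list || return 1
  # Install one genome package; mm10 is only an example.
  run perl configureHomer.pl -install mm10 || return 1
}

install_homer
echo "Afterwards, add $HOMER_DIR/bin to PATH (for example in ~/.profile)."
```

Run it as is to preview the commands, then with `DRY_RUN=0` to actually install.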

homer --help
homer : Empirical Motif Optimizer
usage: ./homer [data] [parameters] -a [action]
This program is meant to be called from other programs (i.e. findMotifsGenome.pl), and not used directly
Data options:
    -dna|-prot : Sequence type (-dna)
    -s <file> : Sequence File
    -g <file> : Group/Stat File
    -mer <file> : Mer File
    -m <file> : PSSM Motif File
    -o <file> : output file prefix
    -seed <file> : seed file
    -offset <#> : offset of sequence from TSS (-2000)
Parameter options:
    -exact : remember mapping between mers and genes (default: approx)
    -w : Weight sequences (according to additional columns in group file: 1st-gene 2nd-sequence)
    -T : Test all sequences as candidate motifs (default: only test target sequences)
    -noautoscale : Do not autoscale sequences to be equal in foreground and background
    -freqAdjust : Compute log-odds using frequency, default (0.25)
    -dual : find dual motifs in the form A<gap>B where A and B can be rev-opposites
    -flip : find dual motifs in the form A<gap>B or B<gap>A
    -zoopsapprox <OFF,#(max to count)> : (counts multiple motifs per sequence | default: 2)
    -norevopp : don't search opposite strand (default->DNA:yes, Protein:no)
    -min <#> : min mer size (6)
    -max <#> : max mer size [also standard mer size] (10)
    -len <#> : Find motifs of length # (default=10)
    -gap <#,#,#-#> : Find motifs with gaps (0) (i.e. -gap 3 -gap 2,4,5 -gap 1-10)
        Gaps will only be in the center of motif and will only use even-length motifs
    -mis <#> : # of mismatches to check for degeneracy (1)
    -IUPAC <#> : # of IUPAC codes per mer that can be used during global optimization (0)
    -iupactype <1,2,or3> : Type of IUPAC symbols used
        1: (default) Only N is used
        2: Only N and 2 bp symbols are used (i.e. R = A or G)
        3: Full IUPAC code is used (includes 3-way symbols)
    -S <#> : number of seeds to check during profile optimization(50)
    -branch <#> : sets depth of optimization (closer to zero the more sensitive (0.5))
    -I <#> : maximum number of iterations during optimization (5)
    -rmalign : DO NOT remove aligned seeds
    -maxneg <0 to 1> maximum percentage of negative genes that can contain the motif
    -speed <NORMAL|FAST>: Program will heuristically avoid performing exhaustive
        calculations (default: FAST)
Scoring Functions:
    -alg <method> : scoring algorithm (default: hypergeo)
        hypergeo - hypergeometric scoring (ZOOPS)
        binomial - binomial scoring [for variable length seq] (ZOOPS) (requires exact)
        approxbinomial - binomial scoring [for variable length seq] (ZOOPS) (requires exact)
        sitehypergeo - hypergeometric scoring across seq positions (very slow)
        sitebinomial - binomial scoring across seq positions
        fisher <#> - fisher exact test (slow, # scales exponentially)
          <# = largest repetition to consider [default=2]>
        rank - group file must have sortable numeric value
        freqdiff - used by most bayesian/nnet programs
        logit - used by most bayesian/nnet programs
Background Modeling options (this forces a binomial style scoring function):
    -b <method> [method options...]
        markov <#> - generate hmm from target sequences using a hmm of order #
        bmarkov <#> - generate hmm from background sequences using a hmm of order #
        mosaic - generate mosaic hmm from background sequences **coming soon**
Filter Options:
    -N <float> : filtering cutoff for ratio of N's in sequence (0.9)
    -seqless <#> : filter sequences shorter than #
    -seqmore <#> : filter sequences longer than #
Actions (-a):
    MOTIFS - Find motifs <outfile>.motifs# where # = motif length
    MERS - Create mer file (low memory) <stdout>
    DMERS - Create degenerate mer file <stdout>
    FIND - find motifs in sequence <stdout>
    OPTPVALUE - optimize motif threshold and pvalue (exact)<stdout>
    GETPVALUE - get the p-value enrichment for a given motif(exact)<stdout>
    GENESCORE - returns highest motif score for each gene <stdout>
    REFINE - optimize motif PSSM profile, threshold, and pvalue <stdout>
    REFINETHRESH - optimize motif PSSM threshold and pvalue <stdout>
    CLUSTER - cluster mers from seed file (can't use exact scoring) <outfile>
    SORTMERS - sort a mer file according to pvalue <stdout>
    REMOVE - removes motif from sequence (replaces with N's) <stdout>

This program is meant to be called from other programs (i.e. findMotifsGenome.pl), and not used directly
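Because `homer` is meant to be driven by wrapper scripts, it can be worth confirming that those wrappers are actually reachable before starting an analysis. The sketch below checks `PATH` for a few program names that appear in this post. The function name `check_homer_tools` is hypothetical, not something HOMER ships.

```shell
#!/bin/sh
# Report which of the given programs are visible on PATH.
# Returns non-zero if at least one is missing.
check_homer_tools() {
  _missing=0
  for _tool in "$@"; do
    if command -v "$_tool" >/dev/null 2>&1; then
      echo "found:   $_tool ($(command -v "$_tool"))"
    else
      echo "missing: $_tool"
      _missing=1
    fi
  done
  return "$_missing"
}

check_homer_tools findMotifsGenome.pl homer2 homer ||
  echo "Some HOMER programs are not on PATH; check the bin/ directory of the installation."
```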

findMotifsGenome.pl -h

    Program will find de novo and known motifs in regions in the genome

    Usage: findMotifsGenome.pl <pos file> <genome> <output directory> [additional options]
    Example: findMotifsGenome.pl peaks.txt mm8r peakAnalysis -size 200 -len 8

    Possible Genomes:
            -- or --
        Custom: provide the path to genome FASTA files (directory or single file)
            Heads up: will create the directory "preparsed/" in same location.
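The `<pos file>` argument in the usage line is a peak/position file. As far as I understand the format, HOMER expects tab-separated columns of ID, chromosome, start, end and strand, and it can also read BED files directly. If an explicit conversion is wanted, it might look like the sketch below. The function name `bed_to_homer_peaks` is hypothetical, and the `+1` on the start coordinate assumes that HOMER peak files are 1-based while BED starts are 0-based; verify this against the HOMER documentation (or use HOMER's own `bed2pos.pl`) before relying on it.

```shell
#!/bin/sh
# Convert BED (0-based start) to a HOMER-style peak file:
#   ID <tab> chr <tab> start <tab> end <tab> strand
bed_to_homer_peaks() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^(#|track|browser)/ { next }           # skip comment and header lines
       {
         n++
         id     = ($4 != "" ? $4 : "peak_" n)  # reuse the BED name if present
         strand = ($6 == "-" ? "-" : "+")      # default to + when absent
         print id, $1, $2 + 1, $3, strand      # +1 assumes 1-based starts
       }' "$1"
}

# Tiny demo with made-up coordinates.
demo_bed=$(mktemp)
printf 'chr1\t100\t300\tpeakA\t0\t-\nchr2\t500\t700\n' > "$demo_bed"
bed_to_homer_peaks "$demo_bed"
rm -f "$demo_bed"
```

The demo prints one line per peak: `peakA chr1 101 300 -` and `peak_2 chr2 501 700 +` (tab-separated).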

    Basic options:
        -mask (mask repeats/lower case sequence, can also add 'r' to genome, i.e. mm9r)
        -bg <background position file> (genomic positions to be used as background, default=automatic)
            removes background positions overlapping with target positions unless -keepOverlappingBg is used
            -chopify (chop up large background regions to the avg size of target regions)
        -len <#>[,<#>,<#>...] (motif length, default=8,10,12) [NOTE: values greater than 12 may cause the program
            to run out of memory - in these cases decrease the number of sequences analyzed (-N),
            or try analyzing shorter sequence regions (i.e. -size 100)]
        -size <#> (fragment size to use for motif finding, default=200)
            -size <#,#> (i.e. -size -100,50 will get sequences from -100 to +50 relative from center)
            -size given (uses the exact regions you give it)
        -S <#> (Number of motifs to optimize, default: 25)
        -mis <#> (global optimization: searches for strings with # mismatches, default: 2)
        -norevopp (don't search reverse strand for motifs)
        -nomotif (don't search for de novo motif enrichment)
        -rna (output RNA motif logos and compare to RNA motif database, automatically sets -norevopp)
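The three forms of `-size` listed above differ only in how the sequence window is derived from each peak. The arithmetic can be sketched as follows, under simplifying assumptions: the peak is on the + strand, the center is the integer midpoint of start and end, and exact boundary conventions (inclusive or exclusive ends) are ignored. `peak_window` is a hypothetical helper for illustration, not a HOMER program.

```shell
#!/bin/sh
# peak_window START END SIZE_SPEC
# Prints the "from to" coordinates that a -size setting would select
# (simplified: + strand, integer midpoint as center).
peak_window() {
  _start=$1
  _end=$2
  _spec=$3
  if [ "$_spec" = given ]; then
    # -size given: use the region exactly as supplied.
    echo "$_start $_end"
    return 0
  fi
  _center=$(( ($_start + $_end) / 2 ))
  case $_spec in
    *,*)
      # -size <#,#>: explicit offsets relative to the center.
      _left=${_spec%%,*}
      _right=${_spec##*,}
      ;;
    *)
      # -size <#>: window of that total width, centered on the peak.
      _left=$(( - $_spec / 2 ))
      _right=$(( $_spec / 2 ))
      ;;
  esac
  echo "$(( $_center + $_left )) $(( $_center + $_right ))"
}

peak_window 1000 1400 200       # prints: 1100 1300
peak_window 1000 1400 -100,50   # prints: 1100 1250
peak_window 1000 1400 given     # prints: 1000 1400
```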

    Scanning sequence for motifs
        -find <motif file> (This will cause the program to only scan for motifs)

    Known Motif Options/Visualization
        -mset <vertebrates|insects|worms|plants|yeast|all> (check against motif collections, default: auto)
        -basic (just visualize de novo motifs, don't check similarity with known motifs)
        -bits (scale sequence logos by information content, default: doesn't scale)
        -nocheck (don't search for de novo vs. known motif similarity)
        -mcheck <motif file> (known motifs to check against de novo motifs,
        -float (allow adjustment of the degeneracy threshold for known motifs to improve p-value[dangerous])
        -noknown (don't search for known motif enrichment, default: -known)
        -mknown <motif file> (known motifs to check for enrichment,
        -nofacts (omit humor)
        -seqlogo (use weblogo/seqlogo/ghostscript to generate logos, default uses SVG now)

    Sequence normalization options:
        -gc (use GC% for sequence content normalization, now the default)
        -cpg (use CpG% instead of GC% for sequence content normalization)
        -noweight (no CG correction)
        Also -nlen <#>, -olen <#>, see homer2 section below.

    Advanced options:
        -h (use hypergeometric for p-values, binomial is default)
        -N <#> (Number of sequences to use for motif finding, default=max(50k, 2x input))
        -local <#> (use local background, # of equal size regions around peaks to use i.e. 2)
        -redundant <#> (Remove redundant sequences matching greater than # percent, i.e. -redundant 0.5)
        -maxN <#> (maximum percentage of N's in sequence to consider for motif finding, default: 0.7)
        -maskMotif <motif file1> [motif file 2]... (motifs to mask before motif finding)
        -opt <motif file1> [motif file 2]... (motifs to optimize or change length of)
        -rand (randomize target and background sequences labels)
        -ref <peak file> (use file for target and background - first argument is list of peak ids for targets)
        -oligo (perform analysis of individual oligo enrichment)
        -dumpFasta (Dump fasta files for target and background sequences for use with other programs)
        -preparse (force new background files to be created)
        -preparsedDir <directory> (location to search for preparsed file and/or place new files)
        -keepFiles (keep temporary files)
        -fdr <#> (Calculate empirical FDR for de novo discovery #=number of randomizations)

    homer2 specific options:
        -homer2 (use homer2 instead of original homer, default)
        -nlen <#> (length of lower-order oligos to normalize in background, default: -nlen 3)
            -nmax <#> (Max normalization iterations, default: 160)
            -neutral (weight sequences to neutral frequencies, i.e. 25%, 6.25%, etc.)
        -olen <#> (lower-order oligo normalization for oligo table, use if -nlen isn't working well)
        -p <#> (Number of processors to use, default: 1)
        -e <#> (Maximum expected motif instance per bp in random sequence, default: 0.01)
        -cache <#> (size in MB for statistics cache, default: 500)
        -quickMask (skip full masking after finding motifs, similar to original homer)
        -minlp <#> (stop looking for motifs when seed logp score gets above #, default: -10)

    Original homer specific options:
        -homer1 (to force the use of the original homer)
        -depth [low|med|high|allnight] (time spent on local optimization default: med)
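Putting several of the options above together, a typical call might look like the sketch below. Every flag used here comes from the help text, but the file names (`peaks.txt`, `background.txt`), the genome `mm10`, the output directory names and the choice of 4 processors are placeholders assumed for illustration. The `run` helper is hypothetical (not part of HOMER) and only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Placeholder inputs; replace with real paths and an installed genome.
PEAKS=${PEAKS:-peaks.txt}
GENOME=${GENOME:-mm10}
OUTDIR=${OUTDIR:-peakAnalysis}

# Hypothetical helper (not part of HOMER): print commands by default,
# execute them only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# De novo and known motif search in 200 bp windows, repeats masked,
# motif lengths 8, 10 and 12, using 4 processors.
run findMotifsGenome.pl "$PEAKS" "$GENOME" "$OUTDIR" \
  -size 200 -len 8,10,12 -mask -p 4

# Same peaks with the regions used exactly as given, a user-supplied
# background file, and hypergeometric instead of binomial p-values.
run findMotifsGenome.pl "$PEAKS" "$GENOME" "${OUTDIR}_customBg" \
  -size given -bg background.txt -h -p 4
```

Previewing the commands first is cheap insurance, since the help text itself warns that long motif lengths or many sequences can exhaust memory.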

References:
http://homer.ucsd.edu/homer/introduction/install.html
[Software Usage 2] HOMER installation and usage guide: how to obtain motifs? - Zhihu (zhihu.com)
http://www.360doc.com/content/21/0714/12/76149697_986500345.shtml
