[Bioinformatics] HiC-Pro: Up-to-Date Installation and Usage Guide

2019-05-18  bioinfo_boy

Contents

  • Preface
  • 01-What is HiC-Pro?
  • 02-How does it work?
  • 03-How to install it?
  • 04-How to use it?
  • 05-Test
  • 06-Output file
  • 07-User cases
  • 08-HiC-Pro utilities
  • 09-Compatibility with other software
  • Related links
  • Closing remarks

Preface

01-What is HiC-Pro?

02-How does it work?

Workflow

Read mapping

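Fragment assignment and filtering to obtain valid reads

Invalid reads are removed according to the different ways sequences can end up ligated during the experimental protocol.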
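The filtering idea can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not HiC-Pro's actual code: the function name `classify_pair`, the category labels, and the integer fragment indices are hypothetical, and both reads are assumed to map to the same chromosome and to be already assigned to a restriction fragment.

```python
def classify_pair(frag1, pos1, strand1, frag2, pos2, strand2):
    """Classify a read pair on one chromosome (simplified, hypothetical).

    frag1/frag2: integer index of the restriction fragment of each read.
    pos1/pos2:   mapping positions; strand1/strand2: '+' or '-'.
    """
    # Order the two reads by position so the orientation tests stay simple.
    if pos1 > pos2:
        frag1, pos1, strand1, frag2, pos2, strand2 = (
            frag2, pos2, strand2, frag1, pos1, strand1)

    if frag1 == frag2:
        if strand1 == "+" and strand2 == "-":
            return "dangling_end"   # reads face each other: unligated fragment
        if strand1 == "-" and strand2 == "+":
            return "self_circle"    # reads face outward: fragment ligated to itself
        return "other_invalid"      # same strand on the same fragment
    if abs(frag1 - frag2) == 1:
        return "religation"         # neighbouring fragments joined back together
    return "valid"


pairs = [
    (10, 1000, "+", 10, 1300, "-"),
    (10, 1300, "+", 10, 1000, "-"),
    (10, 1000, "+", 11, 2500, "-"),
    (10, 1000, "+", 57, 90000, "+"),
]
labels = [classify_pair(*p) for p in pairs]
print(labels)
```

Only the pairs labelled `valid` would go on to the contact maps in this toy model; the other categories are the kind of ligation products that get discarded.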

Quality control

The software includes two quality-control steps.

Map builder

ICE Normalization
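For readers unfamiliar with ICE (iterative correction), here is a minimal pure-Python sketch of the general idea: repeatedly divide each matrix entry by the relative coverage of its row and column until all bins have the same total. The function name `ice_normalize` and its parameters are my own; the real implementation used by HiC-Pro works on sparse matrices and is not reproduced here.

```python
def ice_normalize(matrix, max_iter=200, tol=1e-9):
    """Iteratively balance a symmetric contact matrix given as a list of lists.

    Returns (normalized_matrix, bias_vector). Illustrative sketch only.
    """
    n = len(matrix)
    m = [[float(v) for v in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break                      # empty matrix: nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left untouched.
        factors = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break                      # all bins already have equal coverage
    return m, bias


raw = [[10, 4, 2],
       [4, 20, 6],
       [2, 6, 8]]
normalized, bias = ice_normalize(raw)
row_sums = [sum(row) for row in normalized]
print(row_sums)
```

After convergence every row (and column) of `normalized` sums to the same value, and `bias` holds the accumulated per-bin correction factors.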

03-How to install it?

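The software documentation mentions Singularity. I looked it up: it is a container platform that packages a complete software environment into an image. For this purpose it plays a role similar to Anaconda, but it is much more complicated to set up than conda, so I skipped it.

Method 1: Install with Anaconda

Method 1 is much faster than Method 2 and almost foolproof. These days conda is naturally the first thing that comes to mind when installing bioinformatics software, but it does not always work...

Typical conda installation procedure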

$ conda install -c davebx hicpro
$ cd PATH_TO_YOUR_MINICONDA/HiC-Pro_2.10.0
$ vi config-install.txt
# add the paths of the dependencies
$ make

Method 2: Installation script provided on GitHub

This was the first alternative I thought of when conda did not work.

Installation procedure

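git clone https://github.com/nservant/HiC-Pro.git
cd HiC-Pro
# install the command-line and Python dependencies
conda install -y samtools bowtie2 R
conda install -y pysam bx-python numpy scipy
# start R and install the required R packages from within the R session
R
install.packages(c('ggplot2','RColorBrewer'))
# after quitting R with q(), configure and build HiC-Pro
make configure
make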

04-How to use it?

Annotation files

Command line

stepwise mode (-s)        input type
-s mapping                .fastq(.gz) files
-s proc_hic               .bam files
-s quality_checks         .bam files
-s merge_persample        .validPairs files
-s build_contact_maps     .validPairs files
-s ice_norm               .matrix files
MY_INSTALL_PATH/bin/HiC-Pro -i FULL_PATH_TO_RAW_DATA -o FULL_PATH_TO_OUTPUTS -c MY_LOCAL_CONFIG_FILE -s mapping -s quality_checks
MY_INSTALL_PATH/bin/HiC-Pro -i FULL_PATH_TO_RAW_DATA -o FULL_PATH_TO_OUTPUTS -c MY_LOCAL_CONFIG_FILE
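When the pipeline is launched from a script, it can help to assemble the command programmatically and reject misspelled step names before anything runs. The sketch below does that using only the options and stepwise modes listed above; the helper `build_command` and the dictionary `STEP_INPUT` are my own names, not part of HiC-Pro.

```python
import shlex

# Expected input type for each stepwise mode, copied from the list above.
STEP_INPUT = {
    "mapping": ".fastq(.gz) files",
    "proc_hic": ".bam files",
    "quality_checks": ".bam files",
    "merge_persample": ".validPairs files",
    "build_contact_maps": ".validPairs files",
    "ice_norm": ".matrix files",
}


def build_command(hicpro, input_dir, output_dir, config, steps=()):
    """Return the HiC-Pro command line as a list of arguments."""
    unknown = [s for s in steps if s not in STEP_INPUT]
    if unknown:
        raise ValueError("unknown step(s): " + ", ".join(unknown))
    cmd = [hicpro, "-i", input_dir, "-o", output_dir, "-c", config]
    for step in steps:
        cmd += ["-s", step]   # one -s option per requested step
    return cmd


cmd = build_command(
    "MY_INSTALL_PATH/bin/HiC-Pro",
    "FULL_PATH_TO_RAW_DATA",
    "FULL_PATH_TO_OUTPUTS",
    "MY_LOCAL_CONFIG_FILE",
    steps=["mapping", "quality_checks"],
)
command_line = shlex.join(cmd)
print(command_line)
```

Passing an empty `steps` list yields the second form shown above, which runs the whole pipeline. The resulting list can be handed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones.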

05-Test

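I used the dataset provided on the official website and mapped it to the human hg19 reference genome. Getting hold of the human reference genome is a deep topic of its own (where to download it, which source is fastest), so please search for that yourself.
How to fill in the configuration file is covered below.

Record 1

Record 2

Record 3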

06-Output file

Directory listing

Statistics


07-User cases

Running HiC-Pro step by step

HiC-Pro -i ${RES_PREFIX}_3/bowtie_results/bwt2 -o ${RES_PREFIX}_3.1 -c config_test.txt -s proc_hic -s quality_checks
HiC-Pro -i ${RES_PREFIX}_3.1/hic_results/data -o ${RES_PREFIX}_3.2 -c config_test.txt -s build_contact_maps -s ice_norm

Building allele-specific contact maps

Analyzing DNase Hi-C data

Leave LIGATION_SITE and GENOME_FRAGMENT empty in the configuration file.
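If you derive the DNase Hi-C configuration from an existing one, a tiny helper can empty those two entries for you. The sketch assumes the configuration file uses `KEY = value` lines; the function name `blank_keys`, the example excerpt, and its values are hypothetical.

```python
def blank_keys(config_text, keys=("LIGATION_SITE", "GENOME_FRAGMENT")):
    """Return config_text with the given KEY = value entries emptied."""
    out = []
    for line in config_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in keys:
            out.append(key + " =")    # keep the key, drop its value
        else:
            out.append(line)          # comments and other settings unchanged
    return "\n".join(out)


# Hypothetical excerpt of a configuration file, for illustration only.
example = """\
# restriction-enzyme settings
GENOME_FRAGMENT = HindIII_resfrag_hg19.bed
LIGATION_SITE = AAGCTAGCTT
BIN_SIZE = 1000000
"""
result = blank_keys(example)
print(result)
```

All other settings and comment lines pass through untouched, so the output can be written straight back to a new configuration file.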

Analyzing capture-C data

Analyzing capture Hi-C data

You need to provide a BED file of the target regions and set CAPTURE_TARGET.

08-HiC-Pro utilities

01-split_reads.py

Splits a reads file into chunks; presumably so that mapping can run in parallel?

HICPRO_PATH/bin/utils/split_reads.py --results_folder OUTPUT --nreads READS_NB INPUT_FASTQ
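To make the idea concrete, here is a minimal sketch of splitting a FASTQ stream into chunks of a fixed number of reads. It is not the code of split_reads.py; the function name `split_fastq` is my own, and it assumes plain four-line FASTQ records.

```python
import io
from itertools import islice


def split_fastq(handle, nreads):
    """Yield lists of FASTQ lines, each holding at most nreads records."""
    while True:
        chunk = list(islice(handle, 4 * nreads))  # 4 lines per record
        if not chunk:
            break
        yield chunk


# Three toy reads held in memory instead of a real file.
fastq = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTGA\n+\nIIII\n"
    "@r3\nGGCC\n+\nIIII\n"
)
chunks = list(split_fastq(fastq, nreads=2))
reads_per_chunk = [len(c) // 4 for c in chunks]
print(reads_per_chunk)
```

Each chunk could then be written to its own file and mapped independently, which is what makes the splitting useful on a cluster.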

02-extract_snps.py

Extracts SNP sites from genotype phasing data

## Extract SNPs information for CASTEiJ/129S1 cross
## Download from ftp://ftp-mouse.sanger.ac.uk/current_snps/
HICPRO_PATH/bin/utils/extract_snps.py -i mgp.v2.snps.annot.reformat.vcf -r CASTEiJ -a 129S1 > snps_CASTEiJ_129S1.vcf

03-digest_genome.py

## Double digestion, HindIII + DpnII
HICPRO_PATH/bin/utils/digest_genome.py -r hindiii dpnii -o mm9_hindiii_dpnii.bed mm9.fasta
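What an in-silico double digestion does can be shown on a toy sequence. The sketch below is my own illustration, not the code of digest_genome.py: the names `ENZYMES` and `digest` are hypothetical, and only the two enzymes from the example are included (HindIII cuts A^AGCTT, DpnII cuts ^GATC). Fragments are reported as 0-based, half-open intervals, as in BED files.

```python
import re

# Recognition sequence and cut offset within it (A^AGCTT, ^GATC).
ENZYMES = {"hindiii": ("AAGCTT", 1), "dpnii": ("GATC", 0)}


def digest(sequence, enzymes):
    """Return (start, end) fragments of sequence cut by all given enzymes."""
    sequence = sequence.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name.lower()]
        # A lookahead pattern also finds overlapping recognition sites.
        for match in re.finditer("(?=" + site + ")", sequence):
            cuts.add(match.start() + offset)
    inner = sorted(c for c in cuts if 0 < c < len(sequence))
    bounds = [0] + inner + [len(sequence)]
    return list(zip(bounds[:-1], bounds[1:]))


seq = "TTTAAGCTTCCCGATCGGG"
fragments = digest(seq, ["hindiii", "dpnii"])
print(fragments)  # [(0, 4), (4, 12), (12, 19)]
```

Pooling the cut positions of both enzymes before building the intervals is what makes this a double digestion rather than two separate single digestions.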

04-hicpro2juicebox.sh

Generates an input file for the Juicebox visualization software

## Convert HiC-Pro output to Juicebox input up to restriction fragment resolution
HICPRO_PATH/bin/utils/hicpro2juicebox.sh -i hicpro_res/hic_results/data/dixon_2M/dixon_2M_allValidPairs -g hg19 -j /usr/local/juicebox/juicebox_clt_1.4.jar -f HICPRO_PATH/data_info/HindIII_resfrag_hg19.bed

05-sparseToDense.py

Converts the sparse symmetric matrix format into dense matrices. The TAD directionality index method from Bing Ren's lab requires input in this format.

## Convert to dense format
HICPRO_PATH/bin/utils/sparseToDense.py -b hic_results/matrix/dixon_2M/raw/1000000/dixon_2M_1000000_abs.bed hic_results/matrix/dixon_2M/iced/1000000/dixon_2M_1000000_iced.matrix
## Convert to dense format per chromosome
HICPRO_PATH/bin/utils/sparseToDense.py -b hic_results/matrix/dixon_2M/raw/1000000/dixon_2M_1000000_abs.bed hic_results/matrix/dixon_2M/iced/1000000/dixon_2M_1000000_iced.matrix --perchr
## Convert into TADs caller input from Dixon et al.
HICPRO_PATH/bin/utils/sparseToDense.py -b hic_results/matrix/dixon_2M/raw/1000000/dixon_2M_1000000_abs.bed hic_results/matrix/dixon_2M/iced/1000000/dixon_2M_1000000_iced.matrix --perchr --di
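The conversion itself is simple enough to sketch. The example assumes the sparse format stores one `bin_i bin_j count` triplet per line with 1-based bin indices and only one triangle of the symmetric matrix; the function name `sparse_to_dense` and the toy data are my own, and this is not the code of sparseToDense.py.

```python
def sparse_to_dense(triplets, nbins):
    """Expand (i, j, count) triplets with 1-based indices into a dense symmetric matrix."""
    dense = [[0.0] * nbins for _ in range(nbins)]
    for i, j, count in triplets:
        dense[i - 1][j - 1] = count
        dense[j - 1][i - 1] = count   # mirror into the other triangle
    return dense


# Toy sparse matrix with three bins, one triplet per line.
lines = ["1 1 5", "1 3 2", "2 3 7"]
triplets = []
for line in lines:
    i, j, count = line.split()
    triplets.append((int(i), int(j), float(count)))

dense = sparse_to_dense(triplets, nbins=3)
for row in dense:
    print(row)
```

In practice the number of bins would come from the matching `_abs.bed` file, which is why the commands above pass it with `-b`.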

06-hicpro2fithic.py

Converts HiC-Pro output into Fit-Hi-C input files

## With IC bias vector
HICPRO_PATH/bin/utils/hicpro2fithic.py -i hic_results/matrix/dixon_2M/raw/1000000/dixon_2M_1000000.matrix -b hic_results/matrix/dixon_2M/raw/1000000/dixon_2M_1000000_abs.bed -s hic_results/matrix/dixon_2M/iced/1000000/dixon_2M_1000000_iced.matrix.biases

09-Compatibility with other software

## Plot the chrX at 150Kb resolution
python HiCPlotter.py -f hic_results/matrix/sample1/iced/150000/sample1_150000_iced.matrix -o Exemple -r 150000 -tri 1 -bed hic_results/matrix/sample1/raw/150000/sample1_150000_ord.bed -n Test -chr chrX -ptr 1

Related links

Closing remarks
