[pySCENIC] Building a motif database for other species

2023-10-04 jjjscuedu

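Earlier posts tested the common human and mouse datasets, but I work on plants. To run pySCENIC on a plant species, you have to build your own database. Working from the material available online, I ran into a few problems myself, so I am sharing my workflow here. We start with Arabidopsis thaliana, the model plant, as an example, and then try other species.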

The official tutorial:

https://github.com/aertslab/create_cisTarget_databases

However, the official description is very brief, with almost no explanation. Let's first look at the help output:

python create_cistarget_motif_databases.py -h

usage: create_cistarget_motif_databases.py [-h] -f FASTA_FILENAME [-F ORIGINAL_SPECIES_FASTA_FILENAME]
       -M MOTIFS_DIR -m MOTIFS_LIST_FILENAME
       [-5 MOTIF_MD5_TO_MOTIF_ID_FILENAME] -o DB_PREFIX
       [-c CLUSTER_BUSTER_PATH] [-t NBR_THREADS]
       [-p CURRENT_PART NBR_TOTAL_PARTS]
       [-g EXTRACT_GENE_ID_FROM_REGION_ID_REGEX_REPLACE]
       [-b BG_PADDING] [--min MIN_NBR_MOTIFS] [--max MAX_NBR_MOTIFS]
       [-l] [-s SEED] [-r SSH_COMMAND]

Help output of create_cistarget_motif_databases.py

Judging from the help output, several inputs are required:

-f FASTA_FILENAME, --fasta FASTA_FILENAME FASTA filename which contains the regions/genes to score with Cluster-Buster for each motif. When creating a cisTarget species database from regions/genes lifted over from a different species, provide the original FASTA file for that species to -F. # This should be the sequence file of the regions or genes we are interested in.

-M MOTIFS_DIR, --motifs_dir MOTIFS_DIR Path to directory with Cluster-Buster motifs.

-m MOTIFS_LIST_FILENAME, --motifs MOTIFS_LIST_FILENAME Filename with list of motif IDs or motif MD5 names to be scored from directory specified by "--motifs_dir".

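# These two arguments go together. -m should be the file that lists the motif IDs, and -M is the path to the directory that holds the Cluster-Buster motif file for each of those IDs.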
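Since -M and -m have to agree with each other, it can help to check them before starting a long run. The following is a minimal sketch, not part of the official tool. It assumes that the list file holds one motif ID per line and that each motif is stored in the directory as "<motif ID>.cb"; the ".cb" suffix, the function names, and the file names in the usage note are my own assumptions for illustration. The command it assembles uses only flags that appear in the help output above.

```python
from pathlib import Path


def read_motif_ids(list_path):
    """Return the non-empty lines of the motif list file (one ID per line assumed)."""
    ids = []
    for line in Path(list_path).read_text().splitlines():
        line = line.strip()
        if line:
            ids.append(line)
    return ids


def find_missing_motifs(motifs_dir, motif_ids, suffix=".cb"):
    """Return the motif IDs that have no '<id><suffix>' file in motifs_dir.

    The '.cb' suffix is an assumption; adjust it to match your own files.
    """
    motifs_dir = Path(motifs_dir)
    return [m for m in motif_ids if not (motifs_dir / f"{m}{suffix}").is_file()]


def build_command(fasta, motifs_dir, motifs_list, db_prefix, threads=1):
    """Assemble the argument list from flags shown in the help output (-f, -M, -m, -o, -t)."""
    return [
        "python", "create_cistarget_motif_databases.py",
        "-f", str(fasta),
        "-M", str(motifs_dir),
        "-m", str(motifs_list),
        "-o", str(db_prefix),
        "-t", str(threads),
    ]
```

A possible use, with hypothetical paths: call read_motif_ids("motifs.lst"), pass the result to find_missing_motifs("motifs_cb", ids), and only hand build_command(...) to subprocess.run once the list of missing motifs is empty.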

I have not worked on motifs specifically before, so I was not familiar with the Cluster-Buster format or the tool itself. I therefore took a closer look at Cluster-Buster.

========Cluster-Buster========

The official repository:

https://github.com/weng-lab/cluster-buster

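This is the usage given in the official repository. Judging from it, Cluster-Buster takes a motif matrix file and determines whether the given sequences contain the corresponding motifs, reporting the details of each match.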
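To make the idea of "scanning a sequence with a motif matrix" concrete, here is a toy sketch in Python. It is not Cluster-Buster's algorithm: Cluster-Buster scores clusters of motifs with its own model, whereas this only slides a single matrix along the forward strand and sums log-odds scores. It assumes the matrix text is a ">name" header followed by one row of A C G T counts per motif position, which is my understanding of the Cluster-Buster matrix layout. The function names, the pseudocount, and the uniform background are assumptions chosen for illustration.

```python
import math


def parse_matrix(text):
    """Parse one motif: a '>name' header followed by rows of A C G T counts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    name = lines[0].lstrip(">").split()[0]
    rows = [[float(x) for x in ln.split()] for ln in lines[1:]]
    return name, rows


def log_odds(rows, pseudocount=1.0, background=0.25):
    """Convert count rows to log2-odds scores against a uniform background."""
    scores = []
    for row in rows:
        total = sum(row) + 4 * pseudocount
        scores.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return scores


def best_hit(sequence, scores):
    """Slide the matrix along the forward strand; return (best score, start index)."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    width = len(scores)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in index for base in window):
            continue  # skip windows containing N or other symbols
        total = sum(scores[i][index[base]] for i, base in enumerate(window))
        if total > best[0]:
            best = (total, start)
    return best
```

With a made-up four-position matrix that strongly prefers A, A, A, G, best_hit applied to a sequence containing "AAAG" returns a positive score together with the start index of that match; a sequence shorter than the matrix yields a start index of -1.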