python

ipyrad——主成分分析(一)

2021-01-17  本文已影响0人  Morriyaty

这次准备介绍的是一个使用简便的Python分析包:ipyrad。(在此特别感谢虞米儿~)
ipyrad 官方文档: https://ipyrad.readthedocs.io/en/latest/API-analysis/cookbook-abba-baba.html
对我来说比较有用的是下面这几个模块:

vcf_to_hdf5
treemix
PCA
RAxML
STRUCTURE
abba-baba

先来介绍一下PCA模块~

下载

作者推荐使用conda下载,最好下载比较新的版本(python v 3.7)

conda install ipyrad=0.9.63 -c bioconda

到这主体部分就下好了。

PCA分析

下载额外用到的模块

conda install scikit-learn -c bioconda
conda install toyplot -c eaton-lab

将VCF文件转换为haf5格式

vcf2hdf5.py:

### name : output_file_name  data : input_vcf  ld_block_size : window_size
import ipyrad.analysis as ipa
import pandas as pd

converter = ipa.vcf_to_hdf5(
        name="test_LD10k",
        data="all_snp_1.vcf.gz",
        ld_block_size=10000,
)

converter.run()

运行结果在 analysis-vcf2hdf5 文件夹下

运行PCA分析

ipyrad_pca.py :

import ipyrad.analysis as ipa
import pandas as pd
import toyplot
import toyplot.pdf

data = "analysis-vcf2hdf5/test_LD10k.snps.hdf5"

imap = {
        "AS": ["AS1","AS2","AS3","AS4","AS5","AS6","AS7","AS8","AS9","AS10","AS11","AS12","AS13","AS14","AS15","AS16","AS17","AS18","AS19"],
        "ES": ["ES1","ES2","ES3","ES4","ES5","ES6","ES7","ES8","ES9","ES10","ES11","ES12","ES13","ES14","ES15","ES16","ES17","ES18","ES19","ES20"],
        }

minmap = {i: 0.5 for i in imap}

pca = ipa.pca(data = data,
        imap = imap,
        minmap = minmap,
        mincov = 0.75,
        impute_method = "sample",
        )

pca.run(nreplicates=25,seed=12345)

df = pd.DataFrame(pca.pcaxes[0], index=pca.names)
df.to_csv("pca_analysis.csv")

canvas, axes = pca.draw(0,2)
toyplot.pdf.render(canvas, "PCA-out.pdf")

这里产生两个结果
pca_analysis.csv PCA分析结果文件:

1.png
PCA-out.pdf 结果图:
1.png

这里运用的是pca.run(nreplicates=25,seed=12345) 方法,可以根据说明书自行修改。

tSNE分析

tSNE是降维分析的一种,具体原理还不是很懂,这个方法也提供了tSNE方法。
ipyrad_tsne.py :

import ipyrad.analysis as ipa
import pandas as pd
import toyplot
import toyplot.pdf

data = "analysis-vcf2hdf5/test_LD10k.snps.hdf5"

imap = {
        "AS": ["AS1","AS2","AS3","AS4","AS5","AS6","AS7","AS8","AS9","AS10","AS11","AS12","AS13","AS14","AS15","AS16","AS17","AS18","AS19"],
        "ES": ["ES1","ES2","ES3","ES4","ES5","ES6","ES7","ES8","ES9","ES10","ES11","ES12","ES13","ES14","ES15","ES16","ES17","ES18","ES19","ES20"],
        }

minmap = {i: 0.5 for i in imap}

pca = ipa.pca(data = data,
        imap = imap,
        minmap = minmap,
        mincov = 0.75,
        impute_method = "sample",
        )

pca.run_tsne(subsample=True, perplexity=4.0, n_iter=100000, seed=123)

canvas, axes = pca.draw()
toyplot.pdf.render(canvas, "tsne-out.pdf")

tsne-out.pdf :

2.png

PCA这里就写这么多,只是编写了我的做法,更详细的内容还是去看官方文档吧~

ipyrad后续内容,且听下回。

上一篇下一篇

猜你喜欢

热点阅读