Simulation Software for High-Throughput Genomic Data

2018-04-13  生信family

This post is mainly based on the review article "A comparison of tools for the simulation of genomic next-generation sequencing data", published in Nature Reviews Genetics in 2016.

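When learning about high-throughput sequencing, we often run into two difficulties: (1) we cannot find the data we want; (2) the data are too large to download and analyze. This is where software for simulating high-throughput sequencing data comes in handy.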

In short, sequencing data simulators are mainly used for the following four purposes:

  1. planning experiments
  2. testing hypotheses
  3. benchmarking tools
  4. evaluating particular results

    The simulation of NGS data can be extremely useful for planning experiments, testing hypotheses, benchmarking tools and evaluating particular results. Given a reference genome or dataset, for instance, one can play with an array of sequencing technologies to choose the best-suited technology and parameters for the particular goal, possibly optimizing time and costs. Yet, this is still not the standard practice and researchers often base their choices on practical considerations like technology and money availability.

    As shown throughout this Review, simulation of NGS data from known genomes or transcriptomes can be extremely useful when evaluating assembly, mapping, phasing or genotyping algorithms exposing their advantages and drawbacks under different circumstances.
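
To make the benchmarking idea concrete, here is a minimal toy sketch of what a read simulator does: it samples fixed-length reads from a known reference, injects substitution errors at a given rate, and records each read's true origin in its name so that a mapper's output could later be checked against it. This is not the method of any of the reviewed tools; the function name `simulate_reads`, its parameters, the uniform error model, and the constant quality string are all assumptions made for illustration.

```python
import random


def simulate_reads(reference, n_reads, read_len, error_rate, seed=0):
    """Toy single-end read simulator (hypothetical helper, for illustration only).

    Returns a list of FASTQ records as strings. Each read name encodes the
    0-based start position on the reference, i.e. the known ground truth.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    records = []
    for i in range(n_reads):
        # Pick a uniformly random start so the read fits inside the reference.
        start = rng.randrange(len(reference) - read_len + 1)
        fragment = reference[start:start + read_len]
        read = []
        for base in fragment:
            if rng.random() < error_rate:
                # Substitution error: replace with one of the other three bases.
                read.append(rng.choice([b for b in bases if b != base]))
            else:
                read.append(base)
        seq = "".join(read)
        # Assumption: constant quality 'I' (Phred 40 in Phred+33 encoding).
        qual = "I" * read_len
        records.append(f"@read{i}_pos{start}\n{seq}\n+\n{qual}")
    return records


# A random 1 kb "reference genome" stands in for a real one here.
reference = "".join(random.Random(42).choice("ACGT") for _ in range(1000))
reads = simulate_reads(reference, n_reads=5, read_len=50, error_rate=0.01)
print("\n".join(reads))
```

Because every read carries its true position, the output can be fed to a mapper and the reported alignments compared with the truth. Dedicated simulators go well beyond this sketch, for example in the sequencing steps and genomic variants they allow you to parameterize.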

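The review evaluates 23 sequencing data simulators, describes their distinguishing features, requirements, and potential applications, and offers guidance on choosing a suitable tool.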


Software list

NGS genomic simulators decision tree.

The decision tree below briefly illustrates the principles for choosing among the different tools.


emss-70941-f001.jpg

Main characteristics of current NGS technologies

Some characteristics of the current NGS technologies. Note that an "X" indicates that the feature is present.

image.png

General overview of the sequencing process and steps that can be parameterized in the simulations

image.png

General overview of NGS simulation

image.png

General information about 23 NGS genomic simulators

Characteristics of the 23 simulators


image.png

Technical information about 23 NGS genomic simulators

image.png

Genomic variants

image.png

Finally, here is the article's online summary:


image.png

You are welcome to follow my WeChat official account "生信family".

