#Molecular Simulation# MDTraj, a Python Package for Molecular Simulation (Part 2)

2017-06-08 生信杂谈

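Continuing from the previous post, this time we briefly cover how to select atoms and how to find the centroid of a cluster. A reader suggested last time that output reads better when kept apart from the code, so in this post the sample output is shown outside the code.

2. Selecting atoms

Getting started

In this example we will do basic atom and residue selection with MDTraj. First, let's load an example trajectory.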


from __future__ import print_function

import mdtraj as md

traj=md.load('ala2.h5')

print(traj)
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, without unitcells>

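We can use traj.n_atoms and traj.n_residues to read off the number of atoms and residues directly.

print('How many atoms? %s' % traj.n_atoms)
print('How many residues? %s' % traj.n_residues)

We can also work with the atom positions through traj.xyz, a NumPy array of shape (n_frames, n_atoms, 3) that holds the xyz coordinates of every atom in every frame. Let's find the 3D coordinates of the tenth atom in the fifth frame.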


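frame_idx = 4  # zero-based indexing, here and below

atom_idx = 9

print('Where is the tenth atom in the fifth frame?')  # the upstream tutorial words this the other way round, which I think is a mistake

print('x: %s\ty: %s\tz: %s' % tuple(traj.xyz[frame_idx, atom_idx, :]))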
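
To make the (n_frames, n_atoms, 3) layout concrete without needing the ala2.h5 file, here is a minimal sketch on a synthetic NumPy array. The array `xyz` and its values are made up for illustration; only the shape convention comes from the text above.

```python
import numpy as np

# Synthetic stand-in for traj.xyz: 100 frames, 22 atoms, 3 coordinates.
# The values are arbitrary; only the (n_frames, n_atoms, 3) layout matters.
n_frames, n_atoms = 100, 22
xyz = np.arange(n_frames * n_atoms * 3, dtype=float).reshape(n_frames, n_atoms, 3)

frame_idx = 4  # fifth frame (zero-based)
atom_idx = 9   # tenth atom (zero-based)

# One atom in one frame: a length-3 vector (x, y, z).
position = xyz[frame_idx, atom_idx, :]
print('x: %s\ty: %s\tz: %s' % tuple(position))

# One atom across all frames: an (n_frames, 3) array.
atom_path = xyz[:, atom_idx, :]

# All atoms in one frame: an (n_atoms, 3) array.
snapshot = xyz[frame_idx]
```

The first axis always selects the frame and the second the atom, so swapping the two indices silently returns a different atom in a different frame rather than raising an error.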

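The Topology object

As introduced earlier, every Trajectory object contains a Topology. The Topology holds all the connectivity (bond) information for your system, along with the chain, residue, and atom details.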


topology=traj.topology

print(topology)

<mdtraj.Topology with 1 chains, 3 residues, 22 atoms, 21 bonds>

With the Topology object we can pick out a specific atom or loop over all of them (note: everything is zero-indexed).


print('Fifth atom: %s' % topology.atom(4))

print('All atoms: %s' % [atom for atom in topology.atoms])

Fifth atom: ACE1-C

All atoms: [ACE1-H1, ACE1-CH3, ACE1-H2, ACE1-H3, ACE1-C, ACE1-O, ALA2-N, ALA2-H, ALA2-CA, ALA2-HA, ALA2-CB, ALA2-HB1, ALA2-HB2, ALA2-HB3, ALA2-C, ALA2-O, NME3-N, NME3-H, NME3-C, NME3-H1, NME3-H2, NME3-H3]

The same works for residues.

print('Second residue: %s' % traj.topology.residue(1))

print('All residues: %s' % [residue for residue in traj.topology.residues])

Second residue: ALA2

All residues: [ACE1, ALA2, NME3]

Every atom and residue is also an object with its own set of properties. Here are a few simple examples:

atom = topology.atom(10)

print('''Hi! I am the %sth atom, and my name is %s. 

I am a %s atom with %s bonds. 

I am part of an %s residue.''' % ( atom.index, atom.name, atom.element.name, atom.n_bonds, atom.residue.name))

Hi! I am the 10th atom, and my name is CB.

I am a carbon atom with 4 bonds.

I am part of an ALA residue.

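There are also more elaborate properties, such as atom.is_sidechain or residue.is_protein, which allow for more powerful selections.

Putting it all together

You can see how these properties combine with Python's filtering constructs such as list comprehensions. Say we want the indices of all carbon atoms in the side chains of our molecule; we could try something like this.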


print([atom.index for atom in topology.atoms if atom.element.symbol == 'C' and atom.is_sidechain])

Or perhaps we want all even-indexed residues (index 0, 2, ...) in the first chain.


print([residue for residue in topology.chain(0).residues if residue.index % 2 == 0])
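
The same filtering pattern can be tried without MDTraj installed. The sketch below uses hypothetical stand-in objects (the `Residue` and `Atom` namedtuples are not MDTraj classes) filled with the labels printed earlier in this post; the element symbols are assigned by hand as an assumption.

```python
from collections import namedtuple

# Hypothetical stand-ins for MDTraj's residue and atom objects.
Residue = namedtuple('Residue', ['index', 'label'])
Atom = namedtuple('Atom', ['index', 'name', 'symbol'])

residues = [Residue(0, 'ACE1'), Residue(1, 'ALA2'), Residue(2, 'NME3')]

# Atoms of ALA2 with their indices from the listing above;
# element symbols are filled in by hand for this illustration.
ala_atoms = [
    Atom(6, 'N', 'N'), Atom(7, 'H', 'H'), Atom(8, 'CA', 'C'),
    Atom(9, 'HA', 'H'), Atom(10, 'CB', 'C'), Atom(11, 'HB1', 'H'),
    Atom(12, 'HB2', 'H'), Atom(13, 'HB3', 'H'), Atom(14, 'C', 'C'),
    Atom(15, 'O', 'O'),
]

# Even zero-based indices pick the 1st, 3rd, ... residues.
even_residues = [r.label for r in residues if r.index % 2 == 0]
print(even_residues)

# Compare strings with ==, not `is`: `is` tests object identity.
carbon_indices = [a.index for a in ala_atoms if a.symbol == 'C']
print(carbon_indices)
```

Because indexing starts at zero, the "even-indexed" filter returns what a person counting from one would call the first and third residues.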

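Atom selection language

MDTraj has an atom selection language similar to those of PyMOL and VMD, available through topology.select. Let's find all atoms in the last two residues.

The main documentation has more details.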


print(topology.select('resid 1 to 2'))

[ 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21]

You can also make more complex selections. For example, let's find the nitrogen atoms in the backbone.


print(topology.select('name N and backbone'))

[ 6 16]

If you want to see the code that produces a selection, use select_expression, which returns that code as a string.


selection = topology.select_expression('name CA and resid 1 to 2')

print(selection)

[atom.index for atom in topology.atoms if ((atom.name == 'CA') and (1 <= atom.residue.index <= 2))]
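
Because the string printed above is ordinary Python source, it can be evaluated against any object that exposes the same attributes. The sketch below does this with hand-built stand-ins (`SimpleNamespace` objects, not real MDTraj topologies) populated from the atom listing earlier in this post; calling `eval` here is only for illustration.

```python
from types import SimpleNamespace

# Atom names per residue, copied from the ala2 listing above.
layout = [
    ('ACE', ['H1', 'CH3', 'H2', 'H3', 'C', 'O']),
    ('ALA', ['N', 'H', 'CA', 'HA', 'CB', 'HB1', 'HB2', 'HB3', 'C', 'O']),
    ('NME', ['N', 'H', 'C', 'H1', 'H2', 'H3']),
]

# Build a stand-in topology exposing .atoms, atom.index, atom.name,
# and atom.residue.index, which is all the expression needs.
atoms = []
for res_index, (res_name, atom_names) in enumerate(layout):
    residue = SimpleNamespace(index=res_index, name=res_name)
    for atom_name in atom_names:
        atoms.append(SimpleNamespace(index=len(atoms), name=atom_name, residue=residue))
topology = SimpleNamespace(atoms=atoms)

# The string printed by select_expression in the example above.
expression = ("[atom.index for atom in topology.atoms "
              "if ((atom.name == 'CA') and (1 <= atom.residue.index <= 2))]")

selected = eval(expression, {'topology': topology})
print(selected)

# The 'resid 1 to 2' selection written out by hand in the same style.
last_two = [atom.index for atom in topology.atoms if 1 <= atom.residue.index <= 2]
print(last_two)
```

The hand-written comprehension reproduces the sixteen indices shown for 'resid 1 to 2' above, which is a quick way to check how a selection string maps onto atom attributes.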

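3. Finding centroids of clusters

Finding the centroid

In this example we look for a "centroid", a representative structure, for a group of conformations. Such a group might come from clustering, for example with Ward hierarchical clustering.

Note that there are several possible ways to define a centroid; the one used here is just one of them.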


from __future__ import print_function

# IPython/Jupyter magic; remove the next line when running as a plain script
%matplotlib inline

import mdtraj as md

import numpy as np

Load the trajectory


traj = md.load('ala2.h5')

print(traj)

Let's compute the pairwise RMSDs between all conformations.


atom_indices = [a.index for a in traj.topology.atoms if a.element.symbol != 'H']

distances = np.empty((traj.n_frames, traj.n_frames))

for i in range(traj.n_frames):

    distances[i] = md.rmsd(traj, traj, i, atom_indices=atom_indices)

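# Turn distances into similarities, exp(-beta * d / std(d)), then take the
# frame with the largest summed similarity to all frames as the centroid.
beta = 1
index = np.exp(-beta*distances / distances.std()).sum(axis=1).argmax()
print(index)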

centroid = traj[index]
print(centroid)
<mdtraj.Trajectory with 1 frames, 22 atoms, 3 residues, without unitcells>
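
The last step above turns each distance d_ij into a similarity exp(-beta * d_ij / std(d)) and picks the frame with the largest summed similarity. Here is a minimal sketch of that rule on synthetic 2-D points instead of RMSDs; the helper name `centroid_index` and the point coordinates are assumptions made for illustration.

```python
import numpy as np

def centroid_index(distances, beta=1.0):
    """Return the row whose summed similarity to all rows is largest.

    Similarity is exp(-beta * d / d.std()), the same transform used
    in the snippet above. `centroid_index` is a hypothetical helper name.
    """
    distances = np.asarray(distances, dtype=float)
    similarities = np.exp(-beta * distances / distances.std())
    return int(similarities.sum(axis=1).argmax())

# Four points around a central one (index 2); made-up data, not RMSDs.
points = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])

# Pairwise Euclidean distances via broadcasting: shape (5, 5).
diff = points[:, None, :] - points[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

print(centroid_index(distances))
```

The central point wins because it is at distance 1 from every other point, while each outer point is farther than that from three of its four neighbours. A larger beta makes the similarity decay faster, so only very close frames contribute to the score.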
