
Analyzing the newmovies dataset with the Python networkx library

2021-10-22  Cache_wood

networkx is a third-party Python library for working with graph structures. It offers a concise API and makes it easy to draw graphs.

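newmovies.txt holds the data. Each line under *Vertices 34282 is one node, representing a star, a writer, or a film. The fields in a line are separated by \t: node id, name, node weight, node type, and other information (whose items are separated by ";"). The node weight comes with the original dataset and is not used in this assignment. In the edges section each line has three numbers: the first two are the ids of the nodes the edge connects, and the third is always 1.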

The newmovies.txt dataset

*Vertices 34282
0   "Ann Blyth" 6035    starring    1928 births;Living people;American film actors;American musical theatre actors;American child actors;People from Westchester County, New York;
1   "Karen Allen"   7467    starring    American film actors;American stage actors;American video game actors;Bard College at Simon's Rock faculty;Illinois actors;People from Greene County, Illinois;Saturn Award winners;
2   "Mel Tormé" 18868   writer  1925 births;1999 deaths;American actor-singers;American jazz singers;American Jews;American male singers;American singers;American television actors;Blue-eyed soul singers;Burials at Westwood Village Memorial Park Cemetery;Chicago musicians;Deaths from stroke;Grammy Award winners;Grammy Lifetime Achievement Award winners;Jewish actors;Jewish American musicians;Jewish singers;Jewish composers and songwriters;Traditional pop music singers;Russian-American Jews;
3   "Jane Anderson" 3355    director    American film actors;American film directors;LGBT directors;California actors;
4   "Lou Myers (actor)" 3288    starring    1945 births;African American actors;American film actors;American television actors;Living people;People from Kanawha County, West Virginia;
5   "Mary Ainslee"  2274    starring    American film actors;
*Edges
0   4221    1
0   4390    1
0   2664    1
0   6885    1
0   989 1
0   7387    1
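The layout described above can be read without hard-coding line offsets by switching on the `*Vertices` and `*Edges` marker lines. The following is a minimal sketch, not the code used in this post. The function name `parse_pajek_like` and the inline sample (including its single edge) are illustrative assumptions, and it assumes the fields are tab-separated as described.

```python
import io


def parse_pajek_like(stream):
    """Split a newmovies-style file into node records and edge pairs.

    Assumes tab-separated fields and '*Vertices' / '*Edges' marker lines.
    """
    nodes, edges = {}, []
    section = None
    for raw in stream:
        line = raw.rstrip('\n')
        if not line.strip():
            continue  # skip blank lines
        if line.startswith('*'):
            # Marker line, e.g. '*Vertices 34282' or '*Edges'
            section = line[1:].split()[0].lower()
            continue
        fields = line.split('\t')
        if section == 'vertices':
            node_id, name, weight, ntype = fields[:4]
            # The fifth field, if present, holds ';'-separated extra information
            extra = fields[4] if len(fields) > 4 else ''
            nodes[node_id] = {
                'name': name.strip('"'),
                'weight': int(weight),
                'ntype': ntype,
                'info': [s for s in extra.split(';') if s],
            }
        elif section == 'edges':
            # The third value is always 1, so only the two endpoints are kept
            edges.append((fields[0], fields[1]))
    return nodes, edges


# Small hypothetical excerpt in the same layout as the sample above
sample = (
    '*Vertices 2\n'
    '0\t"Ann Blyth"\t6035\tstarring\t1928 births;Living people;\n'
    '5\t"Mary Ainslee"\t2274\tstarring\tAmerican film actors;\n'
    '*Edges\n'
    '0\t5\t1\n'
)
nodes, edges = parse_pajek_like(io.StringIO(sample))
print(nodes['5']['name'], edges)
```

Keying on the markers rather than on fixed line numbers means the same function works whether or not the file has blank lines around the `*Edges` line.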

networkx

from community import community_louvain  # provided by the python-louvain package
import networkx as nx
import matplotlib.pyplot as plt

# Load the data
nodeID=list()
nodeName =list()
nodeWeight =list()
nodeType =list()
edge1 =list()
edge2 =list()

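with open('newmovies.txt',encoding='utf-8') as n:
    # Read the file once: a second readlines() call on the same handle returns an empty list
    lines = n.readlines()

lines1 = lines[1:1000]        # first 999 node lines (line 0 is '*Vertices 34282')
lines2 = lines[34285:36285]   # 2000 lines from the edges section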

for line in lines1:
    fields = line.strip().split('\t')
    nodeID.append(fields[0])
    nodeName.append(fields[1])
    nodeWeight.append(fields[2])
    nodeType.append(fields[3])

for line in lines2:
    fields = line.strip().split('\t')
    edge1.append(fields[0])
    edge2.append(fields[1])

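# Build the graph and draw it with matplotlib
G = nx.Graph()

# Give each node its own name, weight, and type
# (passing the lists as keyword arguments would attach the whole list to every node)
G.add_nodes_from(
    (i, {'name': nm, 'weight': w, 'ntype': t})
    for i, nm, w, t in zip(nodeID, nodeName, nodeWeight, nodeType)
)

for x in range(0,len(edge1)):
    G.add_edge(edge1[x], edge2[x])

print(G.number_of_nodes())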

partition = community_louvain.best_partition(G)

size =float(len(set(partition.values())))

pos = nx.spring_layout(G)

count =0

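for com in set(partition.values()):
    count = count +1
    list_nodes = [nodes for nodes in partition.keys() if partition[nodes] == com]
    # One gray level per community: a string such as '0.5' is a matplotlib grayscale color
    nx.draw_networkx_nodes(G, pos, list_nodes,node_size=20,node_color=str(count/size))

# Draw the edges once, after all communities have been drawn
nx.draw_networkx_edges(G,pos,alpha=0.5)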

plt.savefig('community.png')
plt.show()
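`best_partition` returns a dict that maps each node to a community id, which is why the loop above filters `partition.keys()` by value. A quick way to check how the nodes are distributed before drawing is to count the members of each community. The sketch below uses a small made-up `partition` dict, not output from the real dataset.

```python
from collections import Counter, defaultdict

# Hypothetical partition in the shape best_partition returns: node -> community id
partition = {'0': 0, '4221': 0, '4390': 0, '1': 1, '989': 1, '2': 2}

# Number of nodes in each community, largest first
sizes = Counter(partition.values())
for com, n in sizes.most_common():
    print(f'community {com}: {n} nodes')

# Group nodes by community in one pass instead of rescanning the dict per community
members = defaultdict(list)
for node, com in partition.items():
    members[com].append(node)
```

Grouping in a single pass matters once the graph grows: the list comprehension in the drawing loop rescans every node for every community.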

plot


networkx's plotting is still fairly rough compared with tools such as Gephi, so the next step uses Gephi to draw the graph.

Plotting with Gephi

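After importing the data into Gephi, choose a suitable layout (ForceAtlas 2 here), then run the Modularity statistic, which assigns the nodes to different modules. In the node partition options, select Modularity Class to color the nodes by module, which gives the final modularized image.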
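One way to prepare the data for the import step is to write a node table and an edge table as CSV, which Gephi's spreadsheet importer can read when the columns are named `Id`/`Label` and `Source`/`Target`. This is a sketch under that assumption, not necessarily how the data was imported for this post. The helper name `to_gephi_csv` and the extra `ntype` and `Type` columns are illustrative choices.

```python
import csv
import io


def to_gephi_csv(nodes, edges, node_file, edge_file):
    """Write node and edge tables using the column names Gephi's CSV import looks for."""
    node_writer = csv.writer(node_file)
    node_writer.writerow(['Id', 'Label', 'ntype'])
    for node_id, name, ntype in nodes:
        node_writer.writerow([node_id, name, ntype])

    edge_writer = csv.writer(edge_file)
    edge_writer.writerow(['Source', 'Target', 'Type'])
    for source, target in edges:
        edge_writer.writerow([source, target, 'Undirected'])


# Sample rows taken from the dataset excerpt earlier in this post
nodes = [('0', 'Ann Blyth', 'starring'), ('3', 'Jane Anderson', 'director')]
edges = [('0', '4221'), ('0', '4390')]

# In-memory buffers keep the example self-contained
node_buf, edge_buf = io.StringIO(), io.StringIO()
to_gephi_csv(nodes, edges, node_buf, edge_buf)
print(node_buf.getvalue())
print(edge_buf.getvalue())
```

To write real files, pass handles opened with `open(path, 'w', newline='', encoding='utf-8')` in place of the `StringIO` buffers.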



The image shows that the nodes fall into a handful of fairly concentrated partitions.
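The Modularity Class assignment comes from maximizing modularity, the same quantity that `community_louvain.best_partition` tries to maximize. For each community it compares the share of edges that fall inside the community with the share expected from the node degrees alone: Q is the sum over communities of L_c/m - (d_c/2m)^2, where m is the total number of edges, L_c the number of edges inside community c, and d_c the sum of the degrees of its nodes. The sketch below computes Q in plain Python for a made-up graph of two triangles joined by one edge; it is not run on the newmovies data.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Modularity Q of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    inside = defaultdict(int)   # edges with both ends in the same community
    degree = defaultdict(int)   # summed node degrees per community
    for u, v in edges:
        degree[partition[u]] += 1
        degree[partition[v]] += 1
        if partition[u] == partition[v]:
            inside[partition[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Made-up graph: triangles (a, b, c) and (d, e, f) joined by the edge c-d
edges = [('a', 'b'), ('b', 'c'), ('a', 'c'),
         ('d', 'e'), ('e', 'f'), ('d', 'f'),
         ('c', 'd')]
by_triangle = {'a': 0, 'b': 0, 'c': 0, 'd': 1, 'e': 1, 'f': 1}
all_in_one = {n: 0 for n in by_triangle}

print(modularity(edges, by_triangle))
print(modularity(edges, all_in_one))
```

Splitting by triangle gives Q = 5/14 (about 0.357), while putting every node in one community gives 0, so the split that matches the visible clusters scores higher.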
