Analyzing the newmovies dataset with the Python networkx library
2021-10-22
Cache_wood
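networkx is a third-party Python library for working with graph structures. It provides a concise API and makes it easy to draw graphs.
newmovies.txt holds the data. Each line under *Vertices 34282 is a node representing a star, a writer, or a film. The fields on each line are separated by \t: node id, name, node weight, node type, and other information (its entries are separated by ";"). Note that the node weight comes with the original dataset and is not used in this assignment. In the edges section each line has three numbers: the first two are the ids of the nodes the edge connects, and the third is always 1.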
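To make the field layout concrete, here is a minimal sketch that splits one vertex line into its five fields. The names Vertex and parse_vertex_line are illustrative and not part of the original code, and the handling of a missing fifth field is an assumption.

```python
from typing import List, NamedTuple


class Vertex(NamedTuple):
    """One node of newmovies.txt (illustrative container, not from the original post)."""
    node_id: str
    name: str
    weight: int
    node_type: str
    categories: List[str]


def parse_vertex_line(line: str) -> Vertex:
    """Split one tab-separated vertex line into its five fields."""
    fields = line.rstrip("\n").split("\t")
    node_id, name, weight, node_type = fields[:4]
    # Assumption: if the fifth field is missing, treat it as empty.
    extra = fields[4] if len(fields) > 4 else ""
    # The "other information" entries end with ";", so drop empty pieces.
    categories = [c.strip() for c in extra.split(";") if c.strip()]
    return Vertex(node_id, name.strip('"'), int(weight), node_type, categories)


sample = '5\t"Mary Ainslee"\t2274\tstarring\tAmerican film actors;'
print(parse_vertex_line(sample))
```

Stripping the surrounding quotes from the name is a design choice made here for readability; the code later in this post keeps the raw string.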
The newmovies.txt dataset
*Vertices 34282
0 "Ann Blyth" 6035 starring 1928 births;Living people;American film actors;American musical theatre actors;American child actors;People from Westchester County, New York;
1 "Karen Allen" 7467 starring American film actors;American stage actors;American video game actors;Bard College at Simon's Rock faculty;Illinois actors;People from Greene County, Illinois;Saturn Award winners;
2 "Mel Tormé" 18868 writer 1925 births;1999 deaths;American actor-singers;American jazz singers;American Jews;American male singers;American singers;American television actors;Blue-eyed soul singers;Burials at Westwood Village Memorial Park Cemetery;Chicago musicians;Deaths from stroke;Grammy Award winners;Grammy Lifetime Achievement Award winners;Jewish actors;Jewish American musicians;Jewish singers;Jewish composers and songwriters;Traditional pop music singers;Russian-American Jews;
3 "Jane Anderson" 3355 director American film actors;American film directors;LGBT directors;California actors;
4 "Lou Myers (actor)" 3288 starring 1945 births;African American actors;American film actors;American television actors;Living people;People from Kanawha County, West Virginia;
5 "Mary Ainslee" 2274 starring American film actors;
*Edges
0 4221 1
0 4390 1
0 2664 1
0 6885 1
0 989 1
0 7387 1
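Since the file marks its two sections with *Vertices and *Edges, the sections could also be located by those markers instead of by fixed line offsets. The sketch below is one way to do that. The function name split_sections is hypothetical, and the sample lines are a shortened toy version of the excerpt above.

```python
def split_sections(lines):
    """Return (vertex_lines, edge_lines) using the '*Vertices' / '*Edges' markers."""
    vertices, edges = [], []
    current = None  # the list that the lines being read belong to
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.lower().startswith("*vertices"):
            current = vertices
        elif line.lower().startswith("*edges"):
            current = edges
        elif current is not None:
            current.append(line)
    return vertices, edges


# Toy input: two vertex lines and two edge lines taken from the excerpt.
sample_lines = [
    "*Vertices 2\n",
    '0\t"Ann Blyth"\t6035\tstarring\t1928 births;Living people;\n',
    '5\t"Mary Ainslee"\t2274\tstarring\tAmerican film actors;\n',
    "*Edges\n",
    "0\t4221\t1\n",
    "0\t4390\t1\n",
]
vertex_lines, edge_lines = split_sections(sample_lines)
# Keep only the two endpoint ids; the third value is always 1.
edge_pairs = [tuple(e.split()[:2]) for e in edge_lines]
print(len(vertex_lines), edge_pairs)
```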
networkx
from community import community_louvain
import networkx as nx
import matplotlib.pyplot as plt
# Load the data
nodeID = list()
nodeName = list()
nodeWeight = list()
nodeType = list()
edge1 = list()
edge2 = list()
with open('newmovies.txt', encoding='utf-8') as n:
    # Read the file once; a second readlines() call would return an empty list
    lines = n.readlines()
lines1 = lines[1:1000]        # the first 999 vertex lines (line 0 is the *Vertices header)
lines2 = lines[34285:36285]   # a 2000-line slice of the *Edges section
for line in lines1:
    fields = line.strip().split('\t')
    nodeID.append(fields[0])
    nodeName.append(fields[1])
    nodeWeight.append(fields[2])
    nodeType.append(fields[3])
for line in lines2:
    fields = line.strip().split('\t')
    edge1.append(fields[0])
    edge2.append(fields[1])
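# Build the graph and plot it with matplotlib
G = nx.Graph()
# Give each node its own attributes; passing the lists as keyword arguments
# to add_nodes_from would store the whole list on every node
for i in range(len(nodeID)):
    G.add_node(nodeID[i], name=nodeName[i], weight=nodeWeight[i], ntype=nodeType[i])
# Edge endpoints that are not among the loaded nodes are added automatically
for x in range(0, len(edge1)):
    G.add_edge(edge1[x], edge2[x])
print(G.number_of_nodes())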
# Louvain community detection: maps each node to a community id
partition = community_louvain.best_partition(G)
size = float(len(set(partition.values())))
pos = nx.spring_layout(G)
count = 0
for com in set(partition.values()):
    count = count + 1
    list_nodes = [nodes for nodes in partition.keys() if partition[nodes] == com]
    # Draw each community with its own grayscale level
    nx.draw_networkx_nodes(G, pos, nodelist=list_nodes, node_size=20, node_color=str(count/size))
nx.draw_networkx_edges(G, pos, alpha=0.5)
plt.savefig('community.png')
plt.show()
plot
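The drawing loop scans partition.keys() once per community. As a sketch of an alternative, the nodes can be grouped in a single pass, and the grayscale string for each community can be computed separately. The helper names group_by_community and gray_levels are hypothetical, and toy_partition is made-up data in the same node-to-community shape that best_partition returns.

```python
from collections import defaultdict


def group_by_community(partition):
    """Turn {node: community_id} into {community_id: [nodes]} in one pass."""
    groups = defaultdict(list)
    for node, com in partition.items():
        groups[com].append(node)
    return dict(groups)


def gray_levels(communities):
    """Map each community id to a grayscale string in (0, 1], as in the loop above."""
    ordered = sorted(communities)
    size = float(len(ordered))
    return {com: str((i + 1) / size) for i, com in enumerate(ordered)}


# Made-up partition for illustration only.
toy_partition = {"0": 0, "4221": 0, "4390": 1, "2664": 1, "6885": 1}
groups = group_by_community(toy_partition)
colors = gray_levels(groups)
for com, members in groups.items():
    print(com, colors[com], len(members))
```

Each groups[com] list could then be passed as nodelist, and colors[com] as node_color, to nx.draw_networkx_nodes.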
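Compared with tools such as Gephi, networkx's plotting is still rather crude, so the rest of this post uses Gephi for drawing.
Plotting with Gephi
After importing the data into Gephi, choose a suitable layout (ForceAtlas 2 here), then run the Modularity statistic to assign the nodes to modules. In the nodes' Partition option, select Modularity Class to obtain the final graph colored by module.
The nodes end up divided into a small number of fairly concentrated partitions.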
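The post does not say how the data was brought into Gephi. One possible route, assuming Gephi's spreadsheet (CSV) importer with Id/Label columns for the node table and Source/Target columns for the edge table, is sketched below. The function write_gephi_csv and the extra Type column are illustrative choices, and the in-memory buffers stand in for real files.

```python
import csv
import io


def write_gephi_csv(nodes, edges, node_file, edge_file):
    """Write a node table and an edge table in a CSV layout Gephi can import.

    nodes: iterable of (id, label, type) tuples
    edges: iterable of (source_id, target_id) tuples
    """
    node_writer = csv.writer(node_file)
    node_writer.writerow(["Id", "Label", "Type"])
    node_writer.writerows(nodes)

    edge_writer = csv.writer(edge_file)
    edge_writer.writerow(["Source", "Target"])
    edge_writer.writerows(edges)


# Rows taken from the dataset excerpt; buffers are used instead of files on disk.
demo_nodes = [("0", "Ann Blyth", "starring"), ("3", "Jane Anderson", "director")]
demo_edges = [("0", "4221"), ("0", "4390")]
node_buf, edge_buf = io.StringIO(), io.StringIO()
write_gephi_csv(demo_nodes, demo_edges, node_buf, edge_buf)
print(node_buf.getvalue())
print(edge_buf.getvalue())
```

With real files, opening them with newline='' (as the csv module documentation recommends) avoids blank rows on Windows.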