数据分析师之路(四):用Ruby批量处理Excel

2017-07-20  本文已影响255人  诗与星空

最近沉浸于Python的学习中,收获颇多,但是在进行Excel表的处理的时候,我还是喜欢Ruby。原因很简单,Ruby对Excel的处理更容易理解,或许等我熟练地掌握Pandas后,就能脱离对ruby的依赖了。


本文介绍一个批量处理Excel表的例子,把3000多家上市公司的全部资产负债表、利润表、现金流量表、主要经营指标表重新组合,把感兴趣的指标整合在一张Excel表中。

一、先调用相应的库:

# -*- coding: UTF-8 -*-
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'win32ole'

其中hpricot和open-uri是对URL进行解析处理的库,win32ole是处理Excel的库。
二、然后定义几个时间参数:

s_report_date = "2017/3/31"
s_date = "20170720"
time = Time.new  #取当前时间
s_today = "#{time.year}" + "#{time.month}" + "#{time.day}" #把当前时间拼接成2017616格式,用于命名excel文件名
#puts s_today

三、创建Excel表

#创建一个excel表并按照当前日期命名
file_path = "d:\\rb\\stock\\"
file_name = s_today + "股票池.xls"
file_name_path = file_path + file_name

excel = WIN32OLE.new("excel.application")
excel.visible = true     # in case you want to see what happens 
workbook = excel.workbooks.add
workbook.saveas(file_name_path)
workbook.close
excel.Quit

win32ole的使用比较简单,直接按照操作说明比着葫芦画瓢即可。

四、将相关数据写入Excel表,这是我目前感兴趣的报表项目:

#写入表头
excel = WIN32OLE::new('Excel.Application')
workbook = excel.Workbooks.Open(file_name_path)
worksheet = workbook.Worksheets(1) #定位到第一个sheet
worksheet.Select
#写入
worksheet.Range("a2").Value = s_today + "By 路过银河" 
worksheet.Range("a3").Value = "股票代码" 
worksheet.Range("b3").Value = "股票名称" 
worksheet.Range("c3").Value = "参考评级" 
worksheet.Range("d3").Value = "行业板块"
worksheet.Range("e3").Value = "地区"
worksheet.Range("f3").Value = "概念"
worksheet.Range("g3").Value = "报表日期"
worksheet.Range("h3").Value = "总市值"
worksheet.Range("i3").Value = "市盈率"
worksheet.Range("j3").Value = "市净率"
worksheet.Range("k3").Value = "季报利润总额"
worksheet.Range("l3").Value = "年报利润总额"
worksheet.Range("m3").Value = "上年同期利润总额"
worksheet.Range("n3").Value = "利润同比增减"
worksheet.Range("o3").Value = "利润较年报1/4增减"
worksheet.Range("p3").Value = "季报净利润"
worksheet.Range("q3").Value = "季报投资收益"
worksheet.Range("r3").Value = "季报其他业务收支净额"
worksheet.Range("s3").Value = "非主营收入占利润比"
worksheet.Range("t3").Value = "2014净利润"
worksheet.Range("u3").Value = "2015净利润"
worksheet.Range("v3").Value = "2016净利润"
worksheet.Range("w3").Value = "2015净利润增长率"
worksheet.Range("x3").Value = "2016净利润增长率"
worksheet.Range("y3").Value = "三年净利为正且16增长率大于15"
worksheet.Range("z3").Value = "流动资产"
worksheet.Range("aa3").Value = "存货"
worksheet.Range("ab3").Value = "货币资金"
worksheet.Range("ac3").Value = "流动负债"
worksheet.Range("ad3").Value = "资产总额"
worksheet.Range("ae3").Value = "负债总额"
worksheet.Range("af3").Value = "流动比率"
worksheet.Range("ag3").Value = "速动比率"
worksheet.Range("ah3").Value = "现金率"
worksheet.Range("ai3").Value = "资产负债率"
worksheet.Range("aj3").Value = "销售费用"
worksheet.Range("ak3").Value = "管理费用"
worksheet.Range("al3").Value = "财务费用"
worksheet.Range("am3").Value = "主营业务净利率"
worksheet.Range("an3").Value = "主营业务毛利率"
worksheet.Range("ao3").Value = "净资产收益率"
worksheet.Range("ap3").Value = "经营现金流净额"
worksheet.Range("aq3").Value = "存货/利润"
worksheet.Range("ar3").Value = "盈利能力"
worksheet.Range("as3").Value = "偿债能力"
worksheet.Range("at3").Value = "营运能力"
worksheet.Range("au3").Value = "业绩预估"
worksheet.Range("av3").Value = "业绩成长"
worksheet.Range("aw3").Value = "估值"
worksheet.Range("ax3").Value = "交易性金融资产"
worksheet.Range("ay3").Value = "主营业务收入"
worksheet.Range("az3").Value = "投资收益"
worksheet.Range("ba3").Value = "主营业务成本"
worksheet.Range("bb3").Value = "季报主营业务收入"
worksheet.Range("bc3").Value = "2015主营业务收入"
worksheet.Range("bd3").Value = "2014主营业务收入"
worksheet.Range("be3").Value = "季报总资产"
worksheet.Range("bf3").Value = "2016总资产"
worksheet.Range("bg3").Value = "2015总资产"
worksheet.Range("bh3").Value = "2014总资产"
worksheet.Range("bi3").Value = "季报在建工程"
worksheet.Range("bj3").Value = "2016在建工程"
worksheet.Range("bk3").Value = "2015在建工程"
worksheet.Range("bl3").Value = "2014在建工程"

五、然后写一个循环,我对循环学得不好,只好用比较笨的办法写,这里其实应该能写成一两行,记得在前面给x赋值哟,并通过一个if语句判断是深市还是呼市的股票:

#循环开始
while x <= $nword
#puts "轮询中:"
stock_lines = IO.readlines("d:/rb/stock/stock.txt");  
s_code = stock_lines[x]
s_code_s = s_code.chomp  # chomp用来删除文本里带过来的换行符
if s_code_s >  "600000"
  scode = "sh" + s_code_s
elsif  
  s_code_s ==  "600000"
  scode = "sh" + s_code_s
else
  scode = "sz" + s_code_s
  
end

六、通过腾讯证券接口读实时股票信息,在此我主要是用来计算当前市值、市盈率和市净率:

doc1 = Hpricot(open('http://qt.gtimg.cn/q=' + "#{scode}"))  #调用腾讯股票实时接口
#puts doc

out_file1 = open('d:/rb/stock/temp1.txt', 'w') 
out_file1.write(doc1) 
out_file1.close

str1 = IO.read("d:/rb/stock/temp1.txt");  
  #puts str.length;  
  #puts str[0,30] 
str1.split(/~/)
arr1 = str1.split(/~/)
#定义股票接口对应的字段名
s_name = arr1[1] #股票名
s_number = arr1[2]
s_current_price = arr1[3] #当前价格
s_closing_price = arr1[4] #昨收 
s_opening_price = arr1[5] #今开
s_ltsz = arr1[44]

七、读取全市场全部报表数据,这个过程比较漫长,通常几个小时:

#读取主要财务指标表
##定义财务指标表存储路径
file_zycwzb_path = "d:/stock/zycwzb/"
file_zycwzb_name = s_code_s + "zycwzb.csv"
filename_zycwzb = file_zycwzb_path + file_zycwzb_name
#读取财务指标表
zycwzb_lines = IO.readlines(filename_zycwzb);  

#读取资产负债表
##定义资产负债表存储路径
file_zcfzb_path = "d:/stock/zcfzb/"
file_zcfzb_name = s_code_s + "zcfzb.csv"
filename_zcfzb = file_zcfzb_path + file_zcfzb_name
#读取资产负债表
zcfzb_lines = IO.readlines(filename_zcfzb);  

#读取利润表
##定义利润表存储路径
file_lrb_path = "d:/stock/lrb/"
file_lrb_name = s_code_s + "lrb.csv"
filename_lrb = file_lrb_path + file_lrb_name
#读取利润债表
lrb_lines = IO.readlines(filename_lrb);  

#读取现金流量表
##定义现金流量表存储路径
file_xjllb_path = "d:/stock/xjllb/"
file_xjllb_name = s_code_s + "xjllb.csv"
filename_xjllb = file_xjllb_path + file_xjllb_name
#读取利润债表
xjllb_lines = IO.readlines(filename_xjllb);  

#判断是否存在最新季报,如果存在,则读取相关列并定义变量
if zycwzb_lines[0] = s_report_date 
  puts "股票代码:" + s_code
  puts "报表日期:" + zycwzb_lines[0]
  #puts stock_lines[1]
#主要财务指标表部分
#读取基本每股收益
  arr_jbmgsy = zycwzb_lines[1]
  #取数格式如下:
  #arr_jbmgsy.split(/,/)
  #puts "#{s_report_date}基本每股收益:" + arr_jbmgsy.split(/,/)[1]
  #puts "年报基本每股收益:" + arr_jbmgsy.split(/,/)[2]
 # puts "上年同期基本每股收益:" + arr_jbmgsy.split(/,/)[5]
#读取每股净资产
  arr_mgjzc = zycwzb_lines[2]
#读取每股经营活动产生的现金流量净额
  arr_mgjyxjllje = zycwzb_lines[3]
#读取主营业务收入
  arr_zyywsr = zycwzb_lines[4]
#读取主营业务利润
  arr_mzyywlr = zycwzb_lines[5]
#读取营业利润
  arr_yylr = zycwzb_lines[6]
#读取投资收益
  arr_tzsy = zycwzb_lines[7]
#读取营业外收支净额
  arr_yywszje = zycwzb_lines[8]
#读取利润总额
  arr_lrze = zycwzb_lines[9]
#puts "利润总额:" + arr_lrze.split(/,/)[1]
#读取净利润
  arr_jlr = zycwzb_lines[10]
#读取净利润(扣除非经常性损益后)
  arr_jlrkc = zycwzb_lines[11]
#读取经营活动产生的现金流量净额
  arr_jyhdxjll = zycwzb_lines[12]
#读取现金及现金等价物净增加额
  arr_xjje = zycwzb_lines[13]
#读取总资产
  arr_zzc = zycwzb_lines[14]
#读取流动资产
  arr_ldzc = zycwzb_lines[15]
#读取总负债
  arr_zfz = zycwzb_lines[16]
#读取流动负债
  arr_ldfz = zycwzb_lines[17]  
#读取股东权益
  arr_gdqy = zycwzb_lines[18]  
#读取净资产收益率加权
  arr_jzcsyl = zycwzb_lines[19]  

#资产负债表部分
#读取货币资金
arr_hbzj = zcfzb_lines[1]
#读取应收票据
arr_yspj = zcfzb_lines[6]
#读取应收账款
arr_yszk = zcfzb_lines[7]
#读取存货
arr_ch = zcfzb_lines[20]
#读取在建工程
arr_zjgc = zcfzb_lines[38]
#读取交易性金融资产
arr_jyxjrzc  = zcfzb_lines[4]

#利润表部分
#读取销售费用
arr_xsfy = lrb_lines[21]
#读取管理费用
arr_glfy = lrb_lines[22]
#读取财务费用
arr_cwfy = lrb_lines[23]
#读取主营业务成本
arr_zyywcb = lrb_lines[9]

八、将读取到的Excel数据逐行写入创建的Excel表

puts "#{s_number}正在写入excel表... ..."
excel = WIN32OLE::new('Excel.Application')
workbook = excel.Workbooks.Open(file_name_path)
worksheet = workbook.Worksheets(1) #定位到第一个sheet
worksheet.Select
#写入
worksheet.Range("a#{x + 3}").Value = "'" + s_number
worksheet.Range("b#{x + 3}").Value = s_name 
#worksheet.Range("d#{x + 3}").Value = s_current_price
#worksheet.Range("f#{x + 3}").Value = s_closing_price
worksheet.Range("g#{x + 3}").Value = s_current_price
worksheet.Range("h#{x + 3}").Value = arr1[44]
worksheet.Range("i#{x + 3}").Value = arr1[39]
worksheet.Range("j#{x + 3}").Value = arr1[46]
worksheet.Range("k#{x + 3}").Value = arr_lrze.split(/,/)[1]
worksheet.Range("l#{x + 3}").Value = arr_lrze.split(/,/)[2]
worksheet.Range("m#{x + 3}").Value = arr_lrze.split(/,/)[5]
worksheet.Range("n#{x + 3}").Value = "=(k#{x + 3}-m#{x + 3})/abs(m#{x + 3})"
worksheet.Range("o#{x + 3}").Value = "=(k#{x + 3}-(l#{x + 3}/4))/abs(m#{x + 3})"
worksheet.Range("p#{x + 3}").Value = arr_jlr.split(/,/)[1]
worksheet.Range("q#{x + 3}").Value = arr_tzsy.split(/,/)[1]
worksheet.Range("r#{x + 3}").Value = arr_yywszje.split(/,/)[1]
worksheet.Range("s#{x + 3}").Value = "=(q#{x + 3}+r#{x + 3})/p#{x + 3}"
worksheet.Range("t#{x + 3}").Value = arr_jlr.split(/,/)[10]  #2014年净利润
worksheet.Range("u#{x + 3}").Value = arr_jlr.split(/,/)[6]  #2015年净利润
worksheet.Range("v#{x + 3}").Value = arr_jlr.split(/,/)[2]  #2016年净利润
worksheet.Range("w#{x + 3}").Value = "=(u#{x + 3}-t#{x + 3})/t#{x + 3}"
worksheet.Range("x#{x + 3}").Value = "=(v#{x + 3}-u#{x + 3})/u#{x + 3}"
worksheet.Range("y#{x + 3}").Value = "=and(t#{x + 3}>0,u#{x + 3}>0,v#{x + 3}>0,x#{x + 3}>w#{x + 3})"
worksheet.Range("z#{x + 3}").Value = arr_ldzc.split(/,/)[2] 
worksheet.Range("aa#{x + 3}").Value = arr_ch.split(/,/)[2] 
worksheet.Range("ab#{x + 3}").Value = arr_hbzj.split(/,/)[2] 
worksheet.Range("ac#{x + 3}").Value = arr_ldfz.split(/,/)[2] 
worksheet.Range("ad#{x + 3}").Value = arr_zzc.split(/,/)[2] 
worksheet.Range("ae#{x + 3}").Value = arr_zfz.split(/,/)[2] 
worksheet.Range("af#{x + 3}").Value = "=z#{x + 3}/ac#{x + 3}"
worksheet.Range("ag#{x + 3}").Value = "=(z#{x + 3}-aa#{x + 3})/ac#{x + 3}"
worksheet.Range("ah#{x + 3}").Value = "=(ab#{x + 3}+ax#{x + 3})/ac#{x + 3}"
worksheet.Range("ai#{x + 3}").Value = "=ad#{x + 3}/ae#{x + 3}"
worksheet.Range("aj#{x + 3}").Value = arr_xsfy.split(/,/)[2] 
worksheet.Range("ak#{x + 3}").Value = arr_glfy.split(/,/)[2] 
worksheet.Range("al#{x + 3}").Value = arr_cwfy.split(/,/)[2] 
worksheet.Range("am#{x + 3}").Value = "=v#{x + 3}/ay#{x + 3}"
worksheet.Range("an#{x + 3}").Value = "=(ay#{x + 3}-ba#{x + 3})/ay#{x + 3}"

worksheet.Range("ap#{x + 3}").Value = arr_jyhdxjll.split(/,/)[2] 
worksheet.Range("aq#{x + 3}").Value = "=aa#{x + 3}/v#{x + 3}"


worksheet.Range("ax#{x + 3}").Value = arr_jyxjrzc.split(/,/)[2] 
worksheet.Range("ay#{x + 3}").Value = arr_zyywsr.split(/,/)[2] 
worksheet.Range("az#{x + 3}").Value = arr_tzsy.split(/,/)[2]
worksheet.Range("ba#{x + 3}").Value = arr_zyywcb.split(/,/)[2]
worksheet.Range("bb#{x + 3}").Value = arr_zyywsr.split(/,/)[1]
worksheet.Range("bc#{x + 3}").Value = arr_zyywsr.split(/,/)[6]
worksheet.Range("bd#{x + 3}").Value = arr_zyywsr.split(/,/)[10]
worksheet.Range("be#{x + 3}").Value = arr_zzc.split(/,/)[1]
worksheet.Range("bf#{x + 3}").Value = arr_zzc.split(/,/)[2]
worksheet.Range("bg#{x + 3}").Value = arr_zzc.split(/,/)[6]
worksheet.Range("bh#{x + 3}").Value = arr_zzc.split(/,/)[10]
worksheet.Range("bi#{x + 3}").Value = arr_zjgc.split(/,/)[1]
worksheet.Range("bj#{x + 3}").Value = arr_zjgc.split(/,/)[2]
worksheet.Range("bk#{x + 3}").Value = arr_zjgc.split(/,/)[6]
worksheet.Range("bl#{x + 3}").Value = arr_zjgc.split(/,/)[10]

workbook.Close(1)
excel.Quit

循环结束,记得用x = x+1和end。
实际上,此类指标很多收费软件都可以实现,比如wind,但是wind动辄六万块的年费,不是一般爱好者能玩的。
通过学习编程来实现,既熟悉了数据的来龙去脉,又能把感兴趣的指标分析出来,何乐不为?

程序执行过程:


程序最终执行结果:

上一篇下一篇

猜你喜欢

热点阅读