Python Quantitative Trading 3: Fetching Daily K-Line Data for Stocks

2019-04-09, by 德尔璐

1. Start MongoDB from cmd

Type cmd in the address bar of the folder G:\MongoDB\bin to open a command prompt there:

Enter mongod.exe --dbpath=G:\mongoDB\data

Open a second cmd window and enter mongo.exe
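Before running the crawler it can help to confirm that mongod is actually accepting connections. The following is a minimal sketch using only the standard library; the function name mongod_is_listening is hypothetical, and it assumes mongod was started on the default port 27017 on the local machine. It only checks that something accepts TCP connections on that port, not that it is really MongoDB.

```python
import socket


def mongod_is_listening(host="127.0.0.1", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: 27017 is MongoDB's default port, so this assumes
    mongod was started without a custom --port option.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling mongod_is_listening() right after starting mongod.exe should return True once the server has finished starting up.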

Create the indexes (in the database that the crawler writes to):

Enter db.daily.createIndex({'code':1,'date':1,'index':1},{'background':true})

db.daily_hfq.createIndex({'code':1,'date':1,'index':1},{'background':true})
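The same two indexes could also be created from Python instead of the shell. This is a sketch under assumptions: ensure_daily_indexes is a hypothetical helper, and db is expected to be a pymongo Database object for whichever database the crawler writes to. It relies only on pymongo's Collection.create_index, which takes a list of (field, direction) pairs.

```python
def ensure_daily_indexes(db, collection_names=("daily", "daily_hfq")):
    """Create the compound index on code, date and index for each collection.

    `db` is assumed to be a pymongo Database, for example
    MongoClient("mongodb://127.0.0.1:27017")["your_db_name"].
    Returns the list of index names reported by create_index.
    """
    keys = [("code", 1), ("date", 1), ("index", 1)]
    created = []
    for name in collection_names:
        # Mirrors db.<name>.createIndex({...}, {'background': true}) in the shell
        created.append(db[name].create_index(keys, background=True))
    return created
```

Running it more than once is harmless, since MongoDB does not rebuild an index that already exists with the same definition.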

The code is as follows:

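from datetime import datetime

import tushare as ts
from pymongo import MongoClient, UpdateOne

# Assumption: this post never defines DB_CONN. A local MongoDB instance and a
# database named 'quant' are assumed here; adjust both to your setup, and
# create the indexes in that same database.
DB_CONN = MongoClient('mongodb://127.0.0.1:27017')['quant']


class DailyCrawler:
    def __init__(self):
        """
        Initialization.
        """
        # Handle to the daily collection (unadjusted prices)
        self.daily = DB_CONN['daily']
        # Handle to the daily_hfq collection (backward-adjusted prices)
        self.daily_hfq = DB_CONN['daily_hfq']

    def crawl(self, begin_date=None, end_date=None):
        """
        Fetch daily K-line data for all stocks, both unadjusted and
        backward-adjusted (hfq).

        :param begin_date: start date, 'YYYY-MM-DD'
        :param end_date: end date, 'YYYY-MM-DD'
        """
        # Get basic information for all stocks through tushare's basics API
        stock_df = ts.get_stock_basics()
        # The index of the basics DataFrame holds the stock codes
        codes = list(stock_df.index)

        # Today's date
        now = datetime.now().strftime('%Y-%m-%d')
        # Default the start date to today if it is not given
        if begin_date is None:
            begin_date = now
        # Default the end date to today if it is not given
        if end_date is None:
            end_date = now

        for code in codes:
            # Fetch unadjusted prices
            df_daily = ts.get_k_data(code, autype=None, start=begin_date, end=end_date)
            self.save_data(code, df_daily, self.daily, {'index': False})

            # Fetch backward-adjusted prices
            df_daily_hfq = ts.get_k_data(code, autype='hfq', start=begin_date, end=end_date)
            self.save_data(code, df_daily_hfq, self.daily_hfq, {'index': False})

    def save_data(self, code, df_daily, collection, extra_fields=None):
        """
        Save the fetched data to the local MongoDB.

        :param code: stock code
        :param df_daily: DataFrame holding the daily K-line data
        :param collection: collection to save into
        :param extra_fields: extra fields to store in addition to the K-line fields
        """
        # List of update requests
        update_requests = []

        # Turn each row of the DataFrame into one update request
        for df_index in df_daily.index:
            # Convert one row of the DataFrame to a dict
            doc = dict(df_daily.loc[df_index])
            # Set the stock code
            doc['code'] = code
            # Merge in any extra fields
            if extra_fields is not None:
                doc.update(extra_fields)

            # Build one upsert request.
            # Note: the code, date and index fields need an index; without it,
            # writes slow down as the data grows. Create it in the MongoDB shell:
            # db.daily.createIndex({'code':1,'date':1,'index':1},{'background':true})
            # db.daily_hfq.createIndex({'code':1,'date':1,'index':1},{'background':true})
            update_requests.append(
                UpdateOne(
                    {'code': doc['code'], 'date': doc['date'], 'index': doc['index']},
                    {'$set': doc},
                    upsert=True)
            )

        # Write to the database only if there is something to write
        if len(update_requests) > 0:
            # A bulk write reduces network IO and is faster than single writes
            update_result = collection.bulk_write(update_requests, ordered=False)
            print('Saved daily data, code: %s, inserted: %4d, updated: %4d' %
                  (code, update_result.upserted_count, update_result.modified_count),
                  flush=True)


if __name__ == '__main__':
    dc = DailyCrawler()
    dc.crawl('2019-04-08')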
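The crawler passes begin_date and end_date straight through as 'YYYY-MM-DD' strings and falls back to today for whichever is missing. A small helper could apply the same defaults and also reject malformed or inverted ranges before any network calls are made. This is only a sketch: resolve_date_range and its today parameter are hypothetical and not part of the original code.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y-%m-%d"


def resolve_date_range(begin_date=None, end_date=None, today=None):
    """Return (begin, end) as 'YYYY-MM-DD' strings, defaulting both to today.

    Hypothetical helper; `today` exists only to make the function testable.
    """
    default = (today or date.today()).strftime(DATE_FORMAT)
    # strptime raises ValueError if a string does not parse as year-month-day
    begin = datetime.strptime(begin_date or default, DATE_FORMAT).date()
    end = datetime.strptime(end_date or default, DATE_FORMAT).date()
    if begin > end:
        raise ValueError("begin_date %s is after end_date %s" % (begin, end))
    return begin.strftime(DATE_FORMAT), end.strftime(DATE_FORMAT)
```

With this in place, crawl could start with begin_date, end_date = resolve_date_range(begin_date, end_date) in place of the two if blocks.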

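After running, the script prints one summary line per stock for each collection that received rows, showing the inserted and updated counts.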
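The inserted and updated counts come from the upsert requests: each row is matched on code, date and index, inserted if no match exists, and otherwise updated. The sketch below imitates that behavior with a plain dict so it can be run without MongoDB. It is an in-memory stand-in, not pymongo, and apply_upserts is a hypothetical name.

```python
def apply_upserts(store, docs):
    """Imitate UpdateOne(filter, {'$set': doc}, upsert=True) on a dict.

    `store` maps (code, date, index) to the stored document.
    Returns (inserted, modified), analogous to upserted_count and modified_count.
    """
    inserted = modified = 0
    for doc in docs:
        key = (doc["code"], doc["date"], doc["index"])
        if key not in store:
            # No matching document: the upsert inserts a new one
            store[key] = dict(doc)
            inserted += 1
        elif any(store[key].get(k) != v for k, v in doc.items()):
            # Match found and at least one field differs: counted as modified
            store[key].update(doc)
            modified += 1
        # Match found with identical values: nothing is counted
    return inserted, modified
```

Applying the same rows twice therefore yields inserts on the first run and neither inserts nor updates on the second, which is why re-running the crawler for an already fetched date does not create duplicate documents.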
