
Using the primary key to quickly aggregate a non-indexed field in a MySQL table with tens of millions of rows

2017-12-27  Codefor

Scenario

Much of the time, for the sake of write throughput, large business tables in production (tens of millions of rows in a single table) cannot have indexes added at will. And even when an index can be added, the table locking involved still affects the business.

We usually run statistics against an offline replica, but the production indexes rarely fit analytical needs well. Without a suitable index, how can we aggregate a field efficiently?

The idea: use the primary key index to do a range scan, then aggregate within each range.

Example

Table: tbl_pay_orders

Rows: 29,311,362

Engine: InnoDB

Columns:

+--------------+------------+------+-----+---------+----------------+
| Field        | Type       | Null | Key | Default | Extra          |
+--------------+------------+------+-----+---------+----------------+
| id           | bigint(20) | NO   | PRI | NULL    | auto_increment |
| amt          | bigint(20) | NO   |     | NULL    |                |
| created_time | int(11)    | NO   |     | NULL    |                |
+--------------+------------+------+-----+---------+----------------+

Requirement: sum the amounts in tbl_pay_orders by day.

Ignoring indexes, the SQL would look like this:

select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders group by day;

Since created_time has no index, MySQL's EXPLAIN output looks like this:

mysql> desc select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders group by day;
+----+-------------+----------------+------+---------------+------+---------+------+----------+---------------------------------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows     | Extra                           |
+----+-------------+----------------+------+---------------+------+---------+------+----------+---------------------------------+
|  1 | SIMPLE      | tbl_pay_orders | ALL  | NULL          | NULL | NULL    | NULL | 29311362 | Using temporary; Using filesort |
+----+-------------+----------------+------+---------------+------+---------+------+----------+---------------------------------+
1 row in set (0.00 sec)

A full table scan, covering all 29,311,362 rows.

What if we leverage the primary key instead? The SQL might look like this:

select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders where id >= 18000000 and id < 19000000 group by day;

The EXPLAIN output becomes:

mysql> desc select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders where id >= 18000000 and id < 19000000 group by day;
+----+-------------+----------------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
| id | select_type | table          | type  | possible_keys | key     | key_len | ref  | rows    | Extra                                        |
+----+-------------+----------------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
|  1 | SIMPLE      | tbl_pay_orders | range | PRIMARY       | PRIMARY | 8       | NULL | 1879550 | Using where; Using temporary; Using filesort |
+----+-------------+----------------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
1 row in set (0.01 sec)

This time the query does a range scan over the PRIMARY key (we still haven't added any new index), and the rows examined drop to 1,879,550. At that scale, MySQL can handle the aggregation comfortably.
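By walking the whole table one primary-key range at a time and merging the partial per-day sums in the client, the full aggregation stays cheap per query. A minimal self-contained sketch of the merge step (the in-memory `rows` list is a hypothetical stand-in for what the range queries would return, and timestamps are treated as UTC):

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical stand-in for tbl_pay_orders: (id, created_time, amt) rows.
# In practice each chunk would come from a "where id >= X and id < Y" query.
rows = [
    (1, 1514246400, 100),  # 2017-12-26 00:00 UTC
    (2, 1514246500, 200),
    (3, 1514332800, 300),  # 2017-12-27 00:00 UTC
    (4, 1514332900, 400),
]

def chunk_sums(chunk):
    """Aggregate one id-range chunk: day -> sum(amt)."""
    sums = Counter()
    for _id, created_time, amt in chunk:
        day = datetime.fromtimestamp(created_time, tz=timezone.utc).strftime("%Y-%m-%d")
        sums[day] += amt
    return sums

# Process the table one primary-key range at a time, merging partial sums.
total = Counter()
chunk_size = 2
for start in range(1, 5, chunk_size):
    chunk = [r for r in rows if start <= r[0] < start + chunk_size]
    total += chunk_sums(chunk)

print(dict(total))  # {'2017-12-26': 300, '2017-12-27': 700}
```

A day that straddles two chunks (as it would whenever a chunk boundary falls mid-day) is handled naturally, because the Counter merge adds the partial sums together.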

The question now is: how do we know each day's id range?

The answer: build the "index" offline ahead of time, by day, generating a mapping from day to start_id.

# Read the last (id, day) pair already recorded, so repeated runs resume
# where the previous one stopped.
last_rid, last_day = "0", ""
with open("idx.txt") as rf:
    for line in rf:
        last_rid, last_day = line.strip().split(",")

wf = open("idx.txt", "a")

# Scan the next chunk of 1,000,000 rows by primary key.
sql = ("select id, left(from_unixtime(created_time), 10) as day "
       "from tbl_pay_orders where id > %s and id < %s"
       % (last_rid, int(last_rid) + 1000000))
items = dao.select_sql(sql)  # dao: the project's own database access helper

for item in items:
    # Normalize '2017-12-27' to '20171227'.
    item["day"] = item["day"].replace("-", "").replace(" ", "")
    # Record the first id seen for each new day.
    if item["day"] != last_day:
        wf.write("%s,%s\n" % (item["id"], item["day"]))
        last_day = item["day"]

wf.close()
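Once idx.txt is built, consecutive entries give each day's id boundaries, and the per-day aggregation query can be generated from them. A small sketch (the `lines` list is hypothetical sample content in the same `start_id,day` format the script above writes):

```python
# Hypothetical idx.txt contents: one "start_id,day" line per day boundary.
lines = ["18000000,20171226", "18500000,20171227", "19000000,20171228"]

# Build day -> (start_id, end_id) from consecutive entries.
# The last day's upper bound would be the table's current max id; omitted here.
entries = [line.split(",") for line in lines]
ranges = {}
for (start_id, day), (next_id, _) in zip(entries, entries[1:]):
    ranges[day] = (int(start_id), int(next_id))

# Emit a range-bounded aggregation query for one day.
start, end = ranges["20171226"]
sql = ("select sum(amt) from tbl_pay_orders "
       "where id >= %d and id < %d" % (start, end))
print(sql)
```

Each generated query is a pure primary-key range scan, so it stays fast no matter how large the table grows.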

