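Investigating Slow Core Data Database Migrations
Background:
Some of our long-time users found that after updating to a new version, the app often hangs on the launch screen for a long time. If they kill the process at that point and reopen the app, all of their history is gone. Our initial diagnosis was that Core Data database migration was the cause.
I won't repeat the basics of migration here; see Apple's official documentation, Core Data Model Versioning and Data Migration.
Test data analysis
User data
Data volume in the test account: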
- User: 37486
- Conversation: 983
- Message: 541910
Message is one combined table, made up of:
- ClientMessage: 387939
- AssetClientMessage: 81277
- SystemMessage: 72694
Test environment
Device: iPhone 7, iOS 14.4
Model version: 2.40.28
Time cost of adding fields
- Message table: add an Int32 field with default value 0; Conversation table: add an Int32 field with default value 0
Migration step 0.0 'Total migration time (on connection)' took 209.34 seconds
Migration step 2.0 'Total formal transaction time' took 209.30 seconds
Migration step 2.4 'Drop indices' took 11.17 seconds
Migration step 2.5 'Execution of entity schema and data migration statements' took 137.67 seconds
Migration step 2.7.1 'Update default values' took 3.06 seconds
Migration step 2.17 'Time for COMMIT' took 57.35 seconds
- ClientMessage table: add an Int32 field with default value 0
Migration step 0.0 'Total migration time (on connection)' took 78.73 seconds
Migration step 2.0 'Total formal transaction time' took 78.70 seconds
Migration step 2.4 'Drop indices' took 0.40 seconds
Migration step 2.5 'Execution of entity schema and data migration statements' took 13.70 seconds
Migration step 2.7.1 'Update default values' took 27.44 seconds
Migration step 2.17 'Time for COMMIT' took 37.13 seconds
Migration step 3.0 'Checkpoint time' took 0.02 seconds
- ClientMessage table: remove the index, then add an Int32 field with default value 0
Migration step 0.0 'Total migration time (on connection)' took 80.03 seconds
Migration step 2.0 'Total formal transaction time' took 80.00 seconds
Migration step 2.4 'Drop indices' took 0.37 seconds
Migration step 2.5 'Execution of entity schema and data migration statements' took 0.02 seconds
Migration step 2.7.1 'Update default values' took 33.08 seconds
Migration step 2.17 'Time for COMMIT' took 46.50 seconds
Migration step 3.0 'Checkpoint time' took 0.02 seconds
- ClientMessage table: remove the index, then add an Int32 field with no default value
Migration step 0.0 'Total migration time (on connection)' took 0.90 seconds
Migration step 2.0 'Total formal transaction time' took 0.05 seconds
Migration step 2.5 'Execution of entity schema and data migration statements' took 0.01 seconds
Migration step 3.0 'Checkpoint time' took 0.85 seconds
- Conversation table: add an Int32 field with default value 0
Migration step 0.0 'Total migration time (on connection)' took 9.89 seconds
Migration step 2.0 'Total formal transaction time' took 9.89 seconds
Migration step 2.4 'Drop indices' took 0.02 seconds
Migration step 2.5 'Execution of entity schema and data migration statements' took 1.93 seconds
Migration step 2.7.1 'Update default values' took 0.45 seconds
Migration step 2.17 'Time for COMMIT' took 7.45 seconds
- Conversation table: add an Int32 field with no default value
Migration step 0.0 'Total migration time (on connection)' took 1.96 seconds
Migration step 2.0 'Total formal transaction time' took 1.87 seconds
Migration step 2.5 'Execution of entity schema and data migration statements' took 1.81 seconds
Migration step 3.0 'Checkpoint time' took 0.08 seconds
- User table: add an Int32 field with no default value
Migration step 0.0 'Total migration time (on connection)' took 7.42 seconds
Migration step 2.0 'Total formal transaction time' took 7.42 seconds
Migration step 2.4 'Drop indices' took 0.12 seconds
Migration step 2.5 'Execution of entity schema and data migration statements' took 1.00 seconds
Migration step 2.17 'Time for COMMIT' took 6.27 seconds
- Conversation, ClientMessage, and GenericMessageData tables: delete a field
Migration step 0.0 'Total migration time (on connection)' took 263.65 seconds
Migration step 2.0 'Total formal transaction time' took 263.55 seconds
Migration step 2.4 'Drop indices' took 0.01 seconds
Migration step 2.5 'Execution of entity schema and data migration statements' took 166.68 seconds
Migration step 2.17 'Time for COMMIT' took 96.81 seconds
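To compare runs like the ones above without reading the numbers by eye, the log lines can be parsed and each step expressed as a share of the total. The following is only a sketch: the helper names (parse_steps, shares_of_total) are made up for illustration, and the regular expression assumes the exact line format shown in the logs above.

```python
import re

# Matches lines such as:
#   Migration step 2.5 'Execution of ...' took 137.67 seconds
STEP_RE = re.compile(r"Migration step ([\d.]+) '([^']+)' took ([\d.]+) seconds")


def parse_steps(log_text):
    """Return a list of (step_id, name, seconds) tuples found in the log text."""
    return [(m.group(1), m.group(2), float(m.group(3)))
            for m in STEP_RE.finditer(log_text)]


def shares_of_total(steps):
    """Map each step name to its fraction of step 0.0 (total migration time)."""
    total = next(sec for step_id, _, sec in steps if step_id == "0.0")
    return {name: sec / total
            for step_id, name, sec in steps if step_id != "0.0"}


# Log copied from the first test case above (Message + Conversation, default 0).
SAMPLE_LOG = """\
Migration step 0.0 'Total migration time (on connection)' took 209.34 seconds
Migration step 2.0 'Total formal transaction time' took 209.30 seconds
Migration step 2.4 'Drop indices' took 11.17 seconds
Migration step 2.5 'Execution of entity schema and data migration statements' took 137.67 seconds
Migration step 2.7.1 'Update default values' took 3.06 seconds
Migration step 2.17 'Time for COMMIT' took 57.35 seconds
"""

if __name__ == "__main__":
    for name, share in shares_of_total(parse_steps(SAMPLE_LOG)).items():
        print(f"{share:6.1%}  {name}")
```

The steps are nested (step 2.0 contains 2.4, 2.5 and so on), so the shares are not expected to add up to 100%. For this sample, step 2.5 accounts for roughly two thirds of the total and COMMIT for a bit over a quarter.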
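Takeaways from the test results above
- A field that would otherwise be added to the Message table migrates much faster when it is added to the ClientMessage table instead
- Indexes are another factor in migration time
- Avoid giving new fields a default value
- Avoid deleting fields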
How it works
In Edit Scheme, under Run > Arguments, add a new line:
-com.apple.CoreData.MigrationDebug 1
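With this argument set, Core Data logs the database operations it performs during migration.
I also started a timer on a background thread that prints a log line once per second. Combined with Core Data's own log, this shows how long each step takes.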
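The once-per-second timer can be sketched as below. This is a Python stand-in to show the idea, not the code used in the app; start_ticker and its parameters are hypothetical names. Each tick lands between Core Data's log lines, so counting ticks between two SQL statements gives a rough duration for the first one.

```python
import threading
import time


def start_ticker(interval_s=1.0, log=print):
    """Call log(...) every interval_s seconds on a background thread.

    Returns a function that stops the ticker and waits for the thread to exit.
    """
    stop = threading.Event()
    started = time.monotonic()

    def run():
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop ticks until stop_ticker() is called.
        while not stop.wait(interval_s):
            log(f"[tick] {time.monotonic() - started:.0f}s elapsed")

    thread = threading.Thread(target=run, daemon=True)
    thread.start()

    def stop_ticker():
        stop.set()
        thread.join()

    return stop_ticker


if __name__ == "__main__":
    stop_ticker = start_ticker(1.0)
    time.sleep(3.5)  # stand-in for the long-running migration
    stop_ticker()
```

The resolution is only as fine as the tick interval, which matches the "about N s" figures quoted for the individual SQL statements.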
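Below is the SQL output for some of the time-consuming steps (simplified):
- Dropping the table's indexes
CoreData: sql: DROP INDEX IF EXISTS Z_Message_serverTimestamp
Took about 1 s
... // almost every index on the Message table takes about 1 s to drop
CoreData: sql: DROP INDEX IF EXISTS Z_Message_normalizedText
Took about 7 s // this index took the longest
CoreData: sql: DROP INDEX IF EXISTS Z_ClientMessage_linkPreviewState
Took about 1 s
- Not time-consuming, but important for the steps that follow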
CoreData: sql: ALTER TABLE ZMESSAGE RENAME TO _T_ZMESSAGE
CoreData: sql: ALTER TABLE ZCONVERSATION ADD COLUMN ZISTEST INTEGER
CoreData: sql: CREATE TABLE ZMESSAGE
- Inserting the data from _T_ZMESSAGE into ZMESSAGE
CoreData: sql: INSERT INTO ZMESSAGE ... FROM _T_ZMESSAGE WHERE _T_ZMESSAGE.Z_ENT = 9
Took about 28 s
CoreData: sql: INSERT INTO ZMESSAGE ... FROM _T_ZMESSAGE WHERE _T_ZMESSAGE.Z_ENT = 13
Took about 22 s
CoreData: sql: INSERT INTO ZMESSAGE ... FROM _T_ZMESSAGE WHERE _T_ZMESSAGE.Z_ENT = 8
Took about 17 s
- Dropping the _T_ZMESSAGE table
DROP TABLE _T_ZMESSAGE
Took about 41 s
- Recreating the indexes
CREATE INDEX IF NOT EXISTS Z_Conversation_remoteIdentifier_data ON ZCONVERSATION
Took about 3 s
CREATE INDEX IF NOT EXISTS ZMESSAGE_ZHIDDENINCONVERSATION_INDEX ON ZMESSAGE
Took about 3 s
CREATE INDEX IF NOT EXISTS Z_Message_normalizedText ON ZMESSAGE
Took about 5 s
CREATE INDEX IF NOT EXISTS Z_Message_nonce_data ON ZMESSAGE
Took about 4 s
- Updating the new field with its default value
UPDATE ZCONVERSATION SET ZISTEST = ? WHERE ZCONVERSATION.Z_ENT = 3
Took about 3 s
- Committing the transaction
Committing formal transaction
Took about 57 s
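The sequence above can be replayed on a toy SQLite database to see why its cost grows with the row count: every row is copied once and every index is rebuilt from scratch. This is a sketch under assumptions, not Core Data's actual migration code. The table has only three columns, ZNEWFIELD is a hypothetical new column, and the Z_ENT values 8, 9 and 13 are simply the ones that appear in the log.

```python
import sqlite3


def make_sample_db(rows_per_entity=100):
    """Build an in-memory ZMESSAGE table holding three entities in one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ZMESSAGE (Z_PK INTEGER PRIMARY KEY, "
                 "Z_ENT INTEGER, ZSERVERTIMESTAMP REAL)")
    conn.execute("CREATE INDEX Z_Message_serverTimestamp "
                 "ON ZMESSAGE (ZSERVERTIMESTAMP)")
    rows = [(ent, float(i))
            for i, ent in enumerate([8, 9, 13] * rows_per_entity)]
    conn.executemany(
        "INSERT INTO ZMESSAGE (Z_ENT, ZSERVERTIMESTAMP) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def rebuild_with_new_column(conn):
    """Replay drop index / rename / create / copy per entity / drop / reindex."""
    cur = conn.cursor()
    cur.execute("DROP INDEX IF EXISTS Z_Message_serverTimestamp")
    cur.execute("ALTER TABLE ZMESSAGE RENAME TO _T_ZMESSAGE")
    # ZNEWFIELD is the hypothetical attribute being added.
    cur.execute("CREATE TABLE ZMESSAGE (Z_PK INTEGER PRIMARY KEY, "
                "Z_ENT INTEGER, ZSERVERTIMESTAMP REAL, ZNEWFIELD INTEGER)")
    ents = [row[0] for row in cur.execute(
        "SELECT DISTINCT Z_ENT FROM _T_ZMESSAGE ORDER BY Z_ENT")]
    # One INSERT ... SELECT per entity, as in the log above.
    for ent in ents:
        cur.execute(
            "INSERT INTO ZMESSAGE (Z_PK, Z_ENT, ZSERVERTIMESTAMP) "
            "SELECT Z_PK, Z_ENT, ZSERVERTIMESTAMP FROM _T_ZMESSAGE "
            "WHERE _T_ZMESSAGE.Z_ENT = ?", (ent,))
    cur.execute("DROP TABLE _T_ZMESSAGE")
    cur.execute("CREATE INDEX IF NOT EXISTS Z_Message_serverTimestamp "
                "ON ZMESSAGE (ZSERVERTIMESTAMP)")
    conn.commit()
    return ents


if __name__ == "__main__":
    conn = make_sample_db()
    print("entities copied:", rebuild_with_new_column(conn))
    print("rows:", conn.execute("SELECT COUNT(*) FROM ZMESSAGE").fetchone()[0])
```

Raising rows_per_entity and timing rebuild_with_new_column gives a feel for how the copy, the index rebuild and the commit scale, although absolute numbers on a desktop will not match those measured on the device.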
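What the time-consuming steps above tell us
- Adding a field to the Message table causes the following:
  - The indexes on the Message table are dropped
  - ZMESSAGE is renamed to _T_ZMESSAGE and a new ZMESSAGE table is created
  - The data in _T_ZMESSAGE is inserted into ZMESSAGE
  - _T_ZMESSAGE is dropped
  - The indexes are recreated
  - The transaction is committed
  Apart from renaming the old table and creating the new one, every one of these steps is very slow.
  The reason is that, in our Core Data model, Message is an abstract entity (the abstract parent), and ClientMessage, AssetClientMessage, and SystemMessage are its sub-entities. In the Core Data model it looks as if each sub-entity has its own table, but in the actual database there is only one ZMESSAGE table, and the sub-entities are told apart by the Z_ENT column, an entity identifier that Core Data generates automatically. This is why copying the old table into the new one took more than one INSERT: there is one per entity.
  So when using Core Data, try to avoid this kind of entity inheritance. Alternatively, implement the inheritance only in the classes in code and keep it out of the table structure. Otherwise it becomes a performance bottleneck once the data volume is large.
- Deleting a field from a table
  Deleting a field from any table goes through the same process: rename the old table, create a new table, and copy the old data into it. Deleting fields from tables with a lot of data therefore needs careful thought.
- Dropping indexes on a large table is also fairly slow
  If we are only adding an attribute, the database should only need ADD COLUMN, so why are the indexes dropped first? (This still needs investigation.)
- Setting a default value forces the database to update every existing row, which is bound to be slow
  So just use the default nil value.
- Recreating the indexes
  As with dropping the indexes above, it still needs investigation why this extra work is necessary.
- Committing the transaction
  The larger the change made to the database, the longer the commit takes.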
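The point about default values can be illustrated with the two statements from the log, ALTER TABLE ... ADD COLUMN and UPDATE ... SET ZISTEST = ?. In SQLite, a plain ADD COLUMN only changes the schema and leaves existing rows untouched, whereas the UPDATE has to rewrite every matching row. The sketch below uses a toy ZCONVERSATION table with hypothetical helper names; it says nothing about what else Core Data does around these statements.

```python
import sqlite3


def make_conversations(n):
    """Build an in-memory ZCONVERSATION table with n rows of entity 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ZCONVERSATION "
                 "(Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER)")
    conn.executemany("INSERT INTO ZCONVERSATION (Z_ENT) VALUES (?)",
                     [(3,)] * n)
    conn.commit()
    return conn


def add_column_then_backfill(conn, default=None):
    """Add ZISTEST; backfill existing rows only when a default is given.

    Returns the number of rows rewritten by the backfill (0 without a default).
    """
    conn.execute("ALTER TABLE ZCONVERSATION ADD COLUMN ZISTEST INTEGER")
    if default is None:
        return 0  # existing rows simply read back as NULL
    cur = conn.execute(
        "UPDATE ZCONVERSATION SET ZISTEST = ? WHERE ZCONVERSATION.Z_ENT = 3",
        (default,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # 983 is the Conversation row count of the test account.
    print("no default:", add_column_then_backfill(make_conversations(983)))
    print("default 0: ", add_column_then_backfill(make_conversations(983), 0))
```

With no default the backfill is skipped entirely, which is consistent with the 'Update default values' step being absent from the logs of the no-default test cases.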
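Open question
Adding a field to User took about 7 s. The log shows that this migration went through the create-new-table path. By comparing test runs,
we eventually found the cause: the Message entity has a to-many relationship, recipientUsers, which stores an array of User objects and has no inverse, so it is a one-way relationship.
If this relationship is removed, adding a field to the User table no longer goes through the table-rebuild path.
Core Data's internal migration code is not available to us, so I cannot explain this behavior. If anyone knows the reason, I would be glad to hear it.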
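Conclusion
Core Data lets us store and fetch data in a more object-oriented way, but when performance problems show up, the many optimizations Core Data performs internally also make them harder to track down.
Still, if you do use Core Data, never use table inheritance!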