一次MYSQL频繁宕机的排查

2024-09-12  本文已影响0人  数齐

背景

近期,兄弟部门反馈关于我们产品自带的MYSQL数据库的问题。他们指出,数据库在每天凌晨12点会出现宕机现象。这一现象也是最新一段才出现的,这段时间只是增加了一些自动化程序。频繁宕机的问题影响了兄弟部门的工作流程,也产生了一定的负面影响。为了解决这一问题,我们需要深入调查并给出解决措施。

排查过程

确认MYSQL所在的服务器资源情况

我们了解到这段时间做的变更中提到增加了一些自动化的程序,是不是自动化程序中涉及到操作数据库,造成系统资源不足,被系统杀掉?于是查看各类系统资源。无论是磁盘还是内存都有较大剩余,也查看了部分系统日志,均没有因为资源不足杀进程的日志,所以应该不是此原因导致宕机。

结论:不是因为系统资源导致MYSQL宕机

排查MYSQL错误日志

在my.ini中指定当mysql发生故障时的的错误日志地址,具体配置方式如下:

# start error log

log-error=mysql.err

mysql.err内容如下

2024-01-08  9:30:46 0 [Note] InnoDB: Uses event mutexes

2024-01-08  9:30:46 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

2024-01-08  9:30:46 0 [Note] InnoDB: Number of pools: 1

2024-01-08  9:30:46 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

2024-01-08  9:30:46 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)

2024-01-08  9:30:46 0 [Note] InnoDB: Using Linux native AIO

2024-01-08  9:30:46 0 [Note] InnoDB: Initializing buffer pool, total size = 536870912, chunk size = 134217728

2024-01-08  9:30:46 0 [Note] InnoDB: Completed initialization of buffer pool

2024-01-08  9:30:46 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=296938092837,296938109566

2024-01-08  9:30:47 0 [Note] InnoDB: Transaction 57863035 was in the XA prepared state.

2024-01-08  9:30:47 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

2024-01-08  9:30:47 0 [Note] InnoDB: Trx id counter is 57863036

2024-01-08  9:30:47 0 [Note] InnoDB: Starting final batch to recover 54726 pages from redo log.

2024-01-08  9:30:49 0 [Note] InnoDB: Last binlog file './mysql-bin.000078', position 111348635

2024-01-08  9:30:49 0 [Note] InnoDB: 128 rollback segments are active.

2024-01-08  9:30:49 0 [Note] InnoDB: Starting in background the rollback of recovered transactions

2024-01-08  9:30:49 0 [Note] InnoDB: Rollback of non-prepared transactions completed

2024-01-08  9:30:49 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"

2024-01-08  9:30:49 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2024-01-08  9:30:49 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2024-01-08  9:30:49 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.

2024-01-08  9:30:49 0 [Note] InnoDB: 10.5.11 started; log sequence number 297037605262; transaction id 57863037

2024-01-08  9:30:49 0 [Note] InnoDB: Loading buffer pool(s) from /home/hsrisk/disk01/see/mysql/data/ib_buffer_pool

2024-01-08  9:30:49 0 [Note] Plugin 'FEEDBACK' is disabled.

2024-01-08  9:30:49 0 [Note] Recovering after a crash using mysql-bin

2024-01-08  9:30:49 0 [Note] InnoDB: Buffer pool(s) load completed at 240108  9:30:49

2024-01-08  9:30:49 0 [Note] Starting crash recovery...

2024-01-08  9:30:49 0 [Note] InnoDB: Starting recovery for XA transactions...

2024-01-08  9:30:49 0 [Note] InnoDB: Transaction 57863035 in prepared state after recovery

2024-01-08  9:30:49 0 [Note] InnoDB: Transaction contains changes to 1 rows

2024-01-08  9:30:49 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2024-01-08  9:30:49 0 [Note] Found 1 prepared transaction(s) in InnoDB

2024-01-08  9:30:49 0 [Note] Crash recovery finished.

2024-01-08  9:30:49 0 [Note] Server socket created on IP: '::'.

2024-01-08  9:30:49 0 [Note] Reading of all Master_info entries succeeded

2024-01-08  9:30:49 0 [Note] Added new Master_info '' to hash table

2024-01-08  9:30:49 0 [Note] ./mysqld: ready for connections.

Version: '10.5.11-MariaDB-log'  socket: '/tmp/see20/mysql.sock'  port: 3306  MariaDB Server

这段日志是客户因为数据库宕机后手动重启的日志。这个日志中发现有Recovering after a crash using mysql-bin,Starting crash recovery,Crash recovery finished 等字样,表示数据库之前确实是异常关闭的,但是这种异常是MYSQL SERVER预料之外的。预料之外的含义是这种异常数据库未定义,如果是意料之中的异常的话,日志中会留下更多的信息,例如下图,虽然没写明是什么原因导致的数据库崩溃,但是他清楚的告诉了报错的地点以及保留了帮助定位的信息。

2023-09-17 21:36:12 0x7f384413a700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/lock/lock0lock.cc line 7014

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/

InnoDB: about forcing recovery.

230917 21:36:12 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.3.10-MariaDB-log

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=322

max_threads=6002

thread_count=261

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 13325602 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f35980009a8

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7f3844139e30 thread_stack 0x49000

mysys/stacktrace.c:270(my_print_stacktrace)[0x556c2c650b4b]

sql/signal_handler.cc:168(handle_fatal_signal)[0x556c2c13bba7]

sigaction.c:0(__restore_rt)[0x7f3849863680]

:0(__GI_raise)[0x7f384866f207]

:0(__GI_abort)[0x7f38486708f8]

ut/ut0rbt.cc:461(rbt_eject_node(ib_rbt_node_t*, ib_rbt_node_t*) [clone .part.4])[0x556c2bea97b9]

lock/lock0lock.cc:7014(DeadlockChecker::check_and_resolve(ib_lock_t const*, trx_t*))[0x556c2c319ec2]

lock/lock0lock.cc:1741(lock_rec_enqueue_waiting(ib_lock_t*, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*, lock_prdt*))[0x556c2c31aaf4]

lock/lock0lock.cc:1990(lock_rec_lock(bool, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*))[0x556c2c31df48]

row/row0sel.cc:1284(sel_set_rec_lock(btr_pcur_t*, unsigned char const*, dict_index_t*, unsigned long const*, unsigned long, unsigned long, que_thr_t*, mtr_t*))[0x556c2c3b783c]

row/row0sel.cc:5047(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x556c2c3b8a10]

handler/ha_innodb.cc:9264(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function))[0x556c2c2e11fc]

sql/handler.cc:2798(handler::ha_rnd_pos(unsigned char*, unsigned char*))[0x556c2c140c82]

sql/handler.h:3568(handler::rnd_pos_by_record(unsigned char*))[0x556c2c14855b]

sql/sql_class.h:6384(handler::ha_rnd_pos_by_record(unsigned char*))[0x556c2c219170]

sql/log_event.cc:14265(Update_rows_log_event::do_exec_row(rpl_group_info*))[0x556c2c2194f4]

sql/log_event.cc:11514(Rows_log_event::do_apply_event(rpl_group_info*))[0x556c2c20ddcc]

sql/log_event.h:5159(Log_event::apply_event(rpl_group_info*))[0x556c2bef5c5c]

sql/slave.cc:3920(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x556c2bee83db]

sql/mysqld.h:808(thread_safe_increment64)[0x556c2c09073e]

sql/rpl_parallel.cc:1300(handle_rpl_parallel_thread)[0x556c2c0931fb]

pthread_create.c:0(start_thread)[0x7f384985bdd5]

*** stack smashing detected ***: /home/hundsun/mysql/mysql/bin/mysqld terminated

结论:是异常宕机,并且该异常不是系统定义,无法被MYSQL SERVER所捕获。

排查MYSQL BINLOG日志

在my.ini中指定当开启binlog日志,具体配置方式如下:

log-bin=mysql-bin  #开启binlog

重启MYSQL后会创建如下格式的binlog文件

mysql-bin.000001

mysql-bin.000002

MYSQL生成的binlog是二进制文件,无法直接查看。可以使用mysqlbinlog工具将binlog二进制数据导出进行查看,例如

./mysqlbinlog ../data/mysql-bin.000004 --start-datetime='2020-02-12 23:30:00' --stop-datetime='2025-02-12:23:50:00' --base64-output=DECODE-ROWS  > 4.txt

可以按照时间维度将数据导出到文件中进行查看。下面是2个binlog的导出数据

mysql-bin.000041这个文件有203M,最后的修改时间是2023-12-24 23:59:15。此binlog导出文件的末尾数据如下

#231224 23:59:15 server id 1  end_log_pos 83409660 CRC32 0x595aa790 Table_map: `acm`.`tb_lock` mapped to number 58

# at 83409660

#231224 23:59:15 server id 1  end_log_pos 83410156 CRC32 0xed3ad18a Update_rows: table id 58 flags: STMT_END_F

# Number of rows: 1

# at 83410156

#231224 23:59:15 server id 1  end_log_pos 83410187 CRC32 0x539c3bfa Xid = 1904727

COMMIT/*!*/;

DELIMITER ;

# End of log file

ROLLBACK /* added by mysqlbinlog */;

/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

mysql-bin.000042这个文件有1G,最后的修改时间是2023-12-25 19:03:03。此binlog导出文件的末尾数据如下

上一篇下一篇

猜你喜欢

热点阅读