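We have an 11.2.0.4 RAC test environment. The customer reported that during the early-morning RMAN full backup, memory was occasionally exhausted and the database restarted. The environment is not covered by our maintenance contract, but they asked us to help. I guessed that vm.min_free_kbytes had not been configured. I had made this adjustment many times before and it had succeeded every time, so I did not think much about it and made the change directly during the day.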

The machine has a little over 370G of memory, and the instance's sga+pga=260G. I planned to reserve 50G.

After adding the following setting, I ran sysctl -p to apply it:

vm.min_free_kbytes = 52428800
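
As a quick sanity check on that number, here is a minimal sketch of the arithmetic, assuming the "G" figures above are GiB and using the rounded 370G and 260G from this post. The function names are made up for illustration. The Linux kernel documentation for min_free_kbytes warns that setting the value too high can OOM the machine instantly, so it helps to see how large the reserve is relative to total RAM before applying it.

```python
def gib_to_kb(gib: float) -> int:
    """Convert GiB to the kB unit that vm.min_free_kbytes expects."""
    return int(gib * 1024 * 1024)


def reserve_report(total_gib: float, oracle_gib: float, reserve_gib: float) -> dict:
    """Summarise how a planned reserve relates to total RAM and to SGA+PGA.

    All inputs are rough figures; this is arithmetic only and reads
    nothing from the running system.
    """
    return {
        "min_free_kbytes": gib_to_kb(reserve_gib),
        "reserve_share_of_ram": reserve_gib / total_gib,
        "left_for_everything_else_gib": total_gib - oracle_gib - reserve_gib,
    }


# Figures from this post: ~370G RAM, sga+pga = 260G, planned reserve 50G.
report = reserve_report(total_gib=370, oracle_gib=260, reserve_gib=50)
print(report["min_free_kbytes"])
print(f"{report['reserve_share_of_ram']:.1%} of RAM held back as the free-memory floor")
print(f"{report['left_for_everything_else_gib']} GiB left for everything else")
```

The sketch only confirms that 52428800 really is 50G expressed in kB and that it is a double-digit percentage of RAM. Comparing it with MemFree in /proc/meminfo at the time of the change would have been the more useful check.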

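A few minutes later I noticed that db1 was not behaving normally: the load from oraagent.bin had gone up, and the query command crsctl status res -t could not be run on db1.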

Checking the cluster log:

2023-06-16 15:14:03.998:
[ohasd(9796)]CRS-2878:Failed to restart resource 'ora.gpnpd'
2023-06-16 15:14:04.056:
[ohasd(9796)]CRS-2878:Failed to restart resource 'ora.mdnsd'
2023-06-16 15:14:07.504:
[gpnpd(15816)]CRS-2328:GPNPD started on node db1.
2023-06-16 15:14:10.523:
[gpnpd(15816)]CRS-2338:Clusterwide GPnP profile updates may be impaired.
2023-06-16 15:14:18.528:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
2023-06-16 15:14:26.529:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
2023-06-16 15:14:34.530:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
2023-06-16 15:14:42.531:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
2023-06-16 15:14:50.532:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.

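After more than 40 minutes the same errors were still being reported, and I could not find a similar case on MOS. The instance was still running normally; only the cluster was abnormal. I asked the customer for a maintenance window. The instance shut down normally, but GI could not be shut down normally, so I rebooted the host directly. After the reboot the services were back to normal.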

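Checking the system messages log, there were indeed insufficient-memory errors right after vm.min_free_kbytes was adjusted. Fortunately this was a test environment. It is a lesson learned: be more careful with operations like this in the future.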

Jun 16 15:08:38 db1 kernel: oracle: page allocation failure: order:0, mode:0x20
Jun 16 15:08:38 db1 kernel: Pid: 16474, comm: oracle Tainted: GF          O 3.8.13-16.2.1.el6uek.x86_64 #1
Jun 16 15:08:38 db1 kernel: Call Trace:
Jun 16 15:08:38 db1 kernel: <IRQ>  [<ffffffff811340a3>] warn_alloc_failed+0xf3/0x160
Jun 16 15:08:38 db1 kernel: [<ffffffff81048099>] ? default_spin_lock_flags+0x9/0x10
Jun 16 15:08:38 db1 kernel: [<ffffffff811374b6>] __alloc_pages_slowpath+0x4a6/0x7b0
Jun 16 15:08:38 db1 kernel: [<ffffffff8113234f>] ? zone_watermark_ok+0x1f/0x30
Jun 16 15:08:38 db1 kernel: [<ffffffff81137abb>] __alloc_pages_nodemask+0x2fb/0x320
Jun 16 15:08:38 db1 kernel: [<ffffffff81175ea3>] alloc_pages_current+0xe3/0x1c0
Jun 16 15:08:38 db1 kernel: [<ffffffff814b70e9>] __netdev_alloc_frag+0x99/0x150
Jun 16 15:08:38 db1 kernel: [<ffffffff814b80aa>] __netdev_alloc_skb+0x9a/0xe0
Jun 16 15:08:38 db1 kernel: [<ffffffffa0560dca>] igb_fetch_rx_buffer+0x7a/0x1e0 [igb]
Jun 16 15:08:38 db1 kernel: [<ffffffffa0560fd5>] igb_clean_rx_irq+0xa5/0x420 [igb]
Jun 16 15:08:38 db1 kernel: [<ffffffffa0561885>] igb_poll+0x65/0xb0 [igb]
Jun 16 15:08:38 db1 kernel: [<ffffffff814c9985>] net_rx_action+0x105/0x2b0
Jun 16 15:08:38 db1 kernel: [<ffffffff81065e37>] __do_softirq+0xd7/0x240
Jun 16 15:08:38 db1 kernel: [<ffffffff81592aae>] ? _raw_spin_lock+0xe/0x20
Jun 16 15:08:38 db1 kernel: [<ffffffff8159ca9c>] call_softirq+0x1c/0x30
Jun 16 15:08:38 db1 kernel: [<ffffffff810174b5>] do_softirq+0x65/0xa0
Jun 16 15:08:38 db1 kernel: [<ffffffff81065c1d>] irq_exit+0xbd/0xe0
Jun 16 15:08:38 db1 kernel: [<ffffffff8159d666>] do_IRQ+0x66/0xe0
Jun 16 15:08:38 db1 kernel: [<ffffffff811f7e50>] ? sched_open+0x20/0x20
Jun 16 15:08:38 db1 kernel: [<ffffffff815930ad>] common_interrupt+0x6d/0x6d
Jun 16 15:08:38 db1 kernel: <EOI>  [<ffffffff811b2d9f>] ? seq_open+0x4f/0xb0
Jun 16 15:08:38 db1 kernel: [<ffffffff8118e7f9>] ? do_dentry_open+0x259/0x2d0
Jun 16 15:08:38 db1 kernel: [<ffffffff8118e7de>] ? do_dentry_open+0x23e/0x2d0
Jun 16 15:08:38 db1 kernel: [<ffffffff8118e995>] finish_open+0x35/0x50
Jun 16 15:08:38 db1 kernel: [<ffffffff8119db96>] do_last+0x436/0x7b0
Jun 16 15:08:38 db1 kernel: [<ffffffff8119b0d8>] ? inode_permission+0x18/0x50
Jun 16 15:08:38 db1 kernel: [<ffffffff8119e15d>] ? link_path_walk+0x24d/0x420
Jun 16 15:08:38 db1 kernel: [<ffffffff811a0673>] path_openat+0xb3/0x480
Jun 16 15:08:38 db1 kernel: [<ffffffff811a0b79>] do_filp_open+0x49/0xa0
Jun 16 15:08:38 db1 kernel: [<ffffffff81592aae>] ? _raw_spin_lock+0xe/0x20
Jun 16 15:08:38 db1 kernel: [<ffffffff811ad0e5>] ? __alloc_fd+0xb5/0x160
Jun 16 15:08:38 db1 kernel: [<ffffffff8118e448>] do_sys_open+0x108/0x1f0
Jun 16 15:08:38 db1 kernel: [<ffffffff8118e571>] sys_open+0x21/0x30
Jun 16 15:08:38 db1 kernel: [<ffffffff8159b719>] system_call_fastpath+0x16/0x1b
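
To see how many of these failures occurred and when they started relative to the sysctl change, the header lines can be pulled out of the messages log. The following is a minimal sketch, not a general syslog parser: the regular expression is written against the exact line format shown above, and the function name is hypothetical.

```python
import re

# Matches the header line of a failure report as it appears in this log, e.g.
# "Jun 16 15:08:38 db1 kernel: oracle: page allocation failure: order:0, mode:0x20"
ALLOC_FAILURE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<comm>[^:]+): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def find_alloc_failures(lines):
    """Return one dict per 'page allocation failure' header line."""
    hits = []
    for line in lines:
        m = ALLOC_FAILURE.match(line)
        if m:
            hits.append({
                "ts": m.group("ts"),
                "host": m.group("host"),
                "comm": m.group("comm"),
                "order": int(m.group("order")),
                "mode": int(m.group("mode"), 16),
            })
    return hits


# Two lines copied from the log above; only the first is a failure header.
sample = [
    "Jun 16 15:08:38 db1 kernel: oracle: page allocation failure: order:0, mode:0x20",
    "Jun 16 15:08:38 db1 kernel: Call Trace:",
]
failures = find_alloc_failures(sample)
for f in failures:
    print(f["ts"], f["host"], f["comm"], f"order={f['order']}", f"mode={f['mode']:#x}")
```

On a real host the same function could be fed the lines of the messages file instead of `sample`. The path (commonly /var/log/messages on this kind of system) is an assumption, since the post does not state it. In the trace above, the failing allocation is an order:0 request made from the igb receive path in interrupt context, which suggests the network stack was among the first to be hit.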
