CFBMC-3996: FAS8300 node reboots because HBT was not established on BMC 13.10P1
Issue
-Node reboots due to stopped / missed heartbeat
[[Node-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.]
[[Node-01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED]
[[Node-01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED]
[Node-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
[Node-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
-IPMI_KCS_ERR messages observed at the timestamp of the reboot in SKTRACE log
[2024-03-10T01:30:58Z 2180899785867098 [5:0] IPMI_KCS_ERR: kcs_start_write: cmd 0x31 nf 0x36 state 3 not write]
[2024-03-10T01:30:58Z 2180899785870130 [5:0] IPMI_KCS_ERR: KCS cmd 0x31 nf 0x36: Failed to start write]
[2024-03-10T01:30:59Z 2180900784460092 [15:0] IPMI_KCS_ERR: kcs_error: cmd 0x31 nf 0x36 IBF not 0]
[2024-03-10T01:30:59Z 2180901778714878 [18:0] IPMI_KCS_ERR: kcs_error abort: cmd 0x31 nf 0x36 IBF not 0]
[2024-03-10T01:31:00Z 2180902760811516 [18:0] IPMI_KCS_ERR: kcs_error cmd 0x31 nf 0x36 not idle]
[2024-03-10T01:31:00Z 2180903779141166 [2:0] IPMI_KCS_ERR: kcs_error: cmd 0x31 nf 0x36 IBF not 0]
-Node reboots and comes back online
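To check whether a saved SKTRACE excerpt shows the same pattern, the IPMI_KCS_ERR entries near the reboot time can be filtered with a short script. The following is only a sketch: the function name `kcs_errors_near`, the 15-minute window, and the regular expression are assumptions modeled on the log lines quoted above, not a NetApp tool or a documented log format.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed line shape, taken from the excerpt above:
# [<UTC timestamp>Z <counter> [<n>:<n>] IPMI_KCS_ERR: <message>]
KCS_ERR = re.compile(
    r"\[?(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z \d+ \[\d+:\d+\] IPMI_KCS_ERR: (.*?)\]?$"
)

def kcs_errors_near(lines, reboot_time, window=timedelta(minutes=15)):
    """Return (timestamp, message) pairs for IPMI_KCS_ERR lines within
    `window` of `reboot_time` (hypothetical helper, for illustration)."""
    hits = []
    for line in lines:
        match = KCS_ERR.search(line.strip())
        if not match:
            continue  # not an IPMI_KCS_ERR line in the assumed format
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        ts = ts.replace(tzinfo=timezone.utc)  # trailing "Z" means UTC
        if abs(ts - reboot_time) <= window:
            hits.append((ts, match.group(2)))
    return hits
```

Usage would look like `kcs_errors_near(open(path), datetime(2024, 3, 10, 1, 31, tzinfo=timezone.utc))`, where `path` and the reboot time are placeholders to be replaced with the values from the affected node.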