Problem: an OSD in the Ceph cluster has gone down, and every attempt to restart the OSD fails.
Analysis:
[root@shnode183 ~]# systemctl status ceph-osd@14
● ceph-osd@14.service - Ceph object storage daemon osd.14
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
Active: failed (Result: start-limit) since Mon 2020-06-08 17:47:25 CST; 2s ago
Process: 291595 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Process: 291589 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 291595 (code=exited, status=1/FAILURE)
Jun 08 17:47:25 shnode183 systemd[1]: Unit ceph-osd@14.service entered failed state.
Jun 08 17:47:25 shnode183 systemd[1]: ceph-osd@14.service failed.
Jun 08 17:47:25 shnode183 systemd[1]: ceph-osd@14.service holdoff time over, scheduling restart.
Jun 08 17:47:25 shnode183 systemd[1]: Stopped Ceph object storage daemon osd.14.
Jun 08 17:47:25 shnode183 systemd[1]: start request repeated too quickly for ceph-osd@14.service
Jun 08 17:47:25 shnode183 systemd[1]: Failed to start Ceph object storage daemon osd.14.
Jun 08 17:47:25 shnode183 systemd[1]: Unit ceph-osd@14.service entered failed state.
Jun 08 17:47:25 shnode183 systemd[1]: ceph-osd@14.service failed.
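systemctl only reports that the unit hit its start rate limit after repeated exits with status 1; the underlying error has to come from the OSD log (shown next) or the systemd journal. A sketch for pulling the recent journal entries (not part of the original session):
journalctl -u ceph-osd@14 --no-pager -n 50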
[root@shnode183 ~]# tail /var/log/ceph/ceph-osd.14.log
2020-06-08 17:47:25.091 7f8f9d863a80 0 set uid:gid to 167:167 (ceph:ceph)
2020-06-08 17:47:25.091 7f8f9d863a80 0 ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable), process ceph-osd, pid 291575
2020-06-08 17:47:25.091 7f8f9d863a80 0 pidfile_write: ignore empty --pid-file
2020-06-08 17:47:25.114 7f8f9d863a80 -1 bluestore(/var/lib/ceph/osd/ceph-14/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-14/block: (5) Input/output error
2020-06-08 17:47:25.114 7f8f9d863a80 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-14: (2) No such file or directory
2020-06-08 17:47:25.343 7f826fb16a80 0 set uid:gid to 167:167 (ceph:ceph)
2020-06-08 17:47:25.343 7f826fb16a80 0 ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable), process ceph-osd, pid 291595
2020-06-08 17:47:25.343 7f826fb16a80 0 pidfile_write: ignore empty --pid-file
2020-06-08 17:47:25.366 7f826fb16a80 -1 bluestore(/var/lib/ceph/osd/ceph-14/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-14/block: (5) Input/output error
2020-06-08 17:47:25.366 7f826fb16a80 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-14: (2) No such file or directory
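The OSD log shows that BlueStore cannot read the device label behind /var/lib/ceph/osd/ceph-14/block, i.e. the block device backing osd.14 is returning I/O errors. To confirm which physical disk that is, the ceph-volume/LVM mapping can be listed; a sketch (not part of the original session):
ceph-volume lvm list                # shows, per OSD, the osd-block-* LV, its VG and the underlying device
ls -l /var/lib/ceph/osd/ceph-14/    # block is a symlink to the osd-block-* logical volume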
[root@shnode183 ~]# dmesg -T
[Tue Jun 2 04:07:26 2020] sd 0:2:1:0: [sdb] tag#10 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Jun 2 04:07:26 2020] sd 0:2:1:0: [sdb] tag#10 CDB: Read(16) 88 00 00 00 00 02 fc 7f 41 80 00 00 02 00 00 00
[Tue Jun 2 04:07:26 2020] print_req_error: I/O error, dev sdb, sector 12826132864
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#31 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#43 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#31 CDB: Write(16) 8a 00 00 00 00 02 18 71 bc d0 00 00 00 10 00 00
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#43 CDB: Read(16) 88 00 00 00 00 02 bf 09 53 80 00 00 02 00 00 00
[Tue Jun 2 04:07:30 2020] print_req_error: I/O error, dev sdb, sector 11794994048
[Tue Jun 2 04:07:30 2020] print_req_error: I/O error, dev sdb, sector 9000041680
[Tue Jun 2 04:07:30 2020] Buffer I/O error on dev dm-1, logical block 1125004954, lost async page write
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Jun 2 04:07:30 2020] print_req_error: I/O error, dev sdb, sector 10183874816
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#8 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#17 CDB: Read(16) 88 00 00 00 00 00 5a 16 d3 80 00 00 00 48 00 00
[Tue Jun 2 04:07:30 2020] Buffer I/O error on dev dm-1, logical block 1125004955, lost async page write
[Tue Jun 2 04:07:30 2020] print_req_error: I/O error, dev sdb, sector 1511445376
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#18 CDB: Read(16) 88 00 00 00 00 02 19 12 0f 00 00 00 00 10 00 00
[Tue Jun 2 04:07:30 2020] print_req_error: I/O error, dev sdb, sector 9010548480
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#8 CDB: Read(16) 88 00 00 00 00 02 19 e7 83 80 00 00 00 10 00 00
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#44 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Jun 2 04:07:30 2020] print_req_error: I/O error, dev sdb, sector 9024537472
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#44 CDB: Read(16) 88 00 00 00 00 02 bf 09 55 80 00 00 01 e8 00 00
[Tue Jun 2 04:07:30 2020] print_req_error: I/O error, dev sdb, sector 11794994560
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#8 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#8 CDB: Read(16) 88 00 00 00 00 02 19 12 0f 00 00 00 00 08 00 00
[Tue Jun 2 04:07:30 2020] print_req_error: I/O error, dev sdb, sector 9010548480
[Tue Jun 2 04:07:30 2020] sd 0:2:1:0: [sdb] tag#13 CDB: Read(16) 88 00 00 00 00 02 19 e7 83 80 00 00 00 08 00 00
[Tue Jun 2 04:07:30 2020] print_req_error: I/O error, dev sdb, sector 9024537472
[Tue Jun 2 04:07:30 2020] Buffer I/O error on dev dm-1, logical block 1126318304, async page read
[Tue Jun 2 04:07:30 2020] Buffer I/O error on dev dm-1, logical block 1128066928, async page read
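dmesg shows repeated DID_BAD_TARGET read/write failures on /dev/sdb, which explains the (5) Input/output error in the OSD log: the drive behind sdb has stopped responding. Where the controller allows pass-through, drive health can be cross-checked with smartctl; a sketch, assuming smartmontools is installed (the device path and -d option depend on the controller and are assumptions here):
smartctl -H -a /dev/sdb             # directly attached SATA/SAS disk
smartctl -a -d cciss,1 /dev/sdb     # HP Smart Array pass-through; the disk index 1 is only an example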
[root@shnode183 ~]# pvs
Error reading device /dev/sdb at 0 length 512.
Error reading device /dev/sdb at 0 length 4.
Error reading device /dev/sdb at 4096 length 4.
PV VG Fmt Attr PSize PFree
/dev/sdb ceph-0a213fb7-3bdd-49fc-904c-9aecf750ef05 lvm2 a-- 8.73t 0
/dev/sdc ceph-bf6136eb-671c-44ee-aa24-9a460c2901bd lvm2 a-- 8.73t 0
/dev/sdd ceph-22bbd5e1-f98d-40a2-950d-023a08ba5eb3 lvm2 a-- 8.73t 0
/dev/sde ceph-b1df4cad-fc0e-430a-8a2b-8fd08ce4cb62 lvm2 a-- 8.73t 0
/dev/sdf ceph-36c57ac2-0724-4f6f-bdb0-020cd18d0643 lvm2 a-- 8.73t 0
/dev/sdg ceph-52d9bdc0-f9f0-4659-83d4-4b6cc80e387f lvm2 a-- <6.55t 0
/dev/sdh ceph-75b81cc4-095c-4281-8b26-222a7e669d09 lvm2 a-- <6.55t 0
[root@shnode183 ~]# hpssacli ctrl slot=0 show config detail
-bash: hpssacli: command not found
You have new mail in /var/spool/mail/root
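The HPE array CLI is not installed on this node; note that newer HPE tool packages ship the same utility as ssacli rather than hpssacli. If it were installed, the controller and physical-drive state could be checked with something like (commands not run in the original session):
ssacli ctrl slot=0 show config detail
ssacli ctrl slot=0 pd all show status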
[root@shnode183 ~]# vgs
VG #PV #LV #SN Attr VSize VFree
ceph-0a213fb7-3bdd-49fc-904c-9aecf750ef05 1 1 0 wz--n- 8.73t 0
ceph-22bbd5e1-f98d-40a2-950d-023a08ba5eb3 1 1 0 wz--n- 8.73t 0
ceph-36c57ac2-0724-4f6f-bdb0-020cd18d0643 1 1 0 wz--n- 8.73t 0
ceph-52d9bdc0-f9f0-4659-83d4-4b6cc80e387f 1 1 0 wz--n- <6.55t 0
ceph-75b81cc4-095c-4281-8b26-222a7e669d09 1 1 0 wz--n- <6.55t 0
ceph-b1df4cad-fc0e-430a-8a2b-8fd08ce4cb62 1 1 0 wz--n- 8.73t 0
ceph-bf6136eb-671c-44ee-aa24-9a460c2901bd 1 1 0 wz--n- 8.73t 0
You have new mail in /var/spool/mail/root
Troubleshooting shows that the disk /dev/sdb is damaged and needs to be replaced. Before swapping the disk, clear the logical volume information on /dev/sdb; the cluster-side cleanup for osd.14 is sketched below.
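In addition to wiping the LVM metadata, the dead OSD itself normally has to be stopped and removed from the cluster before the disk is swapped. A minimal sketch of the usual steps (not part of the original session; ceph osd destroy can be used instead of purge if the OSD ID should be reused):
systemctl stop ceph-osd@14
systemctl disable ceph-osd@14
ceph osd out 14                             # stop mapping placement groups to osd.14
ceph osd purge 14 --yes-i-really-mean-it    # remove osd.14 from the CRUSH map, auth keys and OSD map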
[root@shnode183 ~]# df -h|grep ceph
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-15
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-17
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-20
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-16
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-19
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-18
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-14
[root@shnode183 ~]# umount /var/lib/ceph/osd/ceph-14
[root@shnode183 ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
osd-block-fbd4f71a-9ada-4fbd-b87f-9d1f4f9dab93 ceph-0a213fb7-3bdd-49fc-904c-9aecf750ef05 -wi-a----- 8.73t
osd-block-091f4915-e79f-43fa-b40d-89f3cdf1cf4f ceph-22bbd5e1-f98d-40a2-950d-023a08ba5eb3 -wi-ao---- 8.73t
osd-block-f52a5fbd-e4ac-41dd-869c-f25b7867b726 ceph-36c57ac2-0724-4f6f-bdb0-020cd18d0643 -wi-ao---- 8.73t
osd-block-8035bf12-6a30-4a57-910e-ddf7e7f319cd ceph-52d9bdc0-f9f0-4659-83d4-4b6cc80e387f -wi-ao---- <6.55t
osd-block-882e7034-f5d2-480d-be60-3e7c8a746f1b ceph-75b81cc4-095c-4281-8b26-222a7e669d09 -wi-ao---- <6.55t
osd-block-1fbd5079-51bd-479b-9e2e-80a3264f46ba ceph-b1df4cad-fc0e-430a-8a2b-8fd08ce4cb62 -wi-ao---- 8.73t
osd-block-0a09de8e-354f-407e-a57e-cb346d8cac6c ceph-bf6136eb-671c-44ee-aa24-9a460c2901bd -wi-ao---- 8.73t
[root@shnode183 ~]# pvs
Error reading device /dev/sdb at 0 length 512.
Error reading device /dev/sdb at 0 length 4.
Error reading device /dev/sdb at 4096 length 4.
PV VG Fmt Attr PSize PFree
/dev/sdb ceph-0a213fb7-3bdd-49fc-904c-9aecf750ef05 lvm2 a-- 8.73t 0
/dev/sdc ceph-bf6136eb-671c-44ee-aa24-9a460c2901bd lvm2 a-- 8.73t 0
/dev/sdd ceph-22bbd5e1-f98d-40a2-950d-023a08ba5eb3 lvm2 a-- 8.73t 0
/dev/sde ceph-b1df4cad-fc0e-430a-8a2b-8fd08ce4cb62 lvm2 a-- 8.73t 0
/dev/sdf ceph-36c57ac2-0724-4f6f-bdb0-020cd18d0643 lvm2 a-- 8.73t 0
/dev/sdg ceph-52d9bdc0-f9f0-4659-83d4-4b6cc80e387f lvm2 a-- <6.55t 0
/dev/sdh ceph-75b81cc4-095c-4281-8b26-222a7e669d09 lvm2 a-- <6.55t 0
[root@shnode183 ~]# lvremove osd-block-fbd4f71a-9ada-4fbd-b87f-9d1f4f9dab93/ceph-0a213fb7-3bdd-49fc-904c-9aecf750ef05
Volume group "osd-block-fbd4f71a-9ada-4fbd-b87f-9d1f4f9dab93" not found
Cannot process volume group osd-block-fbd4f71a-9ada-4fbd-b87f-9d1f4f9dab93
[root@shnode183 ~]# lvremove ceph-0a213fb7-3bdd-49fc-904c-9aecf750ef05/osd-block-fbd4f71a-9ada-4fbd-b87f-9d1f4f9dab93
Do you really want to remove active logical volume ceph-0a213fb7-3bdd-49fc-904c-9aecf750ef05/osd-block-fbd4f71a-9ada-4fbd-b87f-9d1f4f9dab93?[y/n]: y
Error reading device /dev/sdb at 4096 length 512.
Failed to read metadata area header on /dev/sdb at 4096
WARNING: Failed to write an MDA of VG ceph-0a213fb7-3bdd-49fc-904c-9aecf750ef05.
Failed to write VG ceph-0a213fb7-3bdd-49fc-904c-9aecf750ef05.
The logical volume cannot be removed this way, because LVM can no longer write its metadata to the failed disk. Instead, refresh the LVM device cache with the following command so that the unreadable PV is dropped.
[root@shnode183 ~]# pvscan --cache
You have new mail in /var/spool/mail/root
[root@shnode183 ~]# pvs
Error reading device /dev/sdb at 0 length 512.
Error reading device /dev/sdb at 0 length 4.
Error reading device /dev/sdb at 4096 length 4.
PV VG Fmt Attr PSize PFree
/dev/sdc ceph-bf6136eb-671c-44ee-aa24-9a460c2901bd lvm2 a-- 8.73t 0
/dev/sdd ceph-22bbd5e1-f98d-40a2-950d-023a08ba5eb3 lvm2 a-- 8.73t 0
/dev/sde ceph-b1df4cad-fc0e-430a-8a2b-8fd08ce4cb62 lvm2 a-- 8.73t 0
/dev/sdf ceph-36c57ac2-0724-4f6f-bdb0-020cd18d0643 lvm2 a-- 8.73t 0
/dev/sdg ceph-52d9bdc0-f9f0-4659-83d4-4b6cc80e387f lvm2 a-- <6.55t 0
/dev/sdh ceph-75b81cc4-095c-4281-8b26-222a7e669d09 lvm2 a-- <6.55t 0
[root@shnode183 ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
osd-block-091f4915-e79f-43fa-b40d-89f3cdf1cf4f ceph-22bbd5e1-f98d-40a2-950d-023a08ba5eb3 -wi-ao---- 8.73t
osd-block-f52a5fbd-e4ac-41dd-869c-f25b7867b726 ceph-36c57ac2-0724-4f6f-bdb0-020cd18d0643 -wi-ao---- 8.73t
osd-block-8035bf12-6a30-4a57-910e-ddf7e7f319cd ceph-52d9bdc0-f9f0-4659-83d4-4b6cc80e387f -wi-ao---- <6.55t
osd-block-882e7034-f5d2-480d-be60-3e7c8a746f1b ceph-75b81cc4-095c-4281-8b26-222a7e669d09 -wi-ao---- <6.55t
osd-block-1fbd5079-51bd-479b-9e2e-80a3264f46ba ceph-b1df4cad-fc0e-430a-8a2b-8fd08ce4cb62 -wi-ao---- 8.73t
osd-block-0a09de8e-354f-407e-a57e-cb346d8cac6c ceph-bf6136eb-671c-44ee-aa24-9a460c2901bd -wi-ao---- 8.73t
As shown above, the logical volume and physical volume on the corrupted disk have disappeared from the LVM output.
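Once /dev/sdb has been physically replaced, a new OSD can be created on the fresh disk and the cluster will backfill data onto it. A minimal sketch, assuming the replacement disk also appears as /dev/sdb and the OSDs were originally deployed with ceph-volume:
ceph-volume lvm create --bluestore --data /dev/sdb   # builds a new LVM-backed BlueStore OSD
ceph osd tree                                        # confirm the new OSD is up and in
ceph -s                                              # watch recovery/backfill progress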