Doris: Decommissioning BE Nodes Gets Stuck [How to Solve]

Question

Environment: Doris 0.13.15 (precompiled version).

Decommissioning three BE nodes gets stuck:

ALTER SYSTEM DECOMMISSION BACKEND "be_host-1:9050";
ALTER SYSTEM DECOMMISSION BACKEND "be_host-2:9050";
ALTER SYSTEM DECOMMISSION BACKEND "be_host-3:9050";

The decommission then hangs and never finishes.

Analysis

The source code shows that the parameter to adjust is catalog_trash_expire_second. For other parameters, see the Chinese description of the Doris FE configuration.
This parameter provides a protection mechanism: after a database, table, or partition is dropped, it can still be restored with a RECOVER statement within catalog_trash_expire_second seconds.
The parameter therefore sets the maximum retention time for dropped data. Once that time has passed, the data is permanently deleted.

This is a safeguard for the case where someone drops data, regrets it, and would otherwise be too late to recover it.

In this case, someone had dropped a table and some table partitions. The dropped data was still being held in the recycle bin, and the BE decommission got stuck on it.

Source code analysis


In the erasePartition() method, isExpire() uses this parameter to decide whether a recycled partition can be deleted. The elapsed time must also exceed a minimum deletion delay, minEraseLatency (10 min).
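
The check described above can be sketched roughly as follows. This is a minimal illustration and not the actual Doris source: the class name RecycleBinExpiry, the method signature, and the constant name are assumptions made for the example. Only the two conditions (the 10-minute minimum delay and the catalog_trash_expire_second threshold) come from the analysis above.

```java
// Illustrative sketch of the expiry rule; names here are hypothetical, not Doris code.
final class RecycleBinExpiry {
    // Minimum delay before anything in the recycle bin may be erased (10 minutes).
    static final long MIN_ERASE_LATENCY_MS = 10L * 60 * 1000;

    // Returns true when an item recycled at recycleTimeMs may be erased at nowMs,
    // given the configured catalog_trash_expire_second value (in seconds).
    static boolean isExpire(long recycleTimeMs, long nowMs, long catalogTrashExpireSecond) {
        long latencyMs = nowMs - recycleTimeMs;
        return latencyMs > MIN_ERASE_LATENCY_MS
                && latencyMs > catalogTrashExpireSecond * 1000L;
    }

    public static void main(String[] args) {
        long oneHourMs = 60L * 60 * 1000;
        // Default of 86400 s: a partition dropped one hour ago is still retained.
        System.out.println(isExpire(0L, oneHourMs, 86400L));
        // Parameter lowered to 600 s: the same partition may now be erased.
        System.out.println(isExpire(0L, oneHourMs, 600L));
        // Parameter set to 1 s, but only 5 minutes elapsed: the 10-minute floor still blocks erasure.
        System.out.println(isExpire(0L, 5L * 60 * 1000, 1L));
    }
}
```

Under this reading, lowering the parameter below 600 seconds would not speed things up any further, because the 10-minute minimum delay still applies.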

Solution

Lower the parameter catalog_trash_expire_second (the default value is 86400 seconds, i.e. 1 day).
Once the partition data waiting in the recycle bin has expired, the decommission can complete.

Follow-up question

After the parameter was lowered, two of the three nodes were decommissioned successfully, but the remaining BE node stayed stuck with 2 tablets left.

Solution

CANCEL DECOMMISSION BACKEND "be_host-1:9050";

Wait until the number of unhealthy tablets reported by SHOW PROC '/statistic'; drops to 0, then decommission the node again. You can also try running the decommission command right away, without waiting for the unhealthy tablets to reach 0.

ALTER SYSTEM DECOMMISSION BACKEND "be_host-1:9050";
