[Solved] Timeout failure when upgrading Horizon DaaS from 9.0.1/20.2.0 to 9.1.0/21.1.0

Problem Description
When upgrading VMware Horizon DaaS from 9.0.1/20.2.0 to 9.1.0/21.1.0, the HAL reports an error and the upgrade fails. The HAL's /var/log/horizon-air-link.log contains an error similar to the following:
2021-01-15T06:29:39,987 INFO [jersey-server-managed-async-executor-15] ApplianceNetworkAdapterAction – Updating network for appliance example.local, adapter: NetworkAdapterSpec{deviceIndex=1, connected=true, connectedOnStartup=true, network=‘null’}
2021-01-15T06:29:41,102 INFO [jersey-server-managed-async-executor-15] SpHalApiKeySyncAction – [10.11.3.224/example.local] running Synckey on the appliance.
2021-01-15T06:29:41,453 INFO [jersey-server-managed-async-executor-15] SpHalApiKeySyncAction – [10.11.3.224/example.local] Running script ‘sync-api-key’
2021-01-15T06:32:50,995 ERROR [jersey-server-managed-async-executor-15] AbstractAction – Error in abstract Action
com.vmware.vim25.InvalidState: null
at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
at java.lang.reflect.Constructor.newInstance(Unknown Source) ~[?:?]
at java.lang.Class.newInstance(Unknown Source) ~[?:?]
at com.vmware.vim25.ws.XmlGen.fromXml(XmlGen.java:205) ~[smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.vim25.ws.XmlGen.parseSoapFault(XmlGen.java:82) ~[smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.vim25.ws.WSClient.invoke(WSClient.java:134) ~[smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.vim25.ws.VimStub.listProcessesInGuest(VimStub.java:3820) ~[smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.vim25.mo.GuestProcessManager.listProcessesInGuest(GuestProcessManager.java:68) ~[smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.smartnode.bootstrap.vim.VijavaGuestClient.executeInGuest(VijavaGuestClient.java:118) ~[smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.hdaas.flow.SpHalApiKeySyncAction.doAction(SpHalApiKeySyncAction.java:138) ~[smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.smartnode.flow.AbstractAction.run(AbstractAction.java:94) [smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.smartnode.flow.CompositeAction.doAction(CompositeAction.java:53) [smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.smartnode.flow.AbstractAction.run(AbstractAction.java:94) [smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.smartnode.flow.FallbackAction.doAction(FallbackAction.java:25) [smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.smartnode.flow.AbstractAction.run(AbstractAction.java:94) [smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.smartnode.flow.CompositeAction.doAction(CompositeAction.java:53) [smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at com.vmware.smartnode.flow.AbstractAction.run(AbstractAction.java:94) [smartnode-bootstrap-21.1.0-all.jar:21.1.0-17336825]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
2021-01-15T06:32:50,999 ERROR [jersey-server-managed-async-executor-15] AbstractAction – Error in abstract Action
com.vmware.vim25.InvalidState: null
at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
In addition, a related message appears in the setup log of the newly created SP1 appliance:
2021-01-15,06:29:43 [HAL restore – DB changes]: ******* Sync API key *******
2021-01-15,06:29:43 [HAL restore – DB changes]: Update API key into cloud_config
UPDATE 1
2021-01-15,06:29:43 [HAL restore – DB changes]: Update HAL ip address
INSERT 0 0
2021-01-15,06:29:43 [restart dtService]: Restarting dtService…
[sudo] password for desktone: 2021-01-15,06:32:53 [Appliance Services]: ******* Started handling appliance services for ‘stop’ action *******
2021-01-15,06:32:53 [Appliance Services]: Proceeding with ‘stop’ for services ‘dtService memcached dbmonitor slons’ on SP appliance
2021-01-15,06:32:53 [Appliance Services]: Service Name:dtService, Action:stop
Job for dtService.service canceled.
2021-01-15,06:32:53 [Appliance Services]: Service Name:memcached, Action:stop
Cause
The issue is caused by a timeout: high load on vCenter Server during the deployment makes the SP1 appliance take longer to reboot than the upgrade allows. This is a known issue when upgrading from 9.0.1/20.2.0 to 9.1.0/21.1.0, and VMware has no official fix for it.
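To check whether a failed upgrade matches this signature, the HAL log can be scanned for the 'sync-api-key' script start followed by an InvalidState error on the same thread. In the excerpt above that gap is about three minutes (06:29:41 to 06:32:50), and the SP1 appliance starts stopping its services at 06:32:53. The Python sketch below assumes the line layout shown in the excerpt; the function names are hypothetical, and the elapsed time it reports is only a diagnostic hint, not a VMware-documented timeout value.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Assumed layout, taken from the log excerpt above:
# "<date>T<time>,<ms> <LEVEL> [<thread>] <Logger> <dash> <message>"
# The dash is matched as any non-space token, so "-" and "–" both work.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<logger>\S+)\s+\S+\s+(?P<msg>.*)$"
)


def parse_ts(ts: str) -> datetime:
    # "%f" accepts the 3-digit millisecond field and right-pads it
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S,%f")


def find_sync_key_timeout(lines: Iterable[str]) -> Optional[float]:
    """Hypothetical helper: seconds between the 'sync-api-key' script start
    and the first ERROR on the same thread whose exception line is
    com.vmware.vim25.InvalidState. Returns None if the pattern is absent."""
    started = {}    # thread name -> time the sync-api-key script was started
    pending = None  # (thread, time) of an ERROR line awaiting its exception line
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if m:
            pending = None
            thread, ts = m["thread"], parse_ts(m["ts"])
            if m["logger"] == "SpHalApiKeySyncAction" and "sync-api-key" in m["msg"]:
                started[thread] = ts
            elif m["level"] == "ERROR" and thread in started:
                pending = (thread, ts)
        elif pending and line.strip().startswith("com.vmware.vim25.InvalidState"):
            # The exception class line directly follows the ERROR line
            thread, ts = pending
            return (ts - started[thread]).total_seconds()
    return None
```

On the HAL, the function can be fed an open file object for /var/log/horizon-air-link.log (opened with errors="replace" in case of stray bytes); for the excerpt above it reports roughly 189.5 seconds.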
Workaround
Before scheduling the deployment task again, make sure that neither the vCenter Server instance nor the vCenter cluster hosting the SP appliances is under heavy load.
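One way to check the cluster side before retrying is to compare current host usage with capacity. The Python sketch below assumes the pyVmomi SDK: it takes a vim.ClusterComputeResource object and sums each host's summary.quickStats (overallCpuUsage in MHz, overallMemoryUsage in MB) against its summary.hardware capacity (cpuMhz times numCpuCores, memorySize in bytes). The 75% threshold and the names ClusterLoad, cluster_load and safe_to_retry are hypothetical; VMware publishes no load limit for this upgrade, and the sketch does not measure load on the vCenter Server appliance itself.

```python
from dataclasses import dataclass


@dataclass
class ClusterLoad:
    cpu_used_mhz: float
    cpu_capacity_mhz: float
    mem_used_mb: float
    mem_capacity_mb: float

    def cpu_ratio(self) -> float:
        # Unknown capacity is treated as fully loaded (not safe)
        return self.cpu_used_mhz / self.cpu_capacity_mhz if self.cpu_capacity_mhz else 1.0

    def mem_ratio(self) -> float:
        return self.mem_used_mb / self.mem_capacity_mb if self.mem_capacity_mb else 1.0


def cluster_load(cluster) -> ClusterLoad:
    """Aggregate per-host usage for a pyVmomi vim.ClusterComputeResource
    (assumption: any object exposing the same attribute paths works)."""
    load = ClusterLoad(0, 0, 0, 0)
    for host in cluster.host:
        stats, hw = host.summary.quickStats, host.summary.hardware
        if hw is None:
            continue  # e.g. a disconnected host without a hardware summary
        load.cpu_used_mhz += stats.overallCpuUsage or 0
        load.cpu_capacity_mhz += hw.cpuMhz * hw.numCpuCores
        load.mem_used_mb += stats.overallMemoryUsage or 0
        load.mem_capacity_mb += hw.memorySize / (1024 * 1024)  # bytes -> MB
    return load


def safe_to_retry(load: ClusterLoad, max_ratio: float = 0.75) -> bool:
    """Hypothetical go/no-go rule: CPU and memory both below max_ratio."""
    return load.cpu_ratio() < max_ratio and load.mem_ratio() < max_ratio
```

The cluster object can come from a pyVmomi session opened with pyVim.connect.SmartConnect and a container view of vim.ClusterComputeResource. Passing it in as a parameter keeps the check testable without a live vCenter.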