Error message:
trainer.model.save(self.dir, epoch, is_best=is_best)
AttributeError: 'DataParallel' object has no attribute 'save'
Source code analysis:
trainer.model.save(self.dir, epoch, is_best=is_best)
The line above was written before I switched to single-machine, multi-GPU parallel training. My parallel setup is as follows:
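# Expose physical GPUs 3, 2 and 1; inside the process they become logical devices 0, 1 and 2
os.environ["CUDA_VISIBLE_DEVICES"] = "3,2,1"
# device_ids are logical indices, so this uses physical GPUs 3 and 2
model = torch.nn.DataParallel(model, device_ids=[0, 1]).cuda()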
Cause analysis: AttributeError: 'DataParallel' object has no attribute 'save'
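For multi-GPU training, torch.nn.DataParallel wraps the original model in a new object. Custom methods such as save() are defined on the original model, not on the wrapper, so calling them on the wrapper raises AttributeError. The original model is stored in the wrapper's module attribute, so the call has to go through model.module. The same applies to state_dict(): call it on model.module if the saved keys should not carry the "module." prefix. After this change, the code is as follows: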
trainer.model.module.save(self.dir, epoch, is_best=is_best)
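The same fix can be written so that it works whether or not the model is wrapped. Below is a minimal sketch, not the original project's code: Net and unwrap are hypothetical names made up for illustration, and Net.save() only stands in for the custom save method from the post.

```python
import torch
from torch import nn


class Net(nn.Module):
    """Toy model with a custom save() method (hypothetical stand-in)."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

    def save(self, path):
        # Custom method: defined on Net, not on the DataParallel wrapper.
        torch.save(self.state_dict(), path)


def unwrap(model):
    """Return the underlying model if it is wrapped for data parallelism."""
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        return model.module
    return model


wrapped = nn.DataParallel(Net())

print(hasattr(wrapped, "save"))           # False: the wrapper has no save()
print(hasattr(unwrap(wrapped), "save"))   # True: the wrapped Net does

# state_dict keys differ too: the wrapper prefixes every key with "module."
print(list(wrapped.state_dict()))          # ['module.fc.weight', 'module.fc.bias']
print(list(unwrap(wrapped).state_dict()))  # ['fc.weight', 'fc.bias']
```

With such a helper, a call like unwrap(trainer.model).save(...) would behave the same in single-GPU and multi-GPU runs, assuming trainer.model is either the plain model or its DataParallel wrapper.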