
[Solved] PyTorch Error: TypeError: exceptions must derive from BaseException

Project scenario:

PyTorch reports an error: TypeError: exceptions must derive from BaseException


Problem description

In base_options.py, the --netG argument is restricted to the following choices:

self.parser.add_argument('--netG', type=str, default='p2hed', choices=['p2hed', 'refineD', 'p2hed_att'], help='selects model to use for netG')

However, the code that selects the network for netG is written as follows:

def define_G(input_nc, output_nc, ngf, netG, n_downsample_global=3, n_blocks_global=9, n_local_enhancers=1, 
             n_blocks_local=3, norm='instance', gpu_ids=[]):    
    norm_layer = get_norm_layer(norm_type=norm)     
    if netG == 'p2hed':    
        netG = DDNet_p2hED(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, norm_layer)
    elif netG == 'refineDepth':   # note: 'refineDepth' is not among the argparse choices
        netG = DDNet_RefineDepth(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)
    elif netG == 'p2h_noatt':        
        netG = DDNet_p2hed_noatt(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)
    else:
        raise('generator not implemented!')   # raises a plain str, not an Exception; the source of this TypeError
    #print(netG)
    if len(gpu_ids) > 0:
        assert(torch.cuda.is_available())   
        netG.cuda(gpu_ids[0])
    netG.apply(weights_init)
    return netG

Cause analysis:

Note that none of the branches tests for 'refineD', so when --netG refineD is passed (the only spelling argparse allows), execution falls through to the else branch. There, raise('generator not implemented!') tries to raise a plain string, and Python 3 only allows raising instances or subclasses of BaseException, hence the TypeError.
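The failure is easy to reproduce in isolation; raising any object that is not an exception triggers the same message:

raise('generator not implemented!')   # raising a plain str
# TypeError: exceptions must derive from BaseException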


Solution:

Change elif netG == 'refineDepth': to elif netG == 'refineD': so that the branch matches the choice declared for --netG, and the error goes away.
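For completeness, here is a sketch of the relevant lines after the fix. The second change, switching the bare-string raise to NotImplementedError, is an extra hardening step suggested here (not part of the original fix) so that any future mismatch fails with a readable message instead of this TypeError:

    elif netG == 'refineD':   # was 'refineDepth'; now matches the argparse choice
        ...
    else:
        raise NotImplementedError('generator not implemented!')   # a real exception class, not a str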

[Solved] RuntimeError: NCCL error in: XXX, unhandled system error, NCCL version 2.7.8

Project scenario:

This error was encountered during distributed (multi-GPU) training.


Problem description

Most likely, the parallel (distributed) run was not launched correctly.
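As a side note (a suggestion added here, not part of the original post): NCCL can print detailed diagnostics for an unhandled system error when its debug logging is enabled, which usually narrows the failure down to a specific device or transport. Relaunch with the NCCL_DEBUG environment variable set, e.g.:

 NCCL_DEBUG=INFO python -m torch.distributed.launch --nproc_per_node=2 your_script.py

(your_script.py stands in for the actual entry point.)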


Solution:

(1) First, check the server's GPU information. Start a Python interpreter and run:

python
import torch
torch.cuda.is_available()      # check whether CUDA is available
torch.cuda.device_count()      # number of visible GPUs
torch.cuda.get_device_name(0)  # GPU name; device indices start at 0
torch.cuda.current_device()    # index of the current device

Exit the interpreter with exit() or Ctrl+D (Ctrl+Z only suspends the process).
(2) cd into the parent directory of the script to be run, then start the parallel job:

 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node=6   # launch the parallel run

Then append the script to run and its arguments:

 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node=6 \
     src_nq/create_examples.py \
     --vocab_file ./bert-base-uncased-vocab.txt \
     --input_pattern "./natural_questions/v1.0/train/nq-train-*.jsonl.gz" \
     --output_dir ./natural_questions/nq_0.03/ \
     --do_lower_case \
     --num_threads 24 \
     --include_unknowns 0.03 \
     --max_seq_length 512 \
     --doc_stride 128
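For reference, a script launched this way is expected to read the --local_rank argument injected by torch.distributed.launch and to initialize the NCCL process group before any collective operation runs; skipping this setup is a common source of the NCCL error above. A minimal, illustrative sketch of that boilerplate (not taken from the original post):

import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)   # supplied by torch.distributed.launch
args, _ = parser.parse_known_args()

torch.cuda.set_device(args.local_rank)     # bind this process to its own GPU
dist.init_process_group(backend='nccl')    # reads rank/world size from the launcher's env vars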

Problem solved!