
[Solved] MindSpore Error: Select GPU kernel op * fail! Incompatible data type

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): GPU
Software Environment:
– MindSpore version (source or binary): 1.5.2
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script normalizes the Tensor by constructing a BatchNorm single-operator network. The script is as follows:

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.batch_norm = ops.BatchNorm()
 05     def construct(self, input_x, scale, bias, mean, variance):
 06         output = self.batch_norm(input_x, scale, bias, mean, variance)
 07         return output
 08
 09 net = Net()
 10 input_x = Tensor(np.ones([2, 2]), mindspore.float16)
 11 scale = Tensor(np.ones([2]), mindspore.float16)
 12 bias = Tensor(np.ones([2]), mindspore.float16)
 13 mean = Tensor(np.ones([2]), mindspore.float16)
 14 variance = Tensor(np.ones([2]), mindspore.float16)
 15
 16 output = net(input_x, scale, bias, mean, variance)
 17 print(output)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "116945.py", line 22, in <module>
    output = net(input_x, scale, bias, mean, variance)
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/nn/cell.py", line 407, in __call__
    out = self.compile_and_run(*inputs)
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/nn/cell.py", line 734, in compile_and_run
    self.compile(*inputs)
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/nn/cell.py", line 721, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/common/api.py", line 551, in compile
    result = self._graph_executor.compile(obj, args_list, phase, use_vm, self.queue_name)
TypeError: mindspore/ccsrc/runtime/device/gpu/kernel_info_setter.cc:355 PrintUnsupportedTypeException] Select GPU kernel op[BatchNorm] fail! Incompatible data type!
The supported data types are in[float32 float32 float32 float32 float32], out[float32 float32 float32 float32 float32]; in[float16 float32 float32 float32 float32], out[float16 float32 float32 float32 float32]; , but get in [float16 float16 float16 float16 float16 ] out [float16 float16 float16 float16 float16 ]

Cause Analysis

Let’s look at the error message. The TypeError says Select GPU kernel op[BatchNorm] fail! Incompatible data type! and then lists the supported combinations:

The supported data types are in[float32 float32 float32 float32 float32], out[float32 float32 float32 float32 float32]; in[float16 float32 float32 float32 float32], out[float16 float32 float32 float32 float32], but get in [float16 float16 float16 float16 float16] out [float16 float16 float16 float16 float16]. In other words, on GPU the five inputs must either all be float32, or input_x may be float16 while the remaining four are float32. Checking the script shows that every input is float16, hence the error.
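Equivalently, the existing float16 tensors can be cast to a supported combination before the call; a minimal sketch reusing the variables above (Tensor.astype performs the conversion):

scale = scale.astype(mindspore.float32)
bias = bias.astype(mindspore.float32)
mean = mean.astype(mindspore.float32)
variance = variance.astype(mindspore.float32)
# input_x may stay float16: in[float16 float32 float32 float32 float32] is supported
output = net(input_x, scale, bias, mean, variance)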

2 Solutions

Given the cause identified above, the fix is straightforward:

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.batch_norm = ops.BatchNorm()
 05     def construct(self, input_x, scale, bias, mean, variance):
 06         output = self.batch_norm(input_x, scale, bias, mean, variance)
 07         return output
 08 
 09 net = Net()
 10 input_x = Tensor(np.ones([2, 2]), mindspore.float16)
 11 scale = Tensor(np.ones([2]), mindspore.float32)
 12 bias = Tensor(np.ones([2]), mindspore.float32)
 13 mean = Tensor(np.ones([2]), mindspore.float32)
 14 variance = Tensor(np.ones([2]), mindspore.float32)
 15 
 16 output = net(input_x, scale, bias, mean, variance)
 17 print(output)

At this point, the execution is successful, and the output is as follows:

output: (Tensor(shape=[2, 2], dtype=Float16, value=
[[ 1.0000e+00,  1.0000e+00],
 [ 1.0000e+00,  1.0000e+00]]), Tensor(shape=[2], dtype=Float32, value= [ 0.00000000e+00,  0.00000000e+00]), Tensor(shape=[2], dtype=Float32, value= [ 0.00000000e+00,  0.00000000e+00]), Tensor(shape=[2], dtype=Float32, value= [ 0.00000000e+00,  0.00000000e+00]), Tensor(shape=[2], dtype=Float32, value= [ 0.00000000e+00,  0.00000000e+00]))

3 Summary

Steps to locate the error report:

1. Find the line of user code that reports the error: 16 output = net(input_x, scale, bias, mean, variance);

2. According to the keywords in the log error message, narrow the scope of the analysis problem: The supported data types are in[float32 float32 float32 float32 float32], out[float32 float32 float32 float32 float32]; in[float16 float32 float32 float32 float32], out [float16 float32 float32 float32 float32]; , but get in [float16 float16 float16 float16 float16 ] out [float16 float16 float16 float16 float16 ]

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: ReduceMean in the Ascend environment does not support inputs of more than 8 dimensions

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script builds a single-operator ReduceMean network that averages the input along axis 1 and reduces that dimension. The script is as follows:

 01 class Net(nn.Cell):
 02     def __init__(self, axis, keep_dims):
 03         super().__init__()
 04         self.reducemean = ops.ReduceMean(keep_dims=keep_dims)
 05         self.axis = axis
 06     def construct(self, input_x):
 07         return self.reducemean(input_x, self.axis)
 08 net = Net(axis=(1,), keep_dims=True)
 09 x = Tensor(np.random.randn(1, 2, 3, 4, 5, 6, 7, 8, 9), mindspore.float32)
 10 out = net(x)
 11 print("out shape: ", out.shape)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "test.py", line 18, in <module>
    out = net(x)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/mindspore/nn/cell.py", line 574, in __call__
    out = self.compile_and_run(*args)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/mindspore/nn/cell.py", line 975, in compile_and_run
    self.compile(*inputs)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/mindspore/nn/cell.py", line 948, in compile
    jit_config_dict=self._jit_config_dict)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/mindspore/common/api.py", line 1092, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
RuntimeError: Single op compile failed, op: reduce_mean_d_1629966128061146056_6
 except_msg: 2022-07-15 01:36:29.720449: Query except_msg:Traceback (most recent call last):
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/te_fusion/parallel_compilation.py", line 1469, in run
    relation_param=self._relation_param)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/te_fusion/fusion_manager.py", line 1283, in build_single_op
    compile_info = call_op()
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/te_fusion/fusion_manager.py", line 1270, in call_op
    opfunc(*inputs, *outputs, *new_attrs, **kwargs)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 537, in _in_wrapper
    formal_parameter_list[i][1], op_name)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 516, in _check_one_op_param
    _check_input(op_param, param_name, param_type, op_name)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 299, in _check_input
    _check_input_output_dict(op_param, param_name, op_name)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 223, in _check_input_output_dict
    param_name=param_name)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 689, in check_shape
    _check_shape_range(max_rank, min_rank, param_name, shape)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 727, in _check_shape_range
    % (error_info['param_name'], min_rank, max_rank, len(shape)))
RuntimeError: ({'errCode': 'E80012', 'op_name': 'reduce_mean_d', 'param_name': 'input_x', 'min_value': 0, 'max_value': 8, 'real_value': 9}, 'In op, the num of dimensions of input/output[input_x] should be inthe range of [0, 8], but actually is [9].')

Cause Analysis

Let’s look at the error message. The RuntimeError says ‘In op, the num of dimensions of input/output[input_x] should be in the range of [0, 8], but actually is [9]’, which means the input of ReduceMean must have between 0 and 8 dimensions, while the actual input has 9, exceeding what the operator supports. The official website documentation also describes this input-dimension limit for the operator.
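If the 9-D input itself cannot be avoided, one possible workaround (a sketch, not from the original post) is to merge axes that are not being reduced with Reshape so the operator sees at most 8 dimensions, then restore the shape afterwards:

import numpy as np
import mindspore
from mindspore import Tensor, ops

x = Tensor(np.random.randn(1, 2, 3, 4, 5, 6, 7, 8, 9), mindspore.float32)
merged = ops.Reshape()(x, (1, 2, 3, 4, 5, 6, 7, 72))    # 9-D -> 8-D (8*9 = 72)
out = ops.ReduceMean(keep_dims=True)(merged, (1,))      # shape (1, 1, 3, 4, 5, 6, 7, 72)
out = ops.Reshape()(out, (1, 1, 3, 4, 5, 6, 7, 8, 9))   # split the merged axes again
print(out.shape)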

2 Solutions

Given the cause identified above, the fix is straightforward:

 01 class Net(nn.Cell):
 02     def __init__(self, axis, keep_dims):
 03         super().__init__()
 04         self.reducemean = ops.ReduceMean(keep_dims=keep_dims)
 05         self.axis = axis
 06     def construct(self, input_x):
 07         return self.reducemean(input_x, self.axis)
 08 net = Net(axis=(1,), keep_dims=True)
 09 x = Tensor(np.random.randn(2, 3, 4, 5, 6, 7, 8, 9), mindspore.float32)
 10 out = net(x)
 11 print("out shape: ", out.shape)

At this point, the execution is successful, and the output is as follows:

out shape: (2, 1, 4, 5, 6, 7, 8, 9)

3 Summary

Steps to locate the error report:
1. Find the user code line that reports the error: out = net(x);
2. According to the keywords in the log error message, narrow the scope of the analysis problem: should be in the range of [0, 8], but actually is [9];
3. Focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: “RuntimeError: Unexpected error. Inconsistent batch…”

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This case defines a custom dataset and performs a batch operation on it.
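The original script was shared as a screenshot; a minimal sketch that reproduces the same error (column name and row shapes are illustrative):

import numpy as np
import mindspore.dataset as ds

# Two rows with different shapes in column "data"
data = [np.ones(2, dtype=np.float32), np.ones(3, dtype=np.float32)]

def generator():
    for item in data:
        yield (item,)

dataset = ds.GeneratorDataset(generator, column_names=["data"])
dataset = dataset.batch(batch_size=2)

# Iterating raises: RuntimeError: Unexpected error. Inconsistent batch shapes ...
for row in dataset.create_tuple_iterator():
    print(row)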

1.2.2 Error reporting

RuntimeError: Unexpected error. Inconsistent batch shapes, batch operation expect same shape for each data row, but got inconsistent shape in column 0, expected shape for this column is:, got shape:

2 Reason analysis

According to the error message, the batch operation requires every data row to have the same shape, but the shapes produced by the custom dataset are not uniform, which causes the error.

3 Solutions

1. Remove the batch operation.

2. If data with inconsistent shapes must be batched, reorganize the dataset and unify the shape of the input rows through pad completion or similar methods (a sketch follows).
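A minimal sketch of the second option, assuming MindSpore 1.x’s pad_info parameter of Dataset.batch (column name and pad shape are illustrative):

import numpy as np
import mindspore.dataset as ds

data = [np.ones(2, dtype=np.float32), np.ones(3, dtype=np.float32)]

def generator():
    for item in data:
        yield (item,)

dataset = ds.GeneratorDataset(generator, column_names=["data"])
# pad_info pads column "data" to shape [3] with fill value 0 before batching,
# so every row in the batch ends up with the same shape
dataset = dataset.batch(batch_size=2, pad_info={"data": ([3], 0)})

for row in dataset.create_tuple_iterator():
    print(row[0].shape)  # (2, 3)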

[Solved] MindSpore Error: “GeneratorDataset’s num_workers=8, this value is …”

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.2.0
– Python version (eg, Python 3.7.5): 3.7.5
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This case runs the linear function fitting example from the official website; MindSpore had been installed successfully beforehand.

1.2.2 Error reporting

Error message: RuntimeError: Thread ID 140706176251712 Unexpected error. GeneratorDataset’s num_workers=8, this value is not within the required range of [1, cpu_thread_cnt=2].
Line of code : 639
File : /home/jenkins/agent-working-dir/workspace/Compile_CPU_X86_Ubuntu/mindspore/mindspore/ccsrc/minddata/dataset/engine/ir/datasetops/dataset_node.cc

2 Reason analysis

The machine has fewer CPU cores than the number of workers the dataset module uses by default when generating data. MindSpore 1.2.0 does not adapt this default to the number of CPU cores actually present, so on machines with a low-end configuration the number of workers must be configured manually.

3 Solutions

1. Add code to manually configure the number of parallel workers (see the sketch after this list):
ds.config.set_num_parallel_workers(2)
2. Use a newer version of MindSpore: since 1.6.0 the default adapts to the number of CPU cores in the hardware, so a low core count no longer triggers this error.
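For context, a minimal sketch of the first fix; the pipeline line is an illustrative stand-in for the tutorial’s dataset code, and the configuration must run before the pipeline is built:

import mindspore.dataset as ds

# Cap dataset parallelism at the 2 CPU cores actually available
ds.config.set_num_parallel_workers(2)

# ... then build the pipeline, e.g.:
dataset = ds.NumpySlicesDataset([1, 2, 3], column_names=["data"])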

4 Summary

1. The error message points at the problem directly; in this case it is the number of CPU cores. The method for setting the number of parallel workers can be found in the official website tutorial and the open source MindSpore documentation.
2. MindSpore now provides an automatic data tuning tool, Dataset AutoTune, which adjusts the parallelism of the data processing pipeline to the environment resources during training; it automatically detects the number of CPU cores in the hardware and configures itself adaptively.
3. The config module in MindSpore sets or retrieves the global configuration parameters for data processing.

[Solved] MindSpore Error: “TypeError: parse() missing 1 required positional.”

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This case uses the mindspore.dataset custom dataset:

import os
import numpy as np
from PIL import Image
import mindspore.common.dtype as mstype
import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as C
import mindspore.dataset.vision.c_transforms as vc

class _dcp_Dataset:
    def __init__(self,img_root_dir,device_target="CPU"):
        if not os.path.exists(img_root_dir):
            raise RuntimeError(f"the input image dir {img_root_dir} is invalid")
        self.img_root_dir=img_root_dir
        self.img_names=[i for i in os.listdir(img_root_dir) if i.endswith(".jpg")]
        self.target=device_target

    def __len__(self):
        return len(self.img_names)

    def __getitem__(self, index):
        img_name=self.img_names[index]
        im=Image.open(os.path.join(self.img_root_dir,img_name))
        image=np.array(im)
        label_str=img_name.split("_")[-1]
        label_str=label_str.split(".")[0]
        label=np.array(label_str)
        return image,label

def creat_dataset(dataset_path,batch_size=2,num_shards=1,shard_id=0,device_target="CPU"):
    dataset=_dcp_Dataset(dataset_path,device_target)
    data_set=ds.GeneratorDataset(dataset,["image","label"],shuffle=True,num_shards=1,shard_id=0)
    image_trans=[
        vc.Resize((224,224)),
        vc.RandomHorizontalFlip(),
        vc.Rescale(1/255,shift=0),
        vc.Normalize((0.4465, 0.4822, 0.4914), (0.2010, 0.1994, 0.2023)),
        vc.HWC2CHW
    ]
    label_trans=[C.TypeCast(mstype.int32)]

    data_set=data_set.map(operations=image_trans,input_columns=["image"])
    data_set=data_set.map(operations=label_trans,input_columns=["label"])
    # data_set=data_set.shuffle(buffer_size=batch_size)
    data_set=data_set.batch(batch_size=batch_size,drop_remainder=True)
    # data_set=data_set.repeat(1)

    return data_set


if __name__ == '__main__':
    data = creat_dataset("./image_DCP")
    print(data)
    data_loader = data.create_dict_iterator()
    for i, data in enumerate(data_loader):
        print(i)
        print(data)

1.2.2 Error reporting

Error message: TypeError: parse() missing 1 required positional argument…

2 Reason analysis and solution

The vc.HWC2CHW entry in image_trans is missing its parentheses, so the class itself is passed instead of an instance; change it to vc.HWC2CHW() and the script executes normally.
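For reference, the corrected transform list from the script above; only the last entry changes:

image_trans = [
    vc.Resize((224, 224)),
    vc.RandomHorizontalFlip(),
    vc.Rescale(1 / 255, shift=0),
    vc.Normalize((0.4465, 0.4822, 0.4914), (0.2010, 0.1994, 0.2023)),
    vc.HWC2CHW(),  # instantiated: map() needs a transform object, not the class itself
]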

3 Summary

Steps to locate the problem

For example: there is a data processing flow such as xxDataset -> map -> map -> batch.
Scripts can be debugged as follows:

  1. Only keep xxDataset, and then run the script to see if an error is reported;
  2. Keep xxDataset -> map, and then run the script to see if an error is reported;
  3. Keep xxDataset -> map -> map, and then run the script to see if an error is reported;
  4. Keep xxDataset -> map -> map -> batch, and then run the script to see if an error is reported;

According to the above method, you can locate which map/batch is wrong.

[Solved] MindSpore Error: ValueError: Minimum inputs size 0 does not match…

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): GPU
Software Environment:
– MindSpore version (source or binary): 1.5.2
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script computes the element-wise minimum of two input Tensors by constructing a single-operator Minimum network. The script is as follows:

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.minimum = ops.minimum()
 05     def construct(self, x, y):
 06         output = self.minimum(x, y)
 07         return output
 08 net = Net()
 09 x = Tensor(np.array([1.0, 2.0, 4.0]), mindspore.float64).astype(mindspore.float32)
 10 y = Tensor(np.array([3.0, 5.0, 6.0]), mindspore.float64).astype(mindspore.float32)
 11 output = net(x, y)
 12 print('output', output)

1.2.2 Error reporting

The error message here is as follows:

[EXCEPTION] PYNATIVE(95109,7fe22ea22740,python):2022-06-14-21:59:02.837.339 [mindspore/ccsrc/pipeline/pynative/pynative_execute.cc:1331] SetImplicitCast] Minimum inputs size 0 does not match the requires signature size 2
Traceback (most recent call last):
  File "182168.py", line 36, in <module>
    net = Net()
  File "182168.py", line 24, in __init__
    self.minimum = ops.minimum()
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/ops/primitive.py", line 247, in __call__
    return _run_op(self, self.name, args)
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/common/api.py", line 78, in wrapper
    results = fn(*arg, **kwargs)
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/ops/primitive.py", line 682, in _run_op
    output = real_run_op(obj, op_name, args)
ValueError: mindspore/ccsrc/pipeline/pynative/pynative_execute.cc:1331 SetImplicitCast] Minimum inputs size 0 does not match the requires signature size 2

Cause Analysis

Let’s look at the error message. The ValueError says Minimum inputs size 0 does not match the requires signature size 2, which roughly means that two inputs are required but none were received. Combined with line 11, two inputs clearly are passed in, so the problem may not lie on that line. We can add a print statement after lines 3, 5, and 11 to see how far the program gets; after running, only the first print appears, so the problem lies on the fourth line. Careful inspection shows that line 4 initializes the operator with minimum in lowercase, which directly calls the functional-interface minimum operator without passing any input parameters, hence the error.
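For reference, the lowercase name is the functional interface, which works when called with both inputs supplied directly; a minimal sketch:

import numpy as np
import mindspore
from mindspore import Tensor, ops

x = Tensor(np.array([1.0, 2.0, 4.0]), mindspore.float32)
y = Tensor(np.array([3.0, 5.0, 6.0]), mindspore.float32)
output = ops.minimum(x, y)  # functional call: both inputs passed at call time
print('output', output)     # output [1. 2. 4.]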

2 Solutions

Given the cause identified above, the fix is straightforward:

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.minimum = ops.Minimum()
 05     def construct(self, x, y):
 06         output = self.minimum(x, y)
 07         return output
 08 net = Net()
 09 x = Tensor(np.array([1.0, 2.0, 4.0]), mindspore.float64).astype(mindspore.float32)
 10 y = Tensor(np.array([3.0, 5.0, 6.0]), mindspore.float64).astype(mindspore.float32)
 11 output = net(x, y)
 12 print('output', output)

At this point, the execution is successful, and the output is as follows:

output [1. 2. 4.]

3 Summary

Steps to locate the error report:

1. Find the line of user code that reports the error: self.minimum = ops.minimum() ;

2. According to the keywords in the log error message, narrow the scope of the analysis problem: Minimum inputs size 0 does not match the requires signature size 2;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: “RuntimeError: Invalid data, Page size.”

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This example converts a custom dataset into the MindSpore Record data format.
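The original script was shared as a screenshot; a minimal sketch that can trigger this error (file name, schema, and sample size are illustrative):

import numpy as np
from mindspore.mindrecord import FileWriter

writer = FileWriter(file_name="test.mindrecord", shard_num=1)
writer.add_schema({"data": {"type": "bytes"}, "label": {"type": "int32"}}, "test_schema")

# A single row whose blob exceeds the current page size (1048576 in the log) cannot be saved
sample = {"data": np.random.bytes(2 * 1024 * 1024), "label": 0}
writer.write_raw_data([sample])
writer.commit()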

1.2.2 Error reporting

RuntimeError: Syntax error. Invalid data, Page size: 1048576 is too small to save a blob row. Please try to use the mindrecord api ‘set_page_size’ to enable 64MB page size.
Line of code : 1153

2 Reason analysis

According to the error message, the configured page size is too small to hold a single data row.

3 Solutions

Refer to the set_page_size API and set the page size to a larger value, as sketched below.
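A minimal sketch of the fix, assuming the data is written with FileWriter as above; set_page_size must be called before writing:

from mindspore.mindrecord import FileWriter

writer = FileWriter(file_name="test.mindrecord", shard_num=1)
# Enlarge the page to 64MB (valid range: 32KB to 256MB) so a large row fits on one page
writer.set_page_size(64 * 1024 * 1024)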

4 Summary

1. The page size sets the size of the areas in which data is stored; these areas are divided into two types, row pages and blob pages, and the larger the page, the more data it can hold.
2. When the page size is not set, the default allows storing samples of up to 32MB. If a sample is larger than that, call the API to set an appropriate size.
3. The adjustable range of the page size is 32*1024 (32KB) to 256*1024*1024 (256MB).

[Solved] MindSpore Error: `seed2` in `StandardNormal` should be int and >=0

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): GPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script generates random numbers conforming to the normal distribution by constructing a single-operator network of StandardNormal. The script is as follows:

 01 class Net(nn.Cell):
 02     def __init__(self, seed=2, seed2=-3):
 03         super(Net, self).__init__()
 04         self.standard_normal = ops.StandardNormal(seed=seed, seed2=seed2)
 05     def construct(self, output_shape):
 06         output = self.standard_normal(output_shape)
 07         return output
 08 
 09 output_shape = (2, 3, 4)
 10 net = Net()
 11 output = net(output_shape)
 12 print("OUTPUT: ", output)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I4DSWV-standardNormal.py", line 16, in <module>
    net = Net()
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I4DSWV-standardNormal.py", line 10, in __init__
    self.standard_normal = ops.StandardNormal(seed=seed, seed2=seed2)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\ops\primitive.py", line 687, in deco
    fn(self, *args, **kwargs)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\ops\operations\random_ops.py", line 67, in __init__
    Validator.check_non_negative_int(seed2, "seed2", self.name)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\_checkparam.py", line 304, in check_non_negative_int
    return check_number(arg_value, 0, Rel.GE, int, arg_name, prim_name)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\_checkparam.py", line 168, in check_number
    raise type_except(f'{prim_info} should be {arg_type.__name__} and must {rel_str}, '
ValueError: `seed2` in `StandardNormal` should be int and must >= 0, but got `-3` with type `int`.

Cause Analysis

Let’s look at the error message. The ValueError says `seed2` in `StandardNormal` should be int and must >= 0, but got `-3` with type `int`, which means the seed2 attribute of the StandardNormal operator must be an integer greater than or equal to 0, but a negative number was passed. The official website documents the valid range of the two parameters seed and seed2.

2 Solutions

Given the cause identified above, the fix is straightforward:

 01 class Net(nn.Cell):
 02     def __init__(self, seed=2, seed2=3):
 03         super(Net, self).__init__()
 04         self.standard_normal = ops.StandardNormal(seed=seed, seed2=seed2)
 05     def construct(self, output_shape):
 06         output = self.standard_normal(output_shape)
 07         return output
 08 
 09 output_shape = (2, 3, 4)
 10 net = Net()
 11 output = net(output_shape)
 12 print("OUTPUT: ", output)

At this point, the execution is successful, and the output is as follows:

OUTPUT: [[[-0.7503836 -1.4105444 1.689283 1.0585287 ]
[ 0.19296396 -1.1577806 -0.9638309 -0.01727804]
[ 0.3017666 -1.1502111 -0.3734214 -0.4361166 ]]

[[ 0.7154948 0.6556154 2.3681476 1.1285974 ]
[-0.42168498 -0.2680572 -0.9242947 1.0003353 ]
[-0.4574307 0.36757398 -0.28976655 0.4996464 ]]]

3 Summary

Steps to locate the error report:

1. Find the line of user code that reports the error: self.standard_normal = ops.StandardNormal(seed=seed, seed2=seed2) ;

2. According to the keywords in the log error message, narrow the scope of the analysis problem: `seed2` in `StandardNormal` should be int and must >= 0;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: For ‘CellList’, each cell should be subclass of Cell

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script builds a network that holds its operators in a CellList container. The script is as follows:

 01 class ListNoneExample(nn.Cell):
 02     def __init__(self):
 03         super(ListNoneExample, self).__init__()
 04         self.lst = nn.CellList([nn.ReLU(), None, nn.ReLU()])
 05 
 06     def construct(self, x):
 07         output = []
 08         for op in self.lst:
 09             output.append(op(x))
 10         return output
 11 
 12 input = Tensor(np.random.normal(0, 2, (2, 1)).astype(np.float32))
 13 example = ListNoneExample()
 14 output = example(input)
 15 print("Output:", output)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I3OGVW.py", line 31, in <module>
    example = ListNoneExample()
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I3OGVW.py", line 19, in __init__
    self.lst = nn.CellList([nn.ReLU(), None, nn.ReLU()])
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\layer\container.py", line 310, in __init__
    self.extend(args[0])
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\layer\container.py", line 405, in extend
    if _valid_cell(cell, cls_name):
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\layer\container.py", line 39, in _valid_cell
    raise TypeError(f'{msg_prefix} each cell should be subclass of Cell, but got {type(cell).__name__}.')
TypeError: For 'CellList', each cell should be subclass of Cell, but got NoneType.

Cause Analysis

Let’s look at the error message. The TypeError says For ‘CellList’, each cell should be subclass of Cell, but got NoneType, which means every cell passed to CellList must be a subclass of nn.Cell, but a None was received. Checking line 4, where the CellList is initialized in the network, shows that a None is indeed passed in, hence the error. To solve the problem, replace the None with an object that inherits from the Cell base class; the same function can then be achieved.
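Alternatively, if the None entries carry no meaning, they can simply be filtered out before constructing the container (a sketch, not the original post’s fix; note it reduces the number of branches, whereas the NoneCell replacement below preserves all three):

import mindspore.nn as nn

cells = [nn.ReLU(), None, nn.ReLU()]
# CellList accepts only nn.Cell subclasses, so drop the None placeholders
lst = nn.CellList([c for c in cells if c is not None])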

2 Solutions

Given the cause identified above, the fix is straightforward:

 01 class NoneCell(nn.Cell):
 02     def __init__(self):
 03         super(NoneCell, self).__init__()
 04 
 05     def construct(self, x):
 06         return x
 07 
 08 class ListNoneExample(nn.Cell):
 09     def __init__(self):
 10         super(ListNoneExample, self).__init__()
 11         self.lst = nn.CellList([nn.ReLU(), NoneCell(), nn.ReLU()])
 12 
 13     def construct(self, x):
 14         output = []
 15         for op in self.lst:
 16             output.append(op(x))
 17         return output
 18 
 19 input = Tensor(np.random.normal(0, 2, (2, 1)).astype(np.float32))
 20 example = ListNoneExample()
 21 output = example(input)
 22 print("Output:", output)

At this point, the execution is successful, and the output is as follows:

Output: (Tensor(shape=[2, 1], dtype=Float32, value=
[[1.09826946e+000],
 [0.00000000e+000]]), Tensor(shape=[2, 1], dtype=Float32, value=
[[1.09826946e+000],
 [-2.74355006e+000]]), Tensor(shape=[2, 1], dtype=Float32, value=
[[1.09826946e+000],
 [0.00000000e+000]])) 

3 Summary

Steps to locate the error report:

1. Find the line of user code that reports the error: self.lst = nn.CellList([nn.ReLU(), None, nn.ReLU()]) ;

2. According to the keywords in the log error message, narrow the scope of the analysis problem: each cell should be subclass of Cell, but got NoneType;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: For ‘MirrorPad‘, paddings must be a Tensor with *

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script pads a tensor with mirrored values along each axis by constructing a single-operator MirrorPad network. The script is as follows:

01  context.set_context(mode=context.GRAPH_MODE, device_target="CPU")
02  class Net(nn.Cell):
03      def __init__(self):
04          super(Net, self).__init__()
05          self.pad = ops.MirrorPad(mode="REFLECT")
06      def construct(self, x, paddings):
07          return self.pad(x, paddings)
08  
09  x = Tensor(np.random.random(size=(2, 3)).astype(np.float32))
10  paddings = Tensor([[1, 1], [2, 2]])
11  pad = Net()
12  output = pad(x, paddings)
13  print(output.shape)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "99553.py", line 20, in <module>
    output = pad(x, paddings)
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
    out = self.compile_and_run(*args)
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/nn/cell.py", line 956, in compile_and_run
    self.compile(*inputs)
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/nn/cell.py", line 929, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/common/api.py", line 1063, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/ops/operations/nn_ops.py", line 4189, in __infer__
    raise ValueError(f"For '{self.name}', paddings must be a Tensor with type of int64, "
ValueError: For 'MirrorPad', paddings must be a Tensor with type of int64, but got None.

Cause Analysis

Let’s look at the error message. The ValueError says For ‘MirrorPad’, paddings must be a Tensor with type of int64, but got None, which means paddings must be an int64 Tensor, yet None was received. Combined with lines 10 and 12 of the script, the paddings value is clearly initialized and not None, so why the error? The reason is that in MindSpore’s graph mode, a constant input such as paddings must be bound at the composition (graph-building) stage; when it is passed as a runtime input instead, the operator’s infer stage can only get None. Note that this problem does not exist in PYNATIVE_MODE.

2 Solutions

Given the cause identified above, either of the following two modifications solves the problem.

# First, run the script under PYNATIVE_MODE.
import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor, context

context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU")

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.pad = ops.MirrorPad(mode="REFLECT")
    def construct(self, x, paddings):
        return self.pad(x, paddings)

x = Tensor(np.random.random(size=(2, 3)).astype(np.float32))
paddings = Tensor([[1, 1], [2, 2]])
pad = Net()
output = pad(x, paddings)
print("output shape:", output.shape)

At this point, the execution is successful, and the output is as follows:

output shape: (4, 7)

# Second, pass the paddings in as a constant at composition time.
class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.pad = ops.MirrorPad(mode="REFLECT")
        self.paddings = Tensor([[1, 1], [2, 2]])

    def construct(self, x):
        return self.pad(x, self.paddings)

x = Tensor(np.random.random(size=(2, 3)).astype(np.float32))
pad = Net()
output = pad(x)
print("output shape:", output.shape)

At this point, the execution is successful, and the output is as follows:

output shape: (4, 7)

3 Summary

Steps to locate the error report:

1. Find the line of user code that reports the error: output = pad(x, paddings);

2. According to the keywords in the log error message, narrow the scope of the analysis problem: For ‘MirrorPad’, paddings must be a Tensor with type of int64, but got None;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: Data type conversion of ‘Parameter’ is not supported

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script updates a variable Tensor according to the AddSign algorithm by constructing a single-operator ApplyPowerSign network. The script is as follows:

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.apply_power_sign = ops.ApplyPowerSign()
 05         self.var = Parameter(Tensor(np.array([[0.6, 0.4],
 06                                               [0.1, 0.5]]).astype(np.float16)), name="var")
 07         self.m = Parameter(Tensor(np.array([[0.6, 0.5],
 08                                             [0.2, 0.6]]).astype(np.float32)), name="m")
 09         self.lr = 0.001
 10         self.logbase = np.e
 11         self.sign_decay = 0.99
 12         self.beta = 0.9
 13     def construct(self, grad):
 14         out = self.apply_power_sign(self.var, self.m, self.lr, self.logbase,
 15                                     self.sign_decay, self.beta, grad)
 16         return out
 17
 18 net = Net()
 19 grad = Tensor(np.array([[0.3, 0.7], [0.1, 0.8]]).astype(np.float32))
 20 output = net(grad)
 21 print(output)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "ApplyPowerSign.py", line 27, in <module>
    output = net(grad)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
    out = self.compile_and_run(*args)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 956, in compile_and_run
    self.compile(*inputs)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 929, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1063, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
RuntimeError: Data type conversion of 'Parameter' is not supported, so data type float16 cannot be converted to data type float32 automatically.
For more details, please refer at https://www.mindspore.cn/docs/zh-CN/master/note/operator_list_implicit.html.

Cause Analysis

Let’s look at the error message. The RuntimeError says Data type conversion of ‘Parameter’ is not supported, so data type float16 cannot be converted to data type float32 automatically, which means the data type of a Parameter cannot be converted implicitly. The official website API documentation explains this point about the use of Parameter: when a Parameter is part of the network, its data type must be determined at the composition stage, and converting its data type mid-graph is not supported. Here var is float16 while m and grad are float32, which would require exactly such a conversion.

2 Solutions

Given the cause identified above, the fix is straightforward:

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.apply_power_sign = ops.ApplyPowerSign()
 05         self.var = Parameter(Tensor(np.array([[0.6, 0.4],
 06                                               [0.1, 0.5]]).astype(np.float32)), name="var")
 07         self.m = Parameter(Tensor(np.array([[0.6, 0.5],
 08                                             [0.2, 0.6]]).astype(np.float32)), name="m")
 09         self.lr = 0.001
 10         self.logbase = np.e
 11         self.sign_decay = 0.99
 12         self.beta = 0.9
 13     def construct(self, grad):
 14         out = self.apply_power_sign(self.var, self.m, self.lr, self.logbase,
 15                                     self.sign_decay, self.beta, grad)
 16         return out
 17
 18 net = Net()
 19 grad = Tensor(np.array([[0.3, 0.7], [0.1, 0.8]]).astype(np.float32))
 20 output = net(grad)
 21 print(output)

At this point, the execution is successful, and the output is as follows:

output: (Tensor(shape=[2, 2], dtype=Float32, value=
[[ 5.95575690e-01,  3.89676481e-01],
 [ 9.85252112e-02,  4.88201708e-01]]), Tensor(shape=[2, 2], dtype=Float32, 
 value=[[ 5.70000052e-01,  5.19999981e-01],
 [ 1.89999998e-01,  6.20000064e-01]]))

3 Summary

Steps to locate the error report:

1. Find the line of user code that reported the error: 20 output = net(grad) ;

2. According to the keywords in the log error message, narrow the scope of the analysis problem: Data type conversion of ‘Parameter’ is not supported, so data type float16 cannot be converted to data type float32 automatically;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: Should not use Python in runtime

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.7.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This case uses the MindSpore JIT Fallback feature to call numpy logic inside MindSpore graph code:

import numpy as np
from mindspore import ms_function, context

context.set_context(mode=context.GRAPH_MODE)

def test_validate():
    @ms_function
    def Func():
        x = np.array([1])
        if x >= 1:
            x = x * 2
        return x

    res = Func()
    print("res:", res)

test_validate()

1.2.2 Error reporting

RuntimeError: mindspore/ccsrc/pipeline/jit/validator.cc:141 ValidateValueNode] Should not use Python object in runtime, node: ValueNode<InterpretedObject> InterpretedObject: '[2]'.
Line: In file /home/llg/workspace/mindspore/mindspore/test.py(15)
E               if x >= 1:

2 Reason analysis and solution

This error is raised because, during the post-compilation check, the MindSpore compiler finds that some nodes inside the function are still of interpreted (Python) type. Inspecting the code shows that the function returns numpy-typed data that was never converted into a MindSpore Tensor, so it cannot enter the backend runtime for computation, hence the error. Wrap the numpy array into a Tensor before returning it; outside the function, Tensor.asnumpy() can convert it back to numpy array data for other numpy-related operations.
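A minimal sketch of that fix: wrap the numpy result into a Tensor before returning it.

import numpy as np
from mindspore import Tensor, ms_function, context

context.set_context(mode=context.GRAPH_MODE)

@ms_function
def Func():
    x = np.array([1])
    if x >= 1:
        x = x * 2
    return Tensor(x)  # wrap the numpy value so the runtime receives a Tensor

res = Func()
print("res:", res)         # res: [2]
print(res.asnumpy() + 1)   # convert back outside the function for numpy operations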

3 Summary

Through JIT Fallback, MindSpore functions can operate on numpy types, but such values can only be deduced and executed at compile time; they cannot be passed to the runtime, nor returned as the function’s final return value, otherwise an error is raised when they reach the runtime. Wrap the numpy array into a Tensor before returning it, and use Tensor.asnumpy() outside the function to convert it back to numpy array data for other numpy-related operations.