Tag Archives: MindSpore Error

[Solved] MindSpore Error: “RuntimeError: Unexpected error. Inconsistent batch..

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This case customizes the dataset and performs batch operations.

1.2.2 Error reporting

RuntimeError: Unexpected error. Inconsistent batch shapes, batch operation expect same shape for each data row, but got inconsistent shape in column 0, expected shape for this column is:, got shape:

2 Reason analysis

According to the error message, the batch operation requires every data row in a column to have the same shape, but the rows of the custom dataset do not have a uniform shape, which causes the error.

3 Solutions

1. Remove the batch operation.

2. If you must perform batch operations on data with inconsistent shapes, reorganize the dataset or unify the shape of the input rows, for example by padding; a sketch follows.
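A minimal sketch, assuming a single column named "data" whose rows vary in length (the column name and target length are hypothetical); the batch operation accepts a pad_info argument that pads each row of a column to a target shape before batching:

import numpy as np
import mindspore.dataset as ds

# toy generator whose rows have inconsistent shapes
def gen():
    for n in (3, 5, 4):
        yield (np.ones(n, dtype=np.float32),)

dataset = ds.GeneratorDataset(gen, ["data"], shuffle=False)
# pad every row of column "data" to length 5 with value 0, then batch
dataset = dataset.batch(batch_size=3, pad_info={"data": ([5], 0)})
for item in dataset.create_tuple_iterator(output_numpy=True):
    print(item[0].shape)  # (3, 5)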

[Solved] MindSpore Error: “GeneratorDataset’s num_workers=8, this value is …”

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.2.0
– Python version (eg, Python 3.7.5): 3.7.5
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This case runs the linear-function fitting example from the official website; MindSpore had been installed successfully beforehand.

1.2.2 Error reporting

Error message: RuntimeError: Thread ID 140706176251712 Unexpected error. GeneratorDataset’s num_workers=8, this value is not within the required range of [1, cpu_thread_cnt=2].
Line of code : 639
File : /home/jenkins/agent-working-dir/workspace/Compile_CPU_X86_Ubuntu/mindspore/mindspore/ccsrc/minddata/dataset/engine/ir/datasetops/dataset_node.cc

2 Reason analysis

The number of CPU cores available at runtime is smaller than the number of workers the dataset module uses by default when generating data. MindSpore 1.2.0 does not adapt this configuration to the number of CPU cores in the hardware, so on a low-spec PC the number of parallel workers must be configured manually.

3 Solutions

1. Add code to manually configure the number of parallel workers (a placement sketch follows this list):
ds.config.set_num_parallel_workers(2)
2. Use a newer version of MindSpore: from MindSpore 1.6.0 onward, the number of parallel workers is configured adaptively according to the number of CPU cores in the hardware, so the error no longer occurs on machines with few cores.
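A minimal placement sketch (create_dataset here is a hypothetical stand-in for the example's own data pipeline; the key point is to set the worker count before building any dataset objects):

import mindspore.dataset as ds

# cap the dataset pipeline at 2 parallel workers to match a 2-core machine
ds.config.set_num_parallel_workers(2)
dataset = create_dataset()  # hypothetical: the linear-fitting example's own dataset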

4 Summary

1. You can locate the problem from the error message. In this case it is the number of CPU cores; the method for setting the number of parallel workers can be found in the official website tutorials and the open-source MindSpore documentation.
2. MindSpore provides an automatic data tuning tool, Dataset AutoTune, which adjusts the parallelism of the data processing pipeline to the available environment resources during training; it automatically detects the number of CPU cores and configures itself accordingly.
3. The config module in MindSpore can set or obtain the global configuration parameters of data processing.

[Solved] MindSpore Error: “TypeError: parse() missing 1 required positional.”

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This case uses mindspore.dataset with a custom dataset:

import os
import numpy as np
from PIL import Image
import mindspore.common.dtype as mstype
import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as C
import mindspore.dataset.vision.c_transforms as vc

class _dcp_Dataset:
    def __init__(self,img_root_dir,device_target="CPU"):
        if not os.path.exists(img_root_dir):
            raise RuntimeError(f"the input image dir {img_root_dir} is invalid")
        self.img_root_dir=img_root_dir
        self.img_names=[i for i in os.listdir(img_root_dir) if i.endswith(".jpg")]
        self.target=device_target

    def __len__(self):
        return len(self.img_names)

    def __getitem__(self, index):
        img_name=self.img_names[index]
        im=Image.open(os.path.join(self.img_root_dir,img_name))
        image=np.array(im)
        label_str=img_name.split("_")[-1]
        label_str=label_str.split(".")[0]
        label=np.array(label_str)
        return image,label

def creat_dataset(dataset_path,batch_size=2,num_shards=1,shard_id=0,device_target="CPU"):
    dataset=_dcp_Dataset(dataset_path,device_target)
    data_set=ds.GeneratorDataset(dataset,["image","label"],shuffle=True,num_shards=1,shard_id=0)
    image_trans=[
        vc.Resize((224,224)),
        vc.RandomHorizontalFlip(),
        vc.Rescale(1/255,shift=0),
        vc.Normalize((0.4465, 0.4822, 0.4914), (0.2010, 0.1994, 0.2023)),
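        # note: the next line is missing "()"; this is the bug analyzed in section 2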
        vc.HWC2CHW
    ]
    label_trans=[C.TypeCast(mstype.int32)]

    data_set=data_set.map(operations=image_trans,input_columns=["image"])
    data_set=data_set.map(operations=label_trans,input_columns=["label"])
    # data_set=data_set.shuffle(buffer_size=batch_size)
    data_set=data_set.batch(batch_size=batch_size,drop_remainder=True)
    # data_set=data_set.repeat(1)

    return data_set


if __name__ == '__main__':
    data = creat_dataset("./image_DCP")
    print(data)
    data_loader = data.create_dict_iterator()
    for i, data in enumerate(data_loader):
        print(i)
        print(data)

1.2.2 Error reporting

Error message:

TypeError: parse() missing 1 required positional argument

2 Reason analysis and solution

A pair of parentheses is missing: vc.HWC2CHW passes the class itself rather than an instance of the operation. Change the code to vc.HWC2CHW() and the script executes normally.
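A sketch of the corrected transform list:

import mindspore.dataset.vision.c_transforms as vc

image_trans = [
    vc.Resize((224, 224)),
    vc.RandomHorizontalFlip(),
    vc.Rescale(1 / 255, shift=0),
    vc.Normalize((0.4465, 0.4822, 0.4914), (0.2010, 0.1994, 0.2023)),
    vc.HWC2CHW(),  # instantiate the operation; passing the bare class raises the TypeError
]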

3 Summary

Steps to locate the problem

For example: there is a data processing flow such as xxDataset -> map -> map -> batch.
Scripts can be debugged as follows:

  1. Only keep xxDataset, and then run the script to see if an error is reported;
  2. Keep xxDataset -> map, and then run the script to see if an error is reported;
  3. Keep xxDataset -> map -> map, and then run the script to see if an error is reported;
  4. Keep xxDataset -> map -> map -> batch, and then run the script to see if an error is reported;

According to the above method, you can locate which map/batch operation is at fault; a sketch of the bisection follows.
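A sketch of this bisection applied to the script above (reusing its names; re-enable one commented stage at a time):

data_set = ds.GeneratorDataset(dataset, ["image", "label"], shuffle=True)
# data_set = data_set.map(operations=image_trans, input_columns=["image"])
# data_set = data_set.map(operations=label_trans, input_columns=["label"])
# data_set = data_set.batch(batch_size=2, drop_remainder=True)
for i, row in enumerate(data_set.create_dict_iterator()):
    print(i)  # if this loop completes, re-enable the next stage and rerun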

[Solved] MindSpore Error: ValueError: Minimum inputs size 0 does not match…

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): GPU
Software Environment:
– MindSpore version (source or binary): 1.5.2
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script computes the element-wise minimum of two input Tensors by constructing a single-operator network around Minimum. The script is as follows:

 01 class Net(nn.Cell):
 02   def __init__(self):
 03     super(Net, self).__init__()
 04     self.minimum = ops.minimum()
 05   def construct(self, x,y):
 06     output = self.minimum(x, y)
 07     return output
 08 net = Net()
 09 x = Tensor(np.array([1.0, 2.0, 4.0]), mindspore.float64).astype(mindspore.float32)
 10 y = Tensor(np.array([3.0, 5.0, 6.0]), mindspore.float64).astype(mindspore.float32)
 11 output = net(x, y)
 12 print('output', output)

1.2.2 Error reporting

The error message here is as follows:

[EXCEPTION] PYNATIVE(95109,7fe22ea22740,python):2022-06-14-21:59:02.837.339 [mindspore/ccsrc/pipeline/pynative/pynative_execute.cc:1331] SetImplicitCast] Minimum inputs size 0 does not match the requires signature size 2
Traceback (most recent call last):
  File "182168.py", line 36, in <module>
    net = Net()
  File "182168.py", line 24, in __init__
    self.minimum = ops.minimum()
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/ops/primitive.py", line 247, in __call__
    return _run_op(self, self.name, args)
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/common/api.py", line 78, in wrapper
    results = fn(*arg, **kwargs)
  File "/data2/llj/mindspores/r1.5/build/package/mindspore/ops/primitive.py", line 682, in _run_op
    output = real_run_op(obj, op_name, args)
ValueError: mindspore/ccsrc/pipeline/pynative/pynative_execute.cc:1331 SetImplicitCast] Minimum inputs size 0 does not match the requires signature size 2

Cause Analysis

Let’s look at the error message. The ValueError says Minimum inputs size 0 does not match the requires signature size 2, which roughly means that two inputs are required but none were received. Line 11 does pass in two inputs, so the problem may not lie in that line. We can add a print statement after lines 3, 5, and 11 to see how far the program gets. After execution, only the first print appears, indicating that the problem lies in line 4. On careful inspection, minimum is lowercase when the operator is initialized in line 4, which calls the functional interface minimum directly, without passing any inputs, hence the error.
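A minimal sketch of the probe described above (the print calls are the only additions):

class Net(nn.Cell):
  def __init__(self):
    super(Net, self).__init__()
    print(1)                      # printed, so execution reaches past line 3
    self.minimum = ops.minimum()  # line 4: fails here, so the next print never runs
    print(2)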

2 Solutions

For the reasons known above, it is easy to make the following modifications:

 01 class Net(nn.Cell):
 02   def __init__(self):
 03     super(Net, self).__init__()
 04     self.minimum = ops.Minimum()
 05   def construct(self, x,y):
 06     output = self.minimum(x, y)
 07     return output
 08 net = Net()
 09 x = Tensor(np.array([1.0, 2.0, 4.0]), mindspore.float64).astype(mindspore.float32)
 10 y = Tensor(np.array([3.0, 5.0, 6.0]), mindspore.float64).astype(mindspore.float32)
 11 output = net(x, y)
 12 print('output', output)

At this point, the execution is successful, and the output is as follows:

output [1. 2. 4.]

3 Summary

Steps to locate the error report:

1. Find the line of user code that reports the error: self.minimum = ops.minimum() ;

2. According to the keywords in the log error message, narrow down the scope of the problem: Minimum inputs size 0 does not match the requires signature size 2;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: “RuntimeError: Invalid data, Page size.”

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This example converts the custom dataset into the MindSpore Record data format.

1.2.2 Error reporting

RuntimeError: Syntax error. Invalid data, Page size: 1048576 is too small to save a blob row. Please try to use the mindrecord api ‘set_page_size’ to enable 64MB page size.
Line of code : 1153

2 Reason analysis

According to the error message, the page size is set too small to hold a single sample (a blob row).

3 Solutions

Refer to the set_page_size API and set the page size to a larger value; a sketch follows.
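A minimal sketch, assuming the dataset is written with mindspore.mindrecord.FileWriter (the file name here is hypothetical):

from mindspore.mindrecord import FileWriter

writer = FileWriter(file_name="my_dataset.mindrecord", shard_num=1)
writer.set_page_size(1 << 26)  # 64MB, the size the error message suggests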

4 Summary

1. The page size determines the size of the area where data is stored. Pages are divided into two types, row pages and blob pages; the larger the page, the more data it can store.
2. When the page size is not set, the default sample size that can be stored is 32MB. If a sample is larger than the default, the user needs to call the API to set an appropriate size.
3. The adjustable range of the page size is 32*1024 (32KB) to 256*1024*1024 (256MB).

[Solved] MindSpore Error: `seed2` in `StandardNormal` should be int and >=0

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): GPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script generates random numbers conforming to the normal distribution by constructing a single-operator network of StandardNormal. The script is as follows:

 01 class Net(nn.Cell):
 02     def __init__(self, seed=2, seed2=-3):
 03         super(Net, self).__init__()
 04         self.standard_normal = ops.StandardNormal(seed=seed, seed2=seed2)
 05     def construct(self, output_shape):
 06         output = self.standard_normal(output_shape)
 07         return output
 08 
 09 output_shape = (2, 3, 4)
 10 net = Net()
 11 output = net(output_shape)
 12 print("OUTPUT: ", output)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I4DSWV-standardNormal.py", line 16, in <module>
    net = Net()
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I4DSWV-standardNormal.py", line 10, in __init__
    self.standard_normal = ops.StandardNormal(seed=seed, seed2=seed2)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\ops\primitive.py", line 687, in deco
    fn(self, *args, **kwargs)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\ops\operations\random_ops.py", line 67, in __init__
    Validator.check_non_negative_int(seed2, "seed2", self.name)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\_checkparam.py", line 304, in check_non_negative_int
    return check_number(arg_value, 0, Rel.GE, int, arg_name, prim_name)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\_checkparam.py", line 168, in check_number
    raise type_except(f'{prim_info} should be {arg_type.__name__} and must {rel_str}, '
ValueError: `seed2` in `StandardNormal` should be int and must >= 0, but got `-3` with type `int`.

Cause Analysis

Let’s look at the error message. The ValueError says seed2 in StandardNormal should be int and must >= 0, but got -3 with type int, which means the seed2 attribute of the StandardNormal operator must be an integer greater than or equal to 0, but a negative number was given. The official website documentation states that both seed and seed2 must be non-negative integers.

2 Solutions

For the reasons known above, it is easy to make the following modifications:

 01 class Net(nn.Cell):
 02     def __init__(self, seed=2, seed2=3):
 03         super(Net, self).__init__()
 04         self.standard_normal = ops.StandardNormal(seed=seed, seed2=seed2)
 05     def construct(self, output_shape):
 06         output = self.standard_normal(output_shape)
 07         return output
 08 
 09 output_shape = (2, 3, 4)
 10 net = Net()
 11 output = net(output_shape)
 12 print("OUTPUT: ", output)

At this point, the execution is successful, and the output is as follows:

OUTPUT: [[[-0.7503836 -1.4105444 1.689283 1.0585287 ]
[ 0.19296396 -1.1577806 -0.9638309 -0.01727804]
[ 0.3017666 -1.1502111 -0.3734214 -0.4361166 ]]

[[ 0.7154948 0.6556154 2.3681476 1.1285974 ]
[-0.42168498 -0.2680572 -0.9242947 1.0003353 ]
[-0.4574307 0.36757398 -0.28976655 0.4996464 ]]]

3 Summary

Steps to locate the error report:

1. Find the line of user code that reports the error: self.standard_normal = ops.StandardNormal(seed=seed, seed2=seed2) ;

2. According to the keywords in the log error message, narrow down the scope of the problem: seed2 in StandardNormal should be int and must >= 0, but got -3;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: For ‘CellList’, each cell should be subclass of Cell

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script implements the cell list container by constructing a single-operator network of CellList. The script is as follows:

 01 class ListNoneExample(nn.Cell):
 02     def __init__(self):
 03         super(ListNoneExample, self).__init__()
 04         self.lst = nn.CellList([nn.ReLU(), None, nn.ReLU()])
 05 
 06     def construct(self, x):
 07         output = []
 08         for op in self.lst:
 09             output.append(op(x))
 10         return output
 11 
 12 input = Tensor(np.random.normal(0, 2, (2, 1)).astype(np.float32))
 13 example = ListNoneExample()
 14 output = example(input)
 15 print("Output:", output)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I3OGVW.py", line 31, in <module>
    example = ListNoneExample()
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I3OGVW.py", line 19, in __init__
    self.lst = nn.CellList([nn.ReLU(), None, nn.ReLU()])
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\layer\container.py", line 310, in __init__
    self.extend(args[0])
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\layer\container.py", line 405, in extend
    if _valid_cell(cell, cls_name):
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\layer\container.py", line 39, in _valid_cell
    raise TypeError(f'{msg_prefix} each cell should be subclass of Cell, but got {type(cell).__name__}.')
TypeError: For 'CellList', each cell should be subclass of Cell, but got NoneType.

Cause Analysis

Let’s look at the error message. The TypeError says For 'CellList', each cell should be subclass of Cell, but got NoneType, which means every element passed to CellList must be a subclass of nn.Cell, but a None was received. Checking line 4, where the CellList is initialized in the network, shows that a None is passed in, hence the error. To solve the problem, replace the None with an object that inherits from the Cell base class and implements the same pass-through behavior.

2 Solutions

For the reasons known above, it is easy to make the following modifications:

 01 class NoneCell(nn.Cell):
 02     def __init__(self):
 03         super(NoneCell, self).__init__()
 04 
 05     def construct(self, x):
 06         return x
 07 
 08 class ListNoneExample(nn.Cell):
 09     def __init__(self):
 10         super(ListNoneExample, self).__init__()
 11         self.lst = nn.CellList([nn.ReLU(), NoneCell(), nn.ReLU()])
 12 
 13     def construct(self, x):
 14         output = []
 15         for op in self.lst:
 16             output.append(op(x))
 17         return output
 18 
 19 input = Tensor(np.random.normal(0, 2, (2, 1)).astype(np.float32))
 20 example = ListNoneExample()
 21 output = example(input)
 22 print("Output:", output)

At this point, the execution is successful, and the output is as follows:

Output: (Tensor(shape=[2, 1], dtype=Float32, value=
[[1.09826946e+000],
 [0.00000000e+000]]), Tensor(shape=[2, 1], dtype=Float32, value=
[[1.09826946e+000],
 [-2.74355006e+000]]), Tensor(shape=[2, 1], dtype=Float32, value=
[[1.09826946e+000],
 [0.00000000e+000]])) 

3 Summary

Steps to locate the error report:

1. Find the line of user code that reports the error: self.lst = nn.CellList([nn.ReLU(), None, nn.ReLU()]) ;

2. According to the keywords in the log error message, narrow down the scope of the problem: each cell should be subclass of Cell, but got NoneType;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: For ‘MirrorPad’, paddings must be a Tensor with *

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script pads a tensor with mirrored values by constructing a single-operator network around MirrorPad. The script is as follows:

01  context.set_context(mode=context.GRAPH_MODE, device_target="CPU")
02  class Net(nn.Cell):
03      def __init__(self):
04          super(Net, self).__init__()
05          self.pad = ops.MirrorPad(mode="REFLECT")
06      def construct(self, x, paddings):
07          return self.pad(x, paddings)
08  
09  x = Tensor(np.random.random(size=(2, 3)).astype(np.float32))
10  paddings = Tensor([[1, 1], [2, 2]])
11  pad = Net()
12  output = pad(x, paddings)
13  print(output.shape)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "99553.py", line 20, in <module>
    output = pad(x, paddings)
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
    out = self.compile_and_run(*args)
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/nn/cell.py", line 956, in compile_and_run
    self.compile(*inputs)
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/nn/cell.py", line 929, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/common/api.py", line 1063, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
  File "/root/miniconda3/envs/high_llj/lib/python3.7/site-packages/mindspore/ops/operations/nn_ops.py", line 4189, in __infer__
    raise ValueError(f"For '{self.name}', paddings must be a Tensor with type of int64, "
ValueError: For 'MirrorPad', paddings must be a Tensor with type of int64, but got None.

Cause Analysis

Let’s look at the error message. The ValueError says For 'MirrorPad', paddings must be a Tensor with type of int64, but got None, meaning that paddings must be an int64 Tensor but None was received. Combined with lines 10 and 12 of the script, the paddings are clearly not initialized to None, so why the error? In MindSpore's graph mode, a constant input must be passed in at graph-construction time; otherwise the operator's infer stage only gets None. (This problem does not exist in PYNATIVE_MODE.)

2 Solutions

Given the cause above, either of the following two modifications solves the problem.

# First, run the script under PYNATIVE_MODE.
import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor, context

context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU")

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.pad = ops.MirrorPad(mode="REFLECT")
    def construct(self, x, paddings):
        return self.pad(x, paddings)

x = Tensor(np.random.random(size=(2, 3)).astype(np.float32))
paddings = Tensor([[1, 1], [2, 2]])
pad = Net()
output = pad(x, paddings)
print("output shape:", output.shape)

At this point, the execution is successful, and the output is as follows:

output shape: (4, 7)

# Second, pass the paddings in as a constant at graph-construction time.
class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.pad = ops.MirrorPad(mode="REFLECT")
        self.paddings = Tensor([[1, 1], [2, 2]])

    def construct(self, x):
        return self.pad(x, self.paddings)

x = Tensor(np.random.random(size=(2, 3)).astype(np.float32))
pad = Net()
output = pad(x)
print("output shape: "output.shape)

At this point, the execution is successful, and the output is as follows:

output shape: (4, 7)

3 Summary

Steps to locate the error report:

1. Find the line of user code that reports the error: output = pad(x, paddings);

2. ValueError: For ‘MirrorPad’, paddings must be a Tensor with type of int64, but got None;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: Data type conversion of ‘Parameter’ is not supporte

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script updates a variable Tensor according to the AddSign algorithm by constructing a single-operator network around ApplyPowerSign. The script is as follows:

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.apply_power_sign = ops.ApplyPowerSign()
 05         self.var = Parameter(Tensor(np.array([[0.6, 0.4],
 06                                               [0.1, 0.5]]).astype(np.float16)), name="var")
 07         self.m = Parameter(Tensor(np.array([[0.6, 0.5],
 08                                             [0.2, 0.6]]).astype(np.float32)), name="m")
 09         self.lr = 0.001
 10         self.logbase = np.e
 11         self.sign_decay = 0.99
 12         self.beta = 0.9
 13     def construct(self, grad):
 14         out = self.apply_power_sign(self.var, self.m, self.lr, self.logbase,
 15                                     self.sign_decay, self.beta, grad)
 16         return out
 17 
 18 net = Net()
 19 grad = Tensor(np.array([[0.3, 0.7], [0.1, 0.8]]).astype(np.float32))
 20 output = net(grad)
 21 print(output)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "ApplyPowerSign.py", line 27, in <module>
    output = net(grad)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
    out = self.compile_and_run(*args)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 956, in compile_and_run
    self.compile(*inputs)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 929, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1063, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
RuntimeError: Data type conversion of 'Parameter' is not supported, so data type float16 cannot be converted to data type float32 automatically.
For more details, please refer at https://www.mindspore.cn/docs/zh-CN/master/note/operator_list_implicit.html.

Cause Analysis

Let’s look at the error message. The RuntimeError says Data type conversion of 'Parameter' is not supported, so data type float16 cannot be converted to data type float32 automatically, which means a Parameter's data type cannot be converted implicitly. The official website API documentation makes the same point about the use of Parameter.

Therefore, if a Parameter is used as part of the network, its data type must be determined at graph-construction time; converting its data type midway is not supported.

2 Solutions

For the reasons known above, it is easy to make the following modifications:

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.apply_power_sign = ops.ApplyPowerSign()
 05         self.var = Parameter(Tensor(np.array([[0.6, 0.4],
 06                                               [0.1, 0.5]]).astype(np.float32)), name="var")
 07         self.m = Parameter(Tensor(np.array([[0.6, 0.5],
 08                                             [0.2, 0.6]]).astype(np.float32)), name="m")
 09         self.lr = 0.001
 10         self.logbase = np.e
 11         self.sign_decay = 0.99
 12         self.beta = 0.9
 13     def construct(self, grad):
 14         out = self.apply_power_sign(self.var, self.m, self.lr, self.logbase,
 15                                     self.sign_decay, self.beta, grad)
 16         return out
 17 
 18 net = Net()
 19 grad = Tensor(np.array([[0.3, 0.7], [0.1, 0.8]]).astype(np.float32))
 20 output = net(grad)
 21 print(output)

At this point, the execution is successful, and the output is as follows:

output: (Tensor(shape=[2, 2], dtype=Float32, value=
[[ 5.95575690e-01,  3.89676481e-01],
 [ 9.85252112e-02,  4.88201708e-01]]), Tensor(shape=[2, 2], dtype=Float32, 
 value=[[ 5.70000052e-01,  5.19999981e-01],
 [ 1.89999998e-01,  6.20000064e-01]]))

3 Summary

Steps to locate the error report:

1. Find the line of user code that reported the error: 20 output = net(grad) ;

2. According to the keywords in the log error message, narrow down the scope of the problem: Data type conversion of 'Parameter' is not supported, so data type float16 cannot be converted to data type float32 automatically;

3. It is necessary to focus on the correctness of variable definition and initialization.

[Solved] MindSpore Error: Should not use Python in runtime

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.7.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This case uses the MindSpore JIT Fallback feature to call NumPy logic inside a MindSpore graph:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, ms_function, context

context.set_context(mode=context.GRAPH_MODE)

def test_validate():
  @ms_function
  def Func():
    x = np.array([1])
    if x >= 1:
      x = x * 2
    return x
  res = Func()
  print("res:", res)

1.2.2 Error reporting

RuntimeError: mindspore/ccsrc/pipeline/jit/validator.cc:141 ValidateValueNode] Should not use Python object in runtime, node: ValueNode<InterpretedObject> InterpretedObject: '[2]'.
Line: In file /home/llg/workspace/mindspore/mindspore/test.py(15)
E               if x >= 1:

2 Reason analysis and solution

This is because the MindSpore compiler checks the graph after compilation and found that some interpreted-type nodes remained inside the function, which raises the error. Inspecting the code shows that the function returns numpy data that was never converted into a MindSpore Tensor, so it cannot enter the backend runtime for computation. Wrap the numpy array into a Tensor and compute on that before returning it; outside the function, use Tensor.asnumpy() to convert the result back to numpy array data for any further numpy operations. A sketch follows.
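A sketch of the suggested fix under the same setup (wrap the numpy result before returning, and convert back outside the function):

import numpy as np
from mindspore import Tensor, ms_function, context

context.set_context(mode=context.GRAPH_MODE)

@ms_function
def func():
    x = np.array([1])
    if x >= 1:
        x = x * 2
    return Tensor(x)  # wrap the numpy value into a Tensor before returning

res = func()
print("res:", res.asnumpy())  # convert back to a numpy array outside the graph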

3 Summary

Through JIT Fallback, MindSpore functions can operate on numpy types, but such values can only be deduced and executed at compile time: they cannot be passed to the runtime, nor returned as the final function return value, otherwise an error is raised when the value reaches the runtime. Wrap the numpy array into a Tensor before returning it, and use Tensor.asnumpy() outside the function to convert it back to a numpy array for other numpy-related operations.

[Solved] MindSpore Error: ValueError: `padding_idx` in `Embedding` out of range

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

The training script completes the embedding layer operation by constructing a single-operator network of Embedding. The script is as follows:

 01 class Net(nn.Cell):
 02     def __init__(self, vocab_size, embedding_size, use_one_hot, padding_idx=None):
 03         super(Net, self).__init__()
 04         self.op = nn.Embedding(vocab_size=vocab_size, embedding_size=embedding_size, use_one_hot=use_one_hot, padding_idx=padding_idx)
 05 
 06     def construct(self, x):
 07         output = self.op(x)
 08         return output
 09 
 10 input = Tensor(np.ones([8, 128]), mindspore.int32)
 11 vocab_size = 2000
 12 embedding_size = 768
 13 use_one_hot = True
 14 example = Net(vocab_size=vocab_size, embedding_size=embedding_size, use_one_hot=use_one_hot, padding_idx=10000)
 15 output = example(input)
 16 print("Output:", output.shape)

1.2.2 Error reporting

The error message here is as follows:

Traceback (most recent call last):
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I3MRK3.py", line 26, in <module>
    example = Net(vocab_size=vocab_size, embedding_size=embedding_size, use_one_hot=use_one_hot, padding_idx=10000)
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I3MRK3.py", line 12, in __init__
    self.op = nn.Embedding(vocab_size=vocab_size, embedding_size=embedding_size, use_one_hot=use_one_hot, padding_idx=padding_idx)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\layer\embedding.py", line 111, in __init__
    self.padding_idx = validator.check_int_range(padding_idx, 0, vocab_size, Rel.INC_BOTH,
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\_checkparam.py", line 413, in check_int_range
    return check_number_range(arg_value, lower_limit, upper_limit, rel, int, arg_name, prim_name)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\_checkparam.py", line 209, in check_number_range
    raise ValueError("{} {} should be in range of {}, but got {} with type `{}`.".format(
ValueError: `padding_idx` in `Embedding` should be in range of [0, 2000], but got 10000 with type `int`.

Cause Analysis

Let’s look at the error message. The ValueError says padding_idx in Embedding should be in range of [0, 2000], but got 10000 with type int, which means the value of padding_idx in the Embedding operator must be between 0 and 2000, but 10000 was given. The official website's description of the Embedding operator states this explicitly: padding_idx must lie between 0 and vocab_size.

2 Solutions

For the reasons known above, it is easy to make the following modifications:

 01 class Net(nn.Cell):
 02     def __init__(self, vocab_size, embedding_size, use_one_hot, padding_idx=None):
 03         super(Net, self).__init__()
 04         self.op = nn.Embedding(vocab_size=vocab_size, embedding_size=embedding_size, use_one_hot=use_one_hot, padding_idx=padding_idx)
 05 
 06     def construct(self, x):
 07         output = self.op(x)
 08         return output
 09 
 10 input = Tensor(np.ones([8, 128]), mindspore.int32)
 11 vocab_size = 2000
 12 embedding_size = 768
 13 use_one_hot = True
 14 example = Net(vocab_size=vocab_size, embedding_size=embedding_size, use_one_hot=use_one_hot, padding_idx=1000)
 15 output = example(input)
 16 print("Output:", output.shape)

At this point, the execution is successful, and the output is as follows:

Output: (8, 128, 768)

3 Summary

Steps to locate the error report:

1. Find the user code line that reported the error: example = Net(vocab_size=vocab_size, embedding_size=embedding_size, use_one_hot=use_one_hot, padding_idx=10000) ;

2. According to the keywords in the log error message, narrow down the scope of the problem: padding_idx in Embedding should be in range of [0, 2000], but got 10000 with type int;

[Solved] MindSpore Error: “RuntimeError: Unable to fetch data from GeneratorDataset..”

1 Error description

1.1 System Environment

Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

This case uses a custom iterable dataset for training. The first epoch iterates normally, but the second epoch reports an error. The custom dataset code is as follows:

import numpy as np
import mindspore.dataset as ds
from tqdm import tqdm

class IterDatasetGenerator:
    def __init__(self, datax, datay, classes_per_it, num_samples, iterations):
        self.__iterations = iterations
        self.__data = datax
        self.__labels = datay
        self.__iter = 0
        self.classes_per_it = classes_per_it
        self.sample_per_class = num_samples
        self.classes, self.counts = np.unique(self.__labels, return_counts=True)
        self.idxs = range(len(self.__labels))
        self.indexes = np.empty((len(self.classes), max(self.counts)), dtype=int) * np.nan
        self.numel_per_class = np.zeros_like(self.classes)
        for idx, label in tqdm(enumerate(self.__labels)):
            label_idx = np.argwhere(self.classes == label).item()
            self.indexes[label_idx, np.where(np.isnan(self.indexes[label_idx]))[0][0]] = idx
            self.numel_per_class[label_idx] = int(self.numel_per_class[label_idx]) + 1

    def __next__(self):
        spc = self.sample_per_class
        cpi = self.classes_per_it

        if self.__iter >= self.__iterations:
            raise StopIteration
        else:
            batch_size = spc * cpi
            batch = np.random.randint(low=batch_size, high=10 * batch_size, size=(batch_size), dtype=np.int64)
            c_idxs = np.random.permutation(len(self.classes))[:cpi]
            for i, c in enumerate(self.classes[c_idxs]):
                index = i*spc
                ci = [c_i for c_i in range(len(self.classes)) if self.classes[c_i] == c][0]
                label_idx = list(range(len(self.classes)))[ci]
                sample_idxs = np.random.permutation(int(self.numel_per_class[label_idx]))[:spc]
                ind = 0
                for i in sample_idxs:
                    batch[index+ind] = self.indexes[label_idx][i]  # pick the i-th stored sample index of this class
                    ind = ind + 1
            batch = batch[np.random.permutation(len(batch))]
            data_x = []
            data_y = []
            for b in batch:
                data_x.append(self.__data<b>)
                data_y.append(self.__labels<b>)
            self.__iter += 1
            item = (data_x, data_y)
            return item

    def __iter__(self):
        return self

    def __len__(self):
        return self.__iterations

np.random.seed(58)
data1 = np.random.sample((500,2))
data2 = np.random.sample((500,1))
dataset_generator  = IterDatasetGenerator(data1,data2,5,10,10)
dataset = ds.GeneratorDataset(dataset_generator,["data","label"],shuffle=False)
epochs=3
for epoch in range(epochs):
    for data in dataset.create_dict_iterator():
        print("success")

1.2.2 Error reporting

Error message: RuntimeError: Exception thrown from PyFunc. Unable to fetch data from GeneratorDataset, try iterate the source function of GeneratorDataset or check value of num_epochs when create iterator.

2 Reason analysis

During each epoch, self.__iter accumulates with every iteration. When the second epoch prefetches data, self.__iter has already reached the configured iterations, so self.__iter >= self.__iterations holds and the iteration ends immediately.

3 Solutions

Add a reset to __iter__(self) by setting self.__iter = 0 there; the modified method:
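    def __iter__(self):
        self.__iter = 0  # reset the counter so every new epoch starts from zero
        return self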
The execution now succeeds, printing "success" for each batch in all three epochs.