1 Error description
1.1 System Environment
Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (eg, Python 3.7.5): 3.7.6
– OS platform and distribution (eg, Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):
1.2 Basic information
1.2.1 Script
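The script builds a single-operator network around CTCGreedyDecoder and performs greedy (best path) decoding on the input logits. The listing omits the import statements; it assumes numpy is imported as np and that mindspore, nn, ops and Tensor are already imported. The script is as follows: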
01 class Net(nn.Cell):
02     def __init__(self):
03         super(Net, self).__init__()
04         self.ctc_greedyDecoder = ops.CTCGreedyDecoder()
05
06     def construct(self, input_x, sequence_length):
07         return self.ctc_greedyDecoder(input_x, sequence_length)
08 net = Net()
09
10
11 inputs = Tensor(np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
12                           [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]), mindspore.float32)
13 sequence_length = Tensor(np.array([4, 2]), mindspore.int32)
14
15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
16 print(decoded_indices, decoded_values, decoded_shape, log_probability)
1.2.2 Error reporting
The error message here is as follows:
[ERROR] DEVICE(172230,fffeae7fc160,python):2022-06-28-07:02:12.636.101 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:603] TaskFailCallback] Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr
Traceback (most recent call last):
  File "CTCGreedyDecoder.py", line 26, in <module>
    decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
    out = self.compile_and_run(*args)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 979, in compile_and_run
    return _cell_graph_executor(self, *new_inputs, phase=self.phase)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1128, in __call__
    return self.run(obj, *args, phase=phase)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1165, in run
    return self._exec_pip(obj, *args, phase=phase_real)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 94, in wrapper
    results = fn(*arg, **kwargs)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1147, in _exec_pip
    return self._graph_executor(args, phase)
RuntimeError: Call runtime rtStreamSynchronize failed. Op name: Default/CTCGreedyDecoder-op2
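In a long log, the most useful keyword is often the operator named on the final RuntimeError line. The following is a minimal sketch of pulling that name out with a regular expression. The helper failing_operator is hypothetical, and the "-opN" suffix pattern is an assumption based only on the single log line shown above.

```python
import re

# Final line of the traceback shown above.
message = ("RuntimeError: Call runtime rtStreamSynchronize failed. "
           "Op name: Default/CTCGreedyDecoder-op2")


def failing_operator(error_text):
    """Return the operator type named after 'Op name:', or None if absent."""
    match = re.search(r"Op name:\s*(\S+)", error_text)
    if match is None:
        return None
    full_name = match.group(1)              # Default/CTCGreedyDecoder-op2
    short_name = full_name.split("/")[-1]   # CTCGreedyDecoder-op2
    # Assumed naming scheme: drop a trailing "-op<number>" suffix if present.
    return re.sub(r"-op\d+$", "", short_name)


print(failing_operator(message))  # CTCGreedyDecoder
```

The extracted name tells you which operator's parameter documentation to read first.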
Cause Analysis
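Start with the error message. The ERROR line says Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr, and the RuntimeError at the end of the traceback names the failing operator: Default/CTCGreedyDecoder-op2. Neither message states the cause directly, but the keywords can be used to form a guess and verify it. The failure happens while the CTCGreedyDecoder kernel is running, and the nullptr suggests that the cause may be an out-of-bounds access. The next step is to check the description of each parameter in the official documentation. For sequence_length, the documentation requires a tensor of shape (batch_size,) and type int32 in which each value is less than or equal to max_time, where the shape of inputs is (max_time, batch_size, num_classes).
In the script, inputs has shape (2, 2, 3), so max_time is 2, but line 13 sets sequence_length to [4, 2]. The first value, 4, is greater than max_time, so this condition is not satisfied and an error is reported.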
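The check described here can be done before the operator is called. The sketch below assumes the condition in question is the documented requirement that each value of sequence_length be less than or equal to max_time, the first dimension of inputs. The helper check_sequence_length is hypothetical, is not part of MindSpore, and works on plain Python shapes and lists so it can run without a device.

```python
def check_sequence_length(inputs_shape, sequence_length):
    """Return the indices of sequence_length entries that exceed max_time.

    inputs_shape is assumed to be (max_time, batch_size, num_classes).
    """
    if len(inputs_shape) != 3:
        raise ValueError("inputs must be 3-D: (max_time, batch_size, num_classes)")
    max_time, batch_size, _ = inputs_shape
    if len(sequence_length) != batch_size:
        raise ValueError("sequence_length must have one entry per batch element")
    # An entry is invalid when it is larger than the number of time steps.
    return [i for i, n in enumerate(sequence_length) if n > max_time]


# Shape and values taken from the script: inputs is (2, 2, 3).
inputs_shape = (2, 2, 3)
print(check_sequence_length(inputs_shape, [4, 2]))  # [0]
print(check_sequence_length(inputs_shape, [2, 2]))  # []
```

A non-empty result points at the offending entries, which is far easier to act on than the device-side nullptr message.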
2 Solutions
Given the cause above, the fix is to change line 13 so that no value in sequence_length exceeds the first dimension of inputs (max_time, which is 2 here):
01 class Net(nn.Cell):
02     def __init__(self):
03         super(Net, self).__init__()
04         self.ctc_greedyDecoder = ops.CTCGreedyDecoder()
05
06     def construct(self, input_x, sequence_length):
07         return self.ctc_greedyDecoder(input_x, sequence_length)
08 net = Net()
09
10
11 inputs = Tensor(np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
12                           [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]), mindspore.float32)
13 sequence_length = Tensor(np.array([2, 2]), mindspore.int32)
14
15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
16 print(decoded_indices, decoded_values, decoded_shape, log_probability)
With this change, the script runs successfully and prints the following output:
[[0 0]
 [0 1]
 [1 0]] [0 1 0] [2 2] [[-1.2]
 [-1.3]]
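To see where these numbers come from, the output above can be reproduced by hand with a pure-Python sketch of best path decoding. This is an illustration under stated assumptions, not MindSpore's implementation: it assumes the blank label is the last class index, that repeated labels are merged, and that log_probability is the negated sum of the largest score at each time step. The function name greedy_decode is hypothetical.

```python
def greedy_decode(inputs, sequence_length, merge_repeated=True):
    """Best path decoding on nested lists shaped (max_time, batch_size, num_classes).

    Returns (indices, values, shape, log_probability) in a sparse layout.
    """
    max_time = len(inputs)
    batch_size = len(inputs[0])
    blank = len(inputs[0][0]) - 1  # assumption: last class is the blank label
    indices, values, log_probability = [], [], []
    longest = 0
    for b in range(batch_size):
        if sequence_length[b] > max_time:
            raise ValueError("sequence_length[%d] exceeds max_time" % b)
        previous, position, total = None, 0, 0.0
        for t in range(sequence_length[b]):
            row = inputs[t][b]
            best = row.index(max(row))  # first index of the largest score
            total += row[best]
            # Emit the label unless it is blank or a merged repeat.
            if best != blank and not (merge_repeated and best == previous):
                indices.append([b, position])
                values.append(best)
                position += 1
            previous = best
        longest = max(longest, position)
        log_probability.append([-total])
    return indices, values, [batch_size, longest], log_probability


# Same values as the corrected script.
inputs = [[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
          [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]
indices, values, shape, log_probability = greedy_decode(inputs, [2, 2])
print(indices, values, shape)  # [[0, 0], [0, 1], [1, 0]] [0, 1, 0] [2, 2]
```

Calling greedy_decode(inputs, [4, 2]) raises a ValueError in this sketch, because the loop would otherwise read time steps 2 and 3, which do not exist in inputs.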
3 Summary
Steps to locate the error:
1. Find the line of user code that reports the error (line 26 in the traceback, line 15 in the listing): 15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length);
2. Use the keywords in the error log to narrow down the scope of the problem: Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr;