[Solved] MindSpore Error: task_fail_info or current_graph_ is nullptr

1 Error description

1.1 System Environment

Hardware Environment (Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu (kernel 4.15.0-74-generic)
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

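The script builds a single-operator network around CTCGreedyDecoder and uses it to perform greedy (best-path) decoding on the logits passed as input. The listing omits the imports; it relies on numpy (as np), mindspore, mindspore.nn (as nn), mindspore.ops (as ops) and Tensor from mindspore. The script is as follows: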

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.ctc_greedyDecoder = ops.CTCGreedyDecoder()
 05 
 06     def construct(self, input_x, sequence_length):
 07         return self.ctc_greedyDecoder(input_x, sequence_length)
 08 net = Net()
 09 
 10 
 11 inputs = Tensor(np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
 12                           [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]), mindspore.float32)
 13 sequence_length = Tensor(np.array([4, 2]), mindspore.int32)
 14 
 15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
 16 print(decoded_indices, decoded_values, decoded_shape, log_probability)
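Greedy (best-path) decoding picks the highest-scoring class at each time step, merges consecutive repeats, and then drops the blank label. Below is a minimal pure-Python sketch of that idea for a single sequence. The helper name best_path is hypothetical and not part of MindSpore; it assumes, as the MindSpore documentation describes for this operator, that the blank label is the last class (num_classes - 1) and that repeated classes are merged (merge_repeated defaults to True). Resolving ties to the lowest class index is an assumption about the kernel, not documented behavior.

```python
def best_path(frames, blank):
    """Greedy (best-path) decoding of one sequence (illustrative sketch).

    frames: list of per-time-step score lists, e.g. [[0.6, 0.4, 0.2], ...]
    blank:  index of the blank label (assumed to be num_classes - 1)
    """
    labels = []
    prev = None
    for scores in frames:
        # Most likely class at this step; max() keeps the first (lowest) index on ties.
        best = max(range(len(scores)), key=lambda c: scores[c])
        # Merge consecutive repeats, then drop blanks.
        if best != prev and best != blank:
            labels.append(best)
        prev = best
    return labels


# The script's inputs use the layout (max_time, batch_size, num_classes),
# so the frames of batch element b are inputs[t][b] for every time step t.
inputs = [[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
          [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]
for b in range(2):
    frames = [inputs[t][b] for t in range(len(inputs))]
    print(best_path(frames, blank=2))  # prints [0, 1] for b=0, then [0] for b=1
```

Because the time axis comes first, a sequence_length entry counts how many of those leading time steps belong to a batch element, which is why it cannot be larger than max_time.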

1.2.2 Error reporting

The error message is as follows:

[ERROR] DEVICE(172230,fffeae7fc160,python):2022-06-28-07:02:12.636.101 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:603] TaskFailCallback] Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr
Traceback (most recent call last):
  File "CTCGreedyDecoder.py", line 26, in <module>
    decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
    out = self.compile_and_run(*args)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 979, in compile_and_run
    return _cell_graph_executor(self, *new_inputs, phase=self.phase)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1128, in __call__
    return self.run(obj, *args, phase=phase)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1165, in run
    return self._exec_pip(obj, *args, phase=phase_real)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 94, in wrapper
    results = fn(*arg, **kwargs)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1147, in _exec_pip
    return self._graph_executor(args, phase)
RuntimeError: Call runtime rtStreamSynchronize failed. Op name: Default/CTCGreedyDecoder-op2

1.3 Cause Analysis

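Start with the error message. The ERROR line says Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr. This does not say directly where the problem is, but its keywords can be used to form a hypothesis and then verify it. A nullptr reported while handling a failed device task, together with the final RuntimeError (Call runtime rtStreamSynchronize failed), suggests that the operator failed while executing on the device; one plausible cause is an invalid input that makes the kernel access memory out of bounds. The RuntimeError also names the failing operator, Default/CTCGreedyDecoder-op2, so the next step is to read the description of each CTCGreedyDecoder parameter in the official documentation:

[Figure: parameter descriptions of CTCGreedyDecoder from the official MindSpore documentation]

The documentation requires inputs to have shape (max_time, batch_size, num_classes) and every value in sequence_length to be less than or equal to max_time. In the script, inputs has shape (2, 2, 3), so max_time is 2, but line 13 sets sequence_length to [4, 2]. The value 4 exceeds max_time, so this condition is not satisfied and the error is reported.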
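Because the device-side failure only surfaces as a generic nullptr message, it can help to check the documented constraint on the host before calling the network. Below is a minimal sketch of such a check. check_sequence_length is a hypothetical helper, not a MindSpore API. In the script it could be called as check_sequence_length(inputs.shape, sequence_length.asnumpy()) just before net(inputs, sequence_length).

```python
def check_sequence_length(inputs_shape, sequence_length):
    """Hypothetical host-side pre-check mirroring the documented constraints.

    inputs_shape:    (max_time, batch_size, num_classes)
    sequence_length: iterable of ints, one per batch element
    Raises ValueError instead of letting the device kernel fail.
    """
    if len(inputs_shape) != 3:
        raise ValueError(f"inputs must be 3-D, got shape {tuple(inputs_shape)}")
    max_time, batch_size, _ = inputs_shape
    lengths = list(sequence_length)
    if len(lengths) != batch_size:
        raise ValueError(
            f"sequence_length has {len(lengths)} entries, expected batch_size={batch_size}")
    # Collect every entry that exceeds max_time so all offenders are reported at once.
    bad = [(i, n) for i, n in enumerate(lengths) if n > max_time]
    if bad:
        raise ValueError(
            f"sequence_length values must be <= max_time={max_time}; "
            f"offending (index, value) pairs: {bad}")


check_sequence_length((2, 2, 3), [2, 2])  # the corrected values pass silently
try:
    check_sequence_length((2, 2, 3), [4, 2])  # the values from the original script
except ValueError as err:
    print(err)
```

With the original values this prints a message naming index 0 and value 4, which points straight at line 13 instead of at a device callback.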

2 Solutions

Given this cause, the fix is to make every value of sequence_length at most max_time (2 here). Line 13 is changed from [4, 2] to [2, 2]:

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.ctc_greedyDecoder = ops.CTCGreedyDecoder()
 05 
 06     def construct(self, input_x, sequence_length):
 07         return self.ctc_greedyDecoder(input_x, sequence_length)
 08 net = Net()
 09 
 10 
 11 inputs = Tensor(np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
 12                           [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]), mindspore.float32)
 13 sequence_length = Tensor(np.array([2, 2]), mindspore.int32)
 14 
 15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
 16 print(decoded_indices, decoded_values, decoded_shape, log_probability)

With this change the script runs successfully and prints:

[[0 0]
 [0 1]
 [1 0]] [0 1 0] [2 2] [[-1.2]
 [-1.3]]
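The four outputs form a sparse representation of the decoded batch: decoded_indices holds (batch, position) pairs, decoded_values holds the labels at those positions, decoded_shape is (batch_size, longest decoded length), and log_probability has shape (batch_size, 1). The pure-Python sketch below is a reference model written for illustration, not MindSpore's kernel. It assumes the blank label is num_classes - 1, that repeats are merged (the operator's default), and that log_probability is the negated sum of the per-step maximum scores; that last point is inferred from the printed values, since -(0.6 + 0.6) = -1.2 and -(0.8 + 0.5) = -1.3.

```python
def ctc_greedy_reference(inputs, sequence_length):
    """Hypothetical pure-Python model of CTCGreedyDecoder's four outputs.

    inputs:          nested lists with layout (max_time, batch_size, num_classes)
    sequence_length: one int per batch element, each <= max_time
    """
    max_time, batch_size = len(inputs), len(inputs[0])
    num_classes = len(inputs[0][0])
    blank = num_classes - 1  # assumed blank label
    indices, values, log_prob = [], [], []
    longest = 0
    for b in range(batch_size):
        if sequence_length[b] > max_time:
            raise ValueError("sequence_length must not exceed max_time")
        decoded, score, prev = [], 0.0, None
        for t in range(sequence_length[b]):  # only the valid time steps
            scores = inputs[t][b]
            # Ties go to the lowest class index (an assumption).
            best = max(range(num_classes), key=lambda c: scores[c])
            score -= scores[best]  # assumed: negated sum of per-step maxima
            if best != prev and best != blank:
                decoded.append(best)
            prev = best
        # Emit one sparse entry per decoded label.
        for pos, label in enumerate(decoded):
            indices.append([b, pos])
            values.append(label)
        longest = max(longest, len(decoded))
        log_prob.append([score])
    return indices, values, [batch_size, longest], log_prob


inputs = [[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
          [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]
print(ctc_greedy_reference(inputs, [2, 2]))
```

On the script's inputs with sequence_length [2, 2], this model yields the same indices, values and shape as the MindSpore run above, and log-probabilities of -1.2 and -1.3.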

3 Summary

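Steps to locate the error:

1. Find the line of user code that triggers the error: line 15 of the listing, decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length);

2. Use the keywords in the log error message, Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr, together with the failing op name (Default/CTCGreedyDecoder-op2), to narrow down the scope of the analysis;

3. Check the operator's parameter constraints in the official documentation against the actual inputs. Here, sequence_length contained a value (4) greater than max_time (2).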
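The first two steps can be partly automated by scanning the log text. The sketch below uses Python's re module; locate_failure is a hypothetical helper, and its patterns are written against the MindSpore 1.8.0 messages quoted in this article, so other versions may need different patterns. It treats the deepest traceback frame outside site-packages as the user's code and strips the scope prefix and "-opN" suffix from the op name so the primitive can be looked up in the API documentation.

```python
import re


def locate_failure(log_text):
    """Hypothetical helper: find the user frame and failing op in a MindSpore log."""
    # Traceback frames look like:  File "path", line N, in func
    frames = re.findall(r'File "([^"]+)", line (\d+), in (\S+)', log_text)
    # Keep the deepest frame that is not inside an installed package.
    user = [f for f in frames if "site-packages" not in f[0]]
    user_frame = (user[-1][0], int(user[-1][1])) if user else None
    # The RuntimeError names the operator, e.g. "Op name: Default/CTCGreedyDecoder-op2".
    match = re.search(r"Op name: (\S+)", log_text)
    op_path = match.group(1) if match else None
    # Drop the scope prefix and the "-opN" suffix to get the primitive name.
    primitive = re.sub(r"-op\d+$", "", op_path.rsplit("/", 1)[-1]) if op_path else None
    keyword_hit = "task_fail_info or current_graph_ is nullptr" in log_text
    return user_frame, op_path, primitive, keyword_hit


SAMPLE_LOG = """[ERROR] DEVICE(172230,fffeae7fc160,python):2022-06-28-07:02:12.636.101 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:603] TaskFailCallback] Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr
Traceback (most recent call last):
  File "CTCGreedyDecoder.py", line 26, in <module>
    decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
    out = self.compile_and_run(*args)
RuntimeError: Call runtime rtStreamSynchronize failed. Op name: Default/CTCGreedyDecoder-op2"""

print(locate_failure(SAMPLE_LOG))
```

Note that the file line (26) differs from the listing's line 15 because the listing omits the script's first lines.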
