traveller59/second.pytorch

train.py----->"TypeError: 'numpy.float64' object cannot be interpreted as an integer" and "TypeError: Object of type 'ndarray' is not JSON serializable "

KangChou opened this issue · 12 comments

```
runtime.step=2250, runtime.steptime=0.2624, runtime.voxel_gene_time=0.001374, runtime.prep_time=0.05496, loss.cls_loss=0.2967, loss.cls_loss_rt=0.3549, loss.loc_loss=0.5239, loss.loc_loss_rt=0.5036, loss.loc_elem=[0.008796, 0.01199, 0.1004, 0.0109, 0.03556, 0.01613, 0.06806], loss.cls_pos_rt=0.2487, loss.cls_neg_rt=0.1062, loss.dir_rt=0.4927, rpn_acc=0.9993, pr.prec@10=0.0732, pr.rec@10=0.8572, pr.prec@30=0.5539, pr.rec@30=0.5002, pr.prec@50=0.9257, pr.rec@50=0.1765, pr.prec@70=0.995, pr.rec@70=0.004043, pr.prec@80=0.0, pr.rec@80=0.0, pr.prec@90=0.0, pr.rec@90=0.0, pr.prec@95=0.0, pr.rec@95=0.0, misc.num_vox=30752, misc.num_pos=60, misc.num_neg=70263, misc.num_anchors=70400, misc.lr=0.0005046, misc.mem_usage=25.9
runtime.step=2300, runtime.steptime=0.2481, runtime.voxel_gene_time=0.001561, runtime.prep_time=0.0813, loss.cls_loss=0.2928, loss.cls_loss_rt=0.2211, loss.loc_loss=0.5189, loss.loc_loss_rt=0.3877, loss.loc_elem=[0.008733, 0.009574, 0.02625, 0.01872, 0.04538, 0.0329, 0.05227], loss.cls_pos_rt=0.1724, loss.cls_neg_rt=0.04868, loss.dir_rt=0.5206, rpn_acc=0.9993, pr.prec@10=0.07425, pr.rec@10=0.8598, pr.prec@30=0.5595, pr.rec@30=0.5073, pr.prec@50=0.9269, pr.rec@50=0.1837, pr.prec@70=0.9964, pr.rec@70=0.004993, pr.prec@80=0.0, pr.rec@80=0.0, pr.prec@90=0.0, pr.rec@90=0.0, pr.prec@95=0.0, pr.rec@95=0.0, misc.num_vox=34000, misc.num_pos=59, misc.num_neg=70244, misc.num_anchors=70400, misc.lr=0.0005165, misc.mem_usage=25.9
#################################

EVAL

#################################
Generate output labels...
[100.0%][===================>][20.00it/s][01:44>00:00]
generate label finished(35.85/s). start eval:
/opt/conda/lib/python3.6/site-packages/numba/core/typed_passes.py:327: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "../utils/eval.py", line 129:
@numba.jit(nopython=True, parallel=True)
def box3d_overlap_kernel(boxes,
^

state.func_ir.loc))
Traceback (most recent call last):
File "/data/second.pytorch/second/pytorch/train.py", line 407, in train
detections, str(result_path_step))
File "/data/second.pytorch/second/data/kitti_dataset.py", line 149, in evaluation
z_center=z_center)
File "/data/second.pytorch/second/utils/eval.py", line 884, in get_coco_eval_result
z_center=z_center)
File "/data//second.pytorch/second/utils/eval.py", line 704, in do_coco_style_eval
min_overlaps[:, i, j] = np.linspace(*overlap_ranges[:, i, j])
File "<array_function internals>", line 6, in linspace
File "/opt/conda/lib/python3.6/site-packages/numpy/core/function_base.py", line 113, in linspace
num = operator.index(num)
TypeError: 'numpy.float64' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/second.pytorch/second/pytorch/train.py", line 678, in
resume=False)
File "/second.pytorch/second/pytorch/train.py", line 421, in train
print(json.dumps(example["metadata"], indent=2))
File "/opt/conda/lib/python3.6/json/init.py", line 238, in dumps
**kw).encode(obj)
File "/opt/conda/lib/python3.6/json/encoder.py", line 201, in encode
chunks = list(chunks)
File "/opt/conda/lib/python3.6/json/encoder.py", line 428, in _iterencode
yield from _iterencode_list(o, _current_indent_level)
File "/opt/conda/lib/python3.6/json/encoder.py", line 325, in _iterencode_list
yield from chunks
File "/opt/conda/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
yield from chunks
File "/opt/conda/lib/python3.6/json/encoder.py", line 437, in _iterencode
o = _default(o)
File "/opt/conda/lib/python3.6/json/encoder.py", line 180, in default
o.__class__.__name__)
TypeError: Object of type 'ndarray' is not JSON serializable

```
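For the second traceback: the ndarray error is raised while train.py tries to print example["metadata"] with json.dumps during handling of the linspace exception, because the metadata dict contains NumPy arrays. A minimal sketch of how to make such a debug print survive, assuming you only need the secondary error to go away; the np_default helper and the sample metadata dict below are illustrative, not code from the repo:

```python
import json
import numpy as np

def np_default(obj):
    """Fallback converter so json.dumps can handle NumPy values."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()   # arrays -> nested Python lists
    if isinstance(obj, np.generic):
        return obj.item()     # NumPy scalars -> plain Python scalars
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# Illustrative metadata; train.py's debug print fails on values like these.
metadata = {"image_idx": np.int64(7), "image_shape": np.array([375, 1242])}
print(json.dumps(metadata, indent=2, default=np_default))
```

The root cause is still the linspace call in do_coco_style_eval; this only keeps the secondary JSON error from masking it.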

I'm experiencing a similar issue.
It reproduces in Docker.
Dockerfile.txt

+ python ./pytorch/train.py evaluate --config_path=/mnt/host/vol/second.pytorch/second/configs/all.fhd.config.fixed --model_dir=/mnt/host/vol/second.pytorch/second/configs/pointpillars/model_result --measure_time=True --batch_size=1
/miniconda/envs/py38/lib/python3.8/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
/miniconda/envs/py38/lib/python3.8/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
  warnings.warn(errors.NumbaWarning(msg))
/miniconda/envs/py38/lib/python3.8/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice.

For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
  warnings.warn(errors.NumbaWarning(msg))
[  41 1280 1056]
feature_map_size [1, 160, 132]
remain number of infos: 3769
Generate output labels...
[100.0%][===================>][10.32it/s][06:16>00:00]
generate label finished(9.99/s). start eval:
avg example to torch time: 5.093 ms
avg prep time: 6.488 ms
avg voxel_feature_extractor time = 0.494 ms
avg middle forward time = 50.746 ms
avg rpn forward time = 13.280 ms
avg predict time = 23.484 ms
/miniconda/envs/py38/lib/python3.8/site-packages/numba/core/typed_passes.py:326: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "utils/eval.py", line 129:
@numba.jit(nopython=True, parallel=True)
def box3d_overlap_kernel(boxes,
^

  warnings.warn(errors.NumbaPerformanceWarning(msg,
Traceback (most recent call last):
  File "./pytorch/train.py", line 663, in <module>
    fire.Fire()
  File "/miniconda/envs/py38/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/miniconda/envs/py38/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/miniconda/envs/py38/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "./pytorch/train.py", line 540, in evaluate
    result_dict = eval_dataset.dataset.evaluation(detections,
  File "/mnt/host/vol/second.pytorch/second/data/kitti_dataset.py", line 144, in evaluation
    result_coco = get_coco_eval_result(
  File "/mnt/host/vol/second.pytorch/second/utils/eval.py", line 877, in get_coco_eval_result
    mAPbbox, mAPbev, mAP3d, mAPaos = do_coco_style_eval(
  File "/mnt/host/vol/second.pytorch/second/utils/eval.py", line 704, in do_coco_style_eval
    min_overlaps[:, i, j] = np.linspace(*overlap_ranges[:, i, j])
  File "<__array_function__ internals>", line 5, in linspace
  File "/miniconda/envs/py38/lib/python3.8/site-packages/numpy/core/function_base.py", line 120, in linspace
    num = operator.index(num)
TypeError: 'numpy.float64' object cannot be interpreted as an integer

This occurs because of the numpy version.
If you install numpy==1.17.4, this error will not occur.
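For context, the failure happens because the third element of overlap_ranges[:, i, j] (a numpy.float64) is passed as the num argument of np.linspace, and recent NumPy converts num with operator.index(), which rejects floats. A minimal reproduction sketch; the overlap_range values below are made up and the version boundary is approximate:

```python
import numpy as np

overlap_range = np.array([0.5, 0.95, 10.0])  # start, stop, number of samples

# On NumPy 1.17 and earlier this only emitted a DeprecationWarning; on newer
# releases it raises:
#   TypeError: 'numpy.float64' object cannot be interpreted as an integer
# np.linspace(*overlap_range)

# Works on either version: cast the sample count to int explicitly.
print(np.linspace(overlap_range[0], overlap_range[1], int(overlap_range[2])))
```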

Thank you for your advice!
Today I downgraded numpy to 1.16 according to this article.
Specifically, I added the following line to the end of the Dockerfile:

RUN conda install -c conda-forge numpy=1.16.2

And the error I posted above was solved.

However, a new error has appeared in its place.

[  41 1280 1056]
feature_map_size [1, 160, 132]
remain number of infos: 3769
Generate output labels...
[7.005%][>...................][0.31it/s][03:03>03:06:20]
Traceback (most recent call last):
  File "./pytorch/train.py", line 663, in <module>
    fire.Fire()
  File "/miniconda/envs/py37/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/miniconda/envs/py37/lib/python3.7/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/miniconda/envs/py37/lib/python3.7/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "./pytorch/train.py", line 524, in evaluate
    detections += net(example)
  File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/host/vol/second.pytorch/second/pytorch/models/voxelnet.py", line 363, in forward
    preds_dict = self.network_forward(voxels, num_points, coors, batch_size_dev)
  File "/mnt/host/vol/second.pytorch/second/pytorch/models/voxelnet.py", line 332, in network_forward
    voxel_features, coors, batch_size)
  File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/host/vol/second.pytorch/second/pytorch/models/middle.py", line 203, in forward
    ret = self.middle_conv(ret)
  File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/envs/py37/lib/python3.7/site-packages/spconv/modules.py", line 133, in forward
    input = module(input)
  File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/envs/py37/lib/python3.7/site-packages/spconv/conv.py", line 192, in forward
    outids.shape[0])
  File "/miniconda/envs/py37/lib/python3.7/site-packages/spconv/functional.py", line 83, in forward
    return ops.indice_conv(features, filters, indice_pairs, indice_pair_num, num_activate_out, False, True)
  File "/miniconda/envs/py37/lib/python3.7/site-packages/spconv/ops.py", line 116, in indice_conv
    int(inverse), int(subm))
  File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()

Oh... sorry to hear that.

That new issue is not one I have faced myself.
Sorry I can't help more.

No worries. Your advice solved one of the problems I'm facing. Thank you!

I noticed that the elapsed time from launch to crash varies between runs.
Perhaps this problem is caused by my weak environment:
a 1st-gen Core i3, a GTX 1050, and only 4 GB of RAM.

I found a Japanese article which says the solution to this problem is to increase shared memory by adding shm_size: '2gb' to docker-compose.yml.
This solution seems to work so far (it is still running without crashing).
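For reference, the shm_size option goes at the service level of docker-compose.yml. A minimal sketch, assuming compose file format 2.x; the service name and image are placeholders, and only the shm_size value comes from this thread:

```yaml
version: "2.4"
services:
  second:                          # placeholder service name
    image: second-pytorch:latest   # placeholder image built from Dockerfile.txt
    shm_size: '2gb'                # Docker's default /dev/shm is only 64 MB
```

Without docker-compose, the rough equivalent is passing --shm-size=2g to docker run.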

It worked! (Almost everything is 0.00, but that may be caused by a different kind of issue.)

Evaluation official
Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:0.00, 0.00, 0.00
bev  AP:0.00, 0.00, 0.00
3d   AP:0.00, 0.00, 0.00
aos  AP:0.00, 0.00, 0.00
Car AP(Average Precision)@0.70, 0.50, 0.50:
      . . . 
Van coco AP@0.50:0.05:0.95:
bbox AP:0.00, 0.00, 0.00
bev  AP:0.00, 0.00, 0.00
3d   AP:0.00, 0.00, 0.00
aos  AP:0.00, 0.00, 0.00

This issue occurred because of running out of memory, right?

I guess so. It worked after adding shm_size: '2gb' to docker-compose.yml. I didn't change any other part of docker-compose.yml.txt or Dockerfile.txt.

Hi, I ran into a similar error. How can I do that in a non-Docker environment? I am running it on my computer with one RTX 2060 on Ubuntu 18.04.

You can change the eval.py file instead of changing the numpy version:
/data/second.pytorch/second/utils/eval.py
Line 704:

for i in range(overlap_ranges.shape[1]):
    for j in range(overlap_ranges.shape[2]):
        # overlap_ranges[:, i, j] holds (start, stop, num_samples); newer numpy
        # requires the third argument of np.linspace to be an integer.
        start, stop, num = overlap_ranges[:, i, j]
        min_overlaps[:, i, j] = np.linspace(start, stop, int(num))
        # was: min_overlaps[:, i, j] = np.linspace(*overlap_ranges[:, i, j])

Something like this should solve the problem.

Hi @WesternHill,

Were you able to solve the eval 0.00 problem? Did you get any evaluation results? I am facing the same problem.

If you are running any object detection and facing this issue, it is because of version conflicts in pycocotools. Uninstall and reinstall it and your problem should be solved:
pip uninstall pycocotools
pip install pycocotools