I have some questions about how to build the Dockerfile.
As a first step, I tried to build both the "" and "Dockerfile.server" files. The "build" one builds correctly; however, when I try to run it, the container exits immediately. Is that normal?
Moreover, I can't build Dockerfile.server because of credentials. I have the credentials, but I don't know where to put them in the code, and if I try to connect using the URL "", I end up with:
"{"errors":[{"code":"MANIFEST_UNKNOWN","message":"manifest unknown","detail":{"Tag":"latest"}}]}"
Can you help me build and run those Dockerfiles correctly?
In fact, I'm not that interested in the server. I would like to know if it's possible to just build the Dockerfile and do inference using models downloaded from "" (somehow it's the joligan server, so I can already get models from there). I am more interested in the and how to run it. Thanks
I would like to know if it's possible to just build the dockerfile and do inference using models
Yes, you can do this, though the build docker is not exactly designed for this, as follows:
nvidia-docker run -v /path/to/models/:/models/ -v /path/to/images/:/images/ --rm --gpus all -it --entrypoint bash jolibrain/joligan_build
This gets you a running docker with a root user inside it. The -v option mounts the local path to models to /models/ inside the docker, and the path to images to /images/ inside the docker.
From there you can use inference, e.g.
cd scripts
python3 --model-in-file /models/xxx/latest_net_G_A.pth --img-in /images/xxx.png --img-out /path/to/out/image.png
I resolved my problem with using the by using "tail -f /dev/null" after docker run.
One problem remains: I tried to launch an inference using a model from the joligan server with the command:
--model-in-file /app/pretrained_weights_models/bdd100k_weather_det_clear2snowy_mm1/latest_net_G_A.pth
--img-size 512
--img-in /app/sample_bdd100k_img/8221f03e-7a27e32f.jpg
--img-out 8221f03e-7a27e32f_snowy.jpg
--gpuid 1
I received a CUDA error: "invalid ordinal device", but this error seems to come from my docker/cuda.
So I tried to use the CPU instead (for inference) by replacing "--gpuid" with "--cpu", following the argument description, to avoid this error, but it returns 'name "device" is not defined'.
but it returns 'name "device" is not defined'.
This is a bug, I just fixed it on master, see bb3c70c
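For readers hitting the same NameError, the message is typical of a variable that is only assigned on the GPU branch of the code. The sketch below shows one way to define the device on both paths. It is an illustration only, not the actual change in bb3c70c; select_device is a hypothetical helper, and only the --cpu and --gpuid flag names come from this thread.

```python
import argparse


def select_device(cpu: bool, gpuid: int) -> str:
    """Return a device string that is defined on both the CPU and GPU paths.

    Hypothetical helper: the returned strings ("cpu", "cuda:<n>") are the
    usual PyTorch device names and can be passed to torch.device().
    """
    if cpu:
        return "cpu"
    return f"cuda:{gpuid}"


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--cpu", action="store_true")
    parser.add_argument("--gpuid", type=int, default=0)
    # Empty list: use the defaults for this demo instead of sys.argv.
    args = parser.parse_args([])
    print(select_device(args.cpu, args.gpuid))
```

Because the device is computed in one place, every later use sees a defined value whichever flag was passed.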
I received a CUDA error: "invalid ordinal device", but this error seems to come from my docker/cuda.
Try nvidia-smi and make sure you have two GPUs available, since you are asking for GPU 1 (0 should be the first one).
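The same check can be done from Python inside the container. This is a minimal sketch, assuming PyTorch is installed there; check_gpu_id and visible_device_count are hypothetical helper names, not part of joligan.

```python
def check_gpu_id(requested: int, device_count: int) -> bool:
    """Return True if `requested` is a valid CUDA device index.

    Valid indices run from 0 to device_count - 1.
    """
    return 0 <= requested < device_count


def visible_device_count() -> int:
    """Number of CUDA devices visible to this process (0 if torch is missing)."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


if __name__ == "__main__":
    count = visible_device_count()
    print("visible CUDA devices:", count)
    for gpu_id in (0, 1):
        print(f"--gpuid {gpu_id} valid:", check_gpu_id(gpu_id, count))
```

If only one device is reported, index 1 does not exist for the process, which is consistent with an "invalid device ordinal" error.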
Hi, I still have some problems with the script. I built the Dockerfile, and when I tried the command:
" python3 --model-in-file /app/pretrained_weights_models/bdd100k_weather_det_clear2snowy_mm1/latest_net_G_A.pth --img-in /app/sample_bdd100k_img/val/8221f03e-7a27e32f.jpg --img-out 8221f03e-7a27e32f_snowy.jpg --gpuid 1"
I got the following error:
"Traceback (most recent call last):
File "", line 60, in
model, opt = load_model(modelpath, os.path.basename(args.model_in_file), device)
File "", line 28, in load_model
opt = TrainOptions().parse_json(train_json)
File "/app/scripts/../options/", line 925, in parse_json
self._json_parse_known_args(parser, opt, flat_json)
File "/app/scripts/../options/", line 882, in _json_parse_known_args
raise ValueError(
ValueError: data_online_creation_mask_delta_A: Bad type (<class 'int'>, should be list of <class 'int'>)"
I already replaced "cut_semantic_mask" with "cut" in the train_config.json of the model "bdd100k_weather_det_clear2snowy_mm1" downloaded from the joligan server. It seems the problem comes from but I can't find what to change.
I think the problem may come from the train_config.json, so I'll put it below.
I received a CUDA error: "invalid ordinal device", but this error seems to come from my docker/cuda.
and make sure you have two GPUs available, since you are asking for GPU 1 (0 should be the first one).
I already checked, and both GPU 1 and 2 are shown in nvidia-smi. For an unknown reason, the problem seems to have resolved by itself, but another one occurred: the error mentioned just before, "data _online_mask_delta".
"D": {
"dropout": false,
"n_layers": 3,
"ndf": 64,
"netDs": [
"no_antialias": false,
"no_antialias_up": false,
"norm": "instance",
"proj_config_segformer": "models/configs/segformer/",
"proj_interp": 512,
"proj_network_type": "vitsmall",
"proj_weight_segformer": "models/configs/segformer/pretrain/segformer_mit-b0.pth",
"spectral": false,
"temporal_every": 4,
"temporal_frame_step": 30,
"temporal_num_common_char": -1,
"temporal_number_frames": 5,
"vision_aided_backbones": "clip+dino"
"G": {
"attn_nb_mask_attn": 10,
"attn_nb_mask_input": 1,
"backward_compatibility_twice_resnet_blocks": false,
"config_segformer": "models/configs/segformer/",
"dropout": false,
"netE": "resnet_512",
"netG": "segformer_attn_conv",
"ngf": 64,
"norm": "instance",
"padding_type": "reflect",
"spectral": false,
"stylegan2_num_downsampling": 1
"alg": {
"cut": {
"flip_equivariance": false,
"lambda_GAN": 1.0,
"lambda_NCE": 1.0,
"nce_T": 0.07,
"nce_idt": true,
"nce_includes_all_negatives_from_minibatch": false,
"nce_layers": "0,4,8,12,16",
"netF": "mlp_sample",
"netF_dropout": false,
"netF_nc": 256,
"netF_norm": "instance",
"num_patches": 256
"cyclegan": {},
"re": {
"P_lr": 0.0002,
"adversarial_loss_p": false,
"netP": "unet_128",
"no_train_P_fake_images": false,
"nuplet_size": 3,
"projection_threshold": 1.0
"data": {
"online_creation": {
"crop_delta_A": 64,
"crop_delta_B": 64,
"crop_size_A": 512,
"crop_size_B": 512,
"mask_delta_A": 0,
"mask_delta_B": 0,
"mask_square_A": false,
"mask_square_B": false
"crop_size": 512,
"dataset_mode": "unaligned_labeled_mask_online",
"direction": "AtoB",
"load_size": 512,
"max_dataset_size": 1000000000,
"num_threads": 4,
"online_context_pixels": 0,
"preprocess": "resize_and_crop",
"relative_paths": false,
"sanitize_paths": false,
"serial_batches": false
"f_s": {
"all_classes_as_one": false,
"class_weights": [
"config_segformer": "models/configs/segformer/",
"dropout": false,
"net": "segformer",
"nf": 64,
"semantic_nclasses": 11,
"semantic_threshold": 1.0,
"weight_segformer": ""
"output": {
"display": {
"G_attention_masks": false,
"diff_fake_real": false,
"env": "bdd100k_weather_det_clear2snowy_mm1",
"freq": 200,
"id": 1,
"ncols": 4,
"networks": false,
"port": 8097,
"server": "http://localhost",
"winsize": 256
"no_html": false,
"print_freq": 200,
"update_html_freq": 1000,
"verbose": false
"model": {
"init_gain": 0.02,
"init_type": "normal",
"input_nc": 3,
"multimodal": true,
"output_nc": 3
"train": {
"sem": {
"cls_B": false,
"cls_pretrained": false,
"cls_template": "basic",
"idt": true,
"l1_regression": false,
"lambda": 1.0,
"lr_f_s": 0.0002,
"net_output": false,
"regression": false,
"use_label_B": true
"mask": {
"charbonnier_eps": 1e-06,
"disjoint_f_s": false,
"f_s_B": true,
"for_removal": false,
"lambda_out_mask": 10.0,
"loss_out_mask": "L1",
"no_train_f_s_A": false,
"out_mask": false
"D_accuracy_every": 1000,
"D_lr": 0.0001,
"G_ema": true,
"G_ema_beta": 0.999,
"G_lr": 0.0002,
"batch_size": 2,
"beta1": 0.9,
"beta2": 0.999,
"compute_D_accuracy": false,
"compute_fid": false,
"compute_fid_val": false,
"continue": false,
"epoch": "latest",
"epoch_count": 1,
"fid_every": 1000,
"gan_mode": "lsgan",
"iter_size": 4,
"load_iter": 0,
"lr_decay_iters": 50,
"lr_policy": "linear",
"mm_lambda_z": 0.5,
"mm_nz": 16,
"n_epochs": 100,
"n_epochs_decay": 100,
"nb_img_max_fid": 1000000000,
"optim": "adam",
"pool_size": 50,
"save_by_iter": false,
"save_epoch_freq": 1,
"save_latest_freq": 5000,
"use_contrastive_loss_D": false
"dataaug": {
"APA": false,
"APA_every": 4,
"APA_nimg": 50,
"APA_p": 0,
"APA_target": 0.6,
"D_label_smooth": false,
"D_noise": 0.01,
"affine": 0.0,
"affine_scale_max": 1.2,
"affine_scale_min": 0.8,
"affine_shear": 45,
"affine_translate": 0.2,
"diff_aug_policy": "",
"diff_aug_proba": 0.5,
"imgaug": false,
"no_flip": false,
"no_rotate": true
"checkpoints_dir": "/data1/confiance_platform/checkpoints/",
"dataroot": "/data1/confiance/datasets/bdd100k_weather_clear2snowy/",
"ddp_port": "13456",
"gpu_ids": "2",
"model_type": "cut",
"name": "bdd100k_weather_det_clear2snowy_mm1",
"phase": "train",
"suffix": "",
"warning_mode": false
The error mention just before
data _online_mask_delta
This is because the option has changed. You can fix it easily by editing the train_config.json file to set:
"mask_delta_A": [0],
"mask_delta_B": [0]
We had to do it ourselves on other models as well.
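When several downloaded models need the same change, the edit can be scripted. The sketch below is a minimal example, assuming the config keeps the "data" / "online_creation" nesting shown in the file posted above; listify_mask_delta and patch_file are hypothetical helper names, and the path in the usage line is only an example.

```python
import json


def listify_mask_delta(config: dict) -> dict:
    """Wrap integer mask_delta_A / mask_delta_B values in a list, in place.

    Values that are already lists are left untouched, so the function
    can safely be run more than once on the same config.
    """
    online = config.get("data", {}).get("online_creation", {})
    for key in ("mask_delta_A", "mask_delta_B"):
        value = online.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            online[key] = [value]
    return config


def patch_file(path: str) -> None:
    """Load a train_config.json, fix the mask_delta options, and write it back."""
    with open(path) as f:
        config = json.load(f)
    listify_mask_delta(config)
    with open(path, "w") as f:
        json.dump(config, f, indent=4, sort_keys=True)
```

For example, patch_file("/models/xxx/train_config.json") would rewrite the file in place; keeping a backup copy first is a reasonable precaution.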
My problem with CUDA is not gone. I think the problem comes from my setup, even though I built the Dockerfile.
I tested whether my GPUs were available by adding the following lines to the script:
modelpath = args.model_in_file.replace(os.path.basename(args.model_in_file), "")
print("modelpath=", modelpath)
use_cuda = torch.cuda.is_available()
print("cuda device is availaible :",use_cuda)
and it seems that it's OK. If you have already seen something like this, can you give me a tip to correct it? Thanks
I have the same error when I use the "--cpu" argument.
If you haven't done so yet, you should rebuild the docker image so that it runs the latest code. Or you can patch from within the docker, whichever you prefer.
can u give me a tip to correct it
First, make sure nvidia-smi works correctly from inside the docker, and look at the list of GPUs.
, and then use --gpuid 0. You may have to set the env variable in the Dockerfile as well...
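On the relation between CUDA_VISIBLE_DEVICES and --gpuid: CUDA renumbers the devices it exposes to a process starting from 0, in the order listed in the variable. The sketch below illustrates that mapping. It assumes the variable holds plain integer ids (it can also hold UUIDs, which this sketch does not handle), and logical_index is a hypothetical helper name.

```python
import os
from typing import Optional


def logical_index(physical_id: int, visible: Optional[str]) -> Optional[int]:
    """Map a physical GPU id to the index CUDA exposes inside the process.

    `visible` is the value of CUDA_VISIBLE_DEVICES (None if it is unset).
    Returns None when the physical GPU is hidden from the process.
    """
    if visible is None:
        return physical_id  # variable unset: no remapping
    ids = [s.strip() for s in visible.split(",") if s.strip()]
    try:
        return ids.index(str(physical_id))
    except ValueError:
        return None


if __name__ == "__main__":
    current = os.environ.get("CUDA_VISIBLE_DEVICES")
    print("CUDA_VISIBLE_DEVICES =", current)
    print("physical GPU 1 is exposed as index:", logical_index(1, current))
```

With CUDA_VISIBLE_DEVICES=1, physical GPU 1 is exposed as index 0, so index 1 no longer exists inside the process.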
I checked several things and I still can't find why I have this CUDA error: invalid device ordinal.
nvidia-smi worked well, and I can get my GPU names and IDs with torch.
Using "export CUDA_VISIBLE_DEVICES=1" didn't solve the problem.
I also tried changing the versions of modules, but I still got the same error.
This error also occurs when I use the "--cpu" argument from
I currently use:
python 3.9.13
torch 1.12.1+cu116
torchvision 0.13.1+cu116
cuda version (nvidia-smi) : 11.8
May I know what your config is when you run the script?
I'll try to reproduce it. Thanks
Hi @YoannRandon ,
#322 should solve your issue, please let us know if you still have any problem.
@YoannRandon you need to rebuild your docker though.