svip-lab/PPGNet

The line_pred visualizes nothing.

Closed this issue · 17 comments

Hey, thanks for your nice work.
I'd like to try your work to see whether it works on my pictures. First, I need to train it, so I started with train.sh.
But when I use TensorBoard to visualize the junctions and lines, it seems that line_pred outputs nothing, as shown below. Is that normal?
[image]
[image]

Thanks!

The AMIM, which infers the adjacency matrix of junctions, is hard to train and needs many epochs to output reasonable results. If you observed dense connections between junctions in the first few epochs and then found all line segments just disappeared, the training and visualization parts should be working fine. Because the adjacency matrices are very sparse, the AMIM will first learn to output close-to-zero matrices. If you want to see some results in the early stage of training, you can try lowering the visualization threshold on the adjacency matrices by changing the --vis-line-th parameter in train.sh. But be aware that if you set the threshold too low, you will also see a mass of false-positive lines.
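To illustrate what --vis-line-th controls, here is a hypothetical sketch (not the repo's actual visualization code; the function and variable names are made up): thresholding a predicted junction adjacency matrix to decide which junction pairs get drawn as line segments.

```python
import numpy as np

def lines_from_adjacency(junctions, adj, vis_line_th=0.5):
    """Return junction pairs whose predicted connection score exceeds the
    visualization threshold (the role played by --vis-line-th)."""
    # Only look at the upper triangle so each undirected line is counted once.
    i_idx, j_idx = np.where(np.triu(adj, k=1) > vis_line_th)
    return [(tuple(junctions[i]), tuple(junctions[j])) for i, j in zip(i_idx, j_idx)]

# A close-to-zero matrix (typical of early training) yields no lines at the
# default threshold; lowering the threshold surfaces candidate segments.
junctions = np.array([[10, 20], [100, 20], [100, 80]])
adj = np.array([[0.00, 0.04, 0.01],
                [0.04, 0.00, 0.06],
                [0.01, 0.06, 0.00]])
print(len(lines_from_adjacency(junctions, adj, 0.5)))   # 0
print(len(lines_from_adjacency(junctions, adj, 0.03)))  # 2
```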

Thanks for your reply; it seems to be what you described.
However, early in training the junction_pred moves in the right direction, while line_pred outputs nothing.
Now I'm at epoch 18: line_pred produces a mass of output (most of it incorrect), while the junction branch finds 0 points with probability above the threshold.
It seems this task is difficult and needs a lot of patience to get good results. If it's OK, could you tell me how long your team took to train this network? Thanks again.

I believe that 18 epochs should be enough to get some reasonable results. It seems that the AMIM was somehow not trained correctly, but I don't know where the problem is. I wanted to train the network again using the code in this repo; however, I just found that I can't access the computing resources I used for this project right now... I think @rayryeng has trained the network with the code in this repo, so maybe he can give you some advice on this problem for now? Thanks @rayryeng ...

Yeah... I have trained the network for 30 epochs following train.sh.
But the results do not look really good. It seems that line verification is hard to learn.
Here are the test results on the Wireframe dataset:
PPG_gt

PPG

The top row is the ground truth and the bottom row is the prediction (junctions are predicted by the junction network).
As you can see, the relationships between the junctions are not predicted well.

It's also worth saying that the junction part works well, as it predicts most junctions correctly. (Below are the junction prediction results: the top row is the ground truth and the other is the prediction.)
PPG_real_junc
PPG_pred_junc

Could you (or anyone who has trained this network and got good results) offer any help, or check where the code in this repo differs from your original code?
Many thanks for your work.

Can you tell me your batch size and block inference size in train.sh? Also, did you modify the learning rate, or lambda-heatmap and lambda-adj? Can you provide your loss_adj curve?

@allankevinrichie Thanks for your reply and help!
Here are my params:

  • batch size = 8
  • block-inference-size = 64
  • and I followed the commands below:
    train --end-epoch 9 --solver SGD --lr 0.2 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 5.
    train --end-epoch 15 --solver SGD --lr 0.02 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10.
    train --end-epoch 30 --solver SGD --lr 0.002 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10.

Here is my loss_adj training curve, along with the other curves recorded during training. Training restarted at epoch 25 and ended at epoch 55.
[image]

[image]

Finally, the logged loss is:
epoch: [29][624/625], lr: 0.002, time_total: 8.34, time_data: 0.04, time_net: 7.70, time_vis: 0.60, loss: 0.7582, loss_heatmap: 0.0870, loss_adj_mtx: 0.0671
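For what it's worth, the logged total appears to be exactly the weighted sum of the two components under the stage-3 weights (--lambda-heatmap 1., --lambda-adj 10.), which suggests the loss bookkeeping itself is fine. A quick check, assuming the total is computed as that weighted sum:

```python
# Weighted sum of the logged components under lambda-heatmap=1, lambda-adj=10
lambda_heatmap, lambda_adj = 1.0, 10.0
loss_heatmap, loss_adj_mtx = 0.0870, 0.0671
total = lambda_heatmap * loss_heatmap + lambda_adj * loss_adj_mtx
print(round(total, 4))  # 0.758, consistent with the logged 0.7582 (components are rounded)
```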

If I remember correctly, to see some reasonable results the adj loss should be less than 0.0008 (or 0.008). I suggest you try turning on the adj loss only, to see whether the AMIM can be trained correctly. You can do this by setting --is-train-junc to False and --lambda-heatmap to 0. Here is an example.

python main.py \
--exp-name line_weighted_wo_focal_junc --backbone resnet50 \
--backbone-kwargs '{"encoder_weights": "ckpt/backbone/encoder_epoch_20.pth", "decoder_weights": "ckpt/backbone/decoder_epoch_20.pth"}' \
--dim-embedding 256 --junction-pooling-threshold 0.2 \
--junc-pooling-size 64 --attention-sigma 1.5 --block-inference-size 128 \
--data-root /data/path --junc-sigma 3 \
--batch-size 16 --gpus 0,1,2,3 --num-workers 10 --resume-epoch latest \
--is-train-junc False --is-train-adj True \
--vis-junc-th 0.1 --vis-line-th 0.1 \
    - train --end-epoch 9 --solver SGD --lr 0.2 --weight-decay 5e-4 --lambda-heatmap 0. --lambda-adj 5. \
    - train --end-epoch 15 --solver SGD --lr 0.02 --weight-decay 5e-4 --lambda-heatmap 0. --lambda-adj 10. \
    - train --end-epoch 30 --solver SGD --lr 0.002 --weight-decay 5e-4 --lambda-heatmap 0. --lambda-adj 10. \
    - end

@allankevinrichie I used this command to train again. It's really hard to make the adj_mtx loss decrease; 0.038 is the best result...
[image]

Hi, sorry to bother you.
I just finished the training process (30 epochs) and got images containing no lines, while the junction part works fine, as shown below. There are no connections between the junctions.
[image]
[image]
Due to the large GPU memory requirements and the limitations of my computer, I changed --batch-size from 16 to 1. Does this have any effect on the results? Could that be the reason for what I'm seeing?
Thank you for your attention.
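One speculative thing worth checking: if the backbone uses BatchNorm, a batch size of 1 makes its per-batch statistics degenerate (each "batch mean" is a single sample), which by itself can wreck training. A common workaround, sketched below and not something the repo provides, is to freeze BN layers to their running statistics:

```python
import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> None:
    """Put every BatchNorm layer in eval mode and stop updating its affine
    parameters, so batch-size-1 statistics no longer feed the normalization."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()  # use running statistics instead of per-batch ones
            for p in m.parameters():
                p.requires_grad_(False)
```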

I want to know what is inside the '*.lg' files, so I wrote code like this:

    import pickle
    with open("C:\\YMW\\YMWbiye\\PPGNet-master\\SIST (1)\\indoorDist\\train\\00030077.lg", "rb") as f:
        data = pickle.load(f)

but it fails with "ValueError: Buffer dtype mismatch, expected 'ITYPE_t' but got 'long long'".
What is wrong with this, and how can I solve it?

@allankevinrichie Hi, I followed your code and trained the model, but I cannot get results like those you presented when running test.sh.
p2

result

0001

result1

@mingweiY Thanks for your reply. I resized the height and width to the same size, didn't change the loss or the subsequent processing, and used the defaults for everything else. At first I just added the image path in test.sh: python test.py
--exp-name line_weighted_wo_focal_junc --backbone resnet50
--backbone-kwargs '{"encoder_weights": "ckpt/backbone/encoder_epoch_20.pth", "decoder_weights": "ckpt/backbone/decoder_epoch_20.pth"}'
--dim-embedding 256 --junction-pooling-threshold 0.2
--junc-pooling-size 64 --block-inference-size 128
--gpus 0, --resume-epoch latest
--vis-junc-th 0.25 --vis-line-th 0.25
- test $1
--path-to-image PATH_TO_IMAGE /mnt/lustre/chenchang1/data/line_test_data/p2.png
However, an error occurred:
result = fn(*varargs, **kwargs)
File "test.py", line 103, in test
img = cv2.resize(img, (self.img_size, self.img_size))
cv2.error: OpenCV(4.1.2) /io/opencv/modules/imgproc/src/resize.cpp:3720: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

Afterwards I set the path directly in imread and got the results above.
Did you manage to reproduce the results the author provided? If so, please share your settings.


Hello, I'm facing the same problem here. Can you tell me how to create a '.lg' file from my own data?


Having the same question here; I'd also like to know how to get a '.lg' file from my own data. Thank you.


You can find a demo of how to create a '.lg' file in tools/rebuild_yorkurban.py.
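tools/rebuild_yorkurban.py is the authoritative reference for the actual '.lg' layout; the sketch below only illustrates the general idea (junction coordinates plus connected index pairs, pickled to disk), and the dict keys here are made up rather than taken from the repo:

```python
import pickle
import numpy as np

# Made-up schema for illustration only; mirror tools/rebuild_yorkurban.py
# for the real field names and types.
junctions = np.array([[10.0, 20.0], [100.0, 20.0], [100.0, 80.0]])  # (x, y) per junction
lines = [(0, 1), (1, 2)]  # pairs of junction indices joined by a line segment

with open("example.lg", "wb") as f:
    pickle.dump({"junctions": junctions, "lines": lines}, f)

# Reading it back recovers the same structure.
with open("example.lg", "rb") as f:
    data = pickle.load(f)
print(sorted(data.keys()))  # ['junctions', 'lines']
```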