hkchengrex/Tracking-Anything-with-DEVA

something about data format

wkywwds opened this issue · 29 comments

If I want to replace the data format in "example/vipseg" with my own (as shown below), do I need to give the masked objects (shown in red) different IDs, that is, label them with different colors? And do I need to do this for every picture in my own dataset?
[image: 00037]

The masks should be read by PIL as index masks. See https://github.com/hkchengrex/XMem/blob/main/docs/PALETTE.md
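
For reference, an index mask stores one integer object ID per pixel; the palette only controls how those IDs are displayed. A minimal sketch of writing one with PIL (the sizes, IDs, and colors below are made up for illustration):

import numpy as np
from PIL import Image

# Toy index mask: 0 = background, 1 and 2 are two objects
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:200, 100:200] = 1
mask[250:350, 300:450] = 2

img = Image.fromarray(mask)  # single-channel 'L' image
# Attaching a palette converts it to mode 'P'; the colors are only
# for visualization -- the model reads the integer indices
img.putpalette([0, 0, 0, 255, 0, 0, 0, 255, 0])
img.save('00000.png')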

OK. But the objects with the red mask belong to the same class, and I would like DEVA to separate each of them, which means I want to mark them in different colors. In this case, do I have to give each object a different ID in the corresponding JSON file?

In this mode, we don't care about the class. It doesn't matter that they are in the same class. If you want to track them independently, label them independently.
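
As a concrete illustration (not from the thread): if your model outputs one binary mask per class, each connected blob can be assigned its own ID, e.g. with scipy.ndimage.label, assuming connected components actually correspond to distinct objects:

import numpy as np
from scipy import ndimage

# Toy binary mask containing two blobs of the same class
binary_mask = np.zeros((8, 8), dtype=bool)
binary_mask[1:3, 1:3] = True
binary_mask[5:7, 5:7] = True

# label() gives every connected component its own integer ID:
# 0 = background, 1..num_objects = independently tracked objects
index_mask, num_objects = ndimage.label(binary_mask)
index_mask = index_mask.astype(np.uint8)
print(num_objects)  # 2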

On a side note, this type of data seems very out-of-domain.

Fine. The data is CT imagery of ceramic matrix composites (CMCs), which are used in high-temperature components of aerospace vehicles. And again, I want to confirm: for the dataset required by DEVA, do I only need masks (from my own model) for a subset of the pictures, not all of them, with each object numbered in the corresponding JSON file, that is, different objects marked with different colors?

Yes.

[screenshot of console output]
Hi, where can I change the "Max allocated memory"?

Is the JSON file in the example necessary? I found that even without the JSON file, the program can still run.
[screenshot]

The json files allow users to propagate segment information (e.g., object classes) to the output. It is not strictly necessary.

Max allocated memory is just reporting the maximum amount of GPU memory allocated by PyTorch.
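
Presumably this is read from PyTorch's allocator statistics, e.g. (an assumption about the exact call; check the repo's source):

import torch

# Peak GPU memory allocated by PyTorch in this process, in MB
print(f'Max allocated memory (MB): {torch.cuda.max_memory_allocated() / (2 ** 20):.0f}')

It is a report rather than a setting, so there is nothing to change there.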

Why isn't the result ideal when I run my dataset with DEVA?
[image: 00000]
The mask is this:
[image: 00000]

I think your mask is an RGB image and not an index mask as mentioned above.

If so, how do I convert the RGB mask into an index mask?

> The masks should be read by PIL as index masks. See https://github.com/hkchengrex/XMem/blob/main/docs/PALETTE.md

Please see this reply from above.
Simply put, the underlying data structure should be a single-channel integer mask. You can verify this by reading the image with PIL and converting it to a numpy array.
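
A quick check along those lines (the path is hypothetical):

from PIL import Image
import numpy as np

mask = Image.open('mask.png')   # one of your mask files
print(mask.mode)                # should be 'P' or 'L', not 'RGB'
arr = np.array(mask)
print(arr.shape)                # should be (H, W), i.e. single-channel
print(np.unique(arr))           # the object IDs present, e.g. [0 1 2]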

Do you mean converting the colour mask into a palette mask? After converting to a palette mask with PIL, the result still seems unsatisfactory.
[screenshot]
[screenshot]

[image: 00003]

The conversion is wrong. You would need to find the unique colors in the image and remap the pixels.

The following is a response from Claude 3 Sonnet.

To find the unique colors in an RGB image and remap the corresponding pixels to an index mask using PIL (Python Imaging Library) and NumPy, you can follow these steps:

  1. Load the image using PIL and convert it to a NumPy array.
  2. Reshape the NumPy array to a 2D array, where each row represents a pixel and each column represents the R, G, and B values.
  3. Use NumPy's unique function to find the unique color combinations in the reshaped array.
  4. Create a mapping from each unique color combination to a unique index.
  5. Use the color-to-index mapping to look up the index of each pixel's color combination.
  6. Create a new NumPy array with the same shape as the original image, but with the indices from the previous step as the values.

Here's the code to implement this:

from PIL import Image
import numpy as np

# Load the mask image; masks should be stored losslessly (e.g., PNG) --
# JPEG compression introduces spurious colors
image = Image.open('image.png').convert('RGB')

# Convert the image to a NumPy array of shape (H, W, 3)
image_array = np.array(image)

# Reshape the array to a 2D array (pixels x RGB)
reshaped_array = image_array.reshape(-1, 3)

# Find unique color combinations
unique_colors = np.unique(reshaped_array, axis=0)

# Create a mapping from unique colors to indices
color_to_index = {tuple(color): index for index, color in enumerate(unique_colors)}

# Find the index of each pixel's color combination
indices = np.array([color_to_index[tuple(color)] for color in reshaped_array])

# Reshape the indices back to the original image shape
# (uint8 assumes fewer than 256 unique colors)
index_mask = indices.reshape(image_array.shape[0], image_array.shape[1]).astype(np.uint8)

In this code:

  • image_array is the NumPy array representation of the image.
  • reshaped_array is a 2D array where each row represents a pixel and each column represents the R, G, and B values.
  • unique_colors is a 2D array containing the unique color combinations in the image.
  • color_to_index is a dictionary that maps each unique color combination to a unique index.
  • indices is a 1D array containing the index of each pixel's color combination in the unique_colors array.
  • index_mask is a 2D array with the same shape as the original image, but with the indices from indices as the values.

After running this code, index_mask will contain the index mask, where each pixel value corresponds to the index of its color combination in the unique_colors array.

Note that this approach assumes that the image has a limited number of unique colors. For images with a large number of unique colors (e.g., high-resolution photographs), this method may not be efficient, and you might need to use alternative techniques, such as color quantization or clustering.
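
As an aside, NumPy can fold steps 3-5 into a single call: np.unique with return_inverse=True returns both the unique colors and each pixel's index into them, avoiding the per-pixel Python loop (a sketch with a hypothetical path):

import numpy as np
from PIL import Image

rgb = np.array(Image.open('mask.png').convert('RGB'))

# One call finds the unique colors and each pixel's index into them
unique_colors, inverse = np.unique(rgb.reshape(-1, 3), axis=0, return_inverse=True)
index_mask = inverse.reshape(rgb.shape[:2]).astype(np.uint8)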

I wrote a program based on the above code, but the effect is not very good. What is going wrong this time?
[screenshot]

[image: 00000]
[image: 00000]

Two observations:

  1. The cropped version is working better
  2. The input and output colors don't match

It seems to me that your mask input (or conversion) is still buggy.
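
One possible source of such a bug (an assumption on my part, not confirmed in this thread): if the color-to-index mapping is recomputed per frame with np.unique, the same object can receive a different index in different frames whenever the set of colors present changes. Fixing one mapping and applying it to every frame avoids this (a sketch; the colors and path are made up):

import numpy as np
from PIL import Image

# Hypothetical fixed mapping shared by all frames; background must map to 0
COLOR_TO_ID = {
    (0, 0, 0): 0,     # background
    (255, 0, 0): 1,   # object 1
    (0, 255, 0): 2,   # object 2
}

def rgb_mask_to_index(path):
    rgb = np.array(Image.open(path).convert('RGB'))
    index_mask = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, obj_id in COLOR_TO_ID.items():
        index_mask[np.all(rgb == color, axis=-1)] = obj_id
    return index_mask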

After further testing, I found that the program runs normally even when the mask is RGB, and the fewer targets there are, the better the prediction. May I ask whether DEVA's prediction quality degrades when there is a larger number of targets?
[image: 00000]
[image: 00000]
[image: 00000]

Thank you for the update. It is possible that having many targets degrades the output (due to increased noise in memory matching), especially in out-of-domain cases like yours.

Is there a solution for this case?

  1. Can you provide all the output frames up to the point of failure so that we can have a closer look?
  2. What if you supply fewer objects but the image remains uncropped?

Hello, have you received the file I sent? Have you found the reason why DEVA doesn't work well on my dataset?

No, I haven't received anything.

UNet is used first, and then the connected components are filtered by an area threshold to form the mask. Only seven images have been tested.
IMG.zip

Sorry, I cannot check it right now. I'll get back to this later.

Can you check the "IMG.zip" file? If not, I'll send it to you by email.

Have you found the cause of the problem when DEVA tracks multiple targets?

I haven't had the time to test it yet.