Something about data format
wkywwds opened this issue · 29 comments
The masks should be read by PIL as index masks. See https://github.com/hkchengrex/XMem/blob/main/docs/PALETTE.md
If I want to replace the data format in "example/vipseg" with my own (as shown below), do I need to give the masked objects (shown in red) different IDs, i.e., label them with different colors? And do I need to do this for every picture in my dataset?
OK. But the objects under the red mask all belong to the same class, and I still want DEVA to separate each of them, which means marking them in different colors. In that case, do I have to give each object a different ID in the corresponding JSON file?
In this mode, we don't care about the class. It doesn't matter that they are in the same class. If you want to track them independently, label them independently.
On a side note, this type of data seems very out-of-domain.
Fine. It is CT imagery of ceramic matrix composites (CMCs), which are used in high-temperature components of aerospace vehicles. And again, I want to confirm: for the dataset required by DEVA, is it enough to have only some of the pictures (not all) segmented by my own model, with each object numbered in the corresponding JSON file, i.e., different objects marked with different colors?
Yes.
The JSON files allow users to propagate segment information (e.g., object classes) to the output. They are not strictly necessary.
Max allocated memory is just reporting the maximum amount of GPU memory allocated by PyTorch.
I think your mask is an RGB image and not an index mask as mentioned above.
If so, how do I change the RGB mask into an index mask?
> The masks should be read by PIL as index masks. See https://github.com/hkchengrex/XMem/blob/main/docs/PALETTE.md
Please see this reply from above.
Simply put, the underlying data structure should be a single-channel integer mask. You can verify this by reading the image with PIL and converting it to a numpy array.
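A quick way to run that check (a minimal sketch; the tiny in-memory mask below stands in for your actual file):

```python
import numpy as np
from PIL import Image

# A small palette ("P" mode) image stands in for a real mask file;
# replace it with Image.open('your_mask.png') in practice.
mask = Image.fromarray(np.array([[0, 1], [2, 2]], dtype=np.uint8), mode="P")

arr = np.array(mask)
print(arr.ndim)        # 2 -> single-channel index mask (3 would mean RGB)
print(arr.dtype)       # an integer dtype such as uint8
print(np.unique(arr))  # the object IDs present, with 0 as background
```

If `arr.ndim` is 3, the file is an RGB image rather than an index mask and needs conversion.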
The conversion is wrong. You would need to find the unique colors in the image and remap the pixels.
The following is a response from Claude 3 Sonnet.

To find the unique colors in an RGB image and remap the pixels to an index mask using PIL (Python Imaging Library) and NumPy, you can follow these steps:

- Load the image using PIL and convert it to a NumPy array.
- Reshape the array to a 2-D array where each row holds one pixel's R, G, and B values.
- Use NumPy's `unique` function to find the unique color combinations in the reshaped array.
- Create a mapping from each unique color combination to a unique index.
- Look up each pixel's color in that mapping to get its index.
- Reshape the indices back into a new array with the same height and width as the original image.

Here's the code to implement this:

```python
from PIL import Image
import numpy as np

# Load the mask and convert it to a NumPy array
# (masks should be stored losslessly, e.g. as PNG -- JPEG compression
# would introduce spurious colors)
image = Image.open('image.png')
image_array = np.array(image)

# Reshape the array to a 2-D array (pixels x RGB)
reshaped_array = image_array.reshape(-1, 3)

# Find the unique color combinations
unique_colors = np.unique(reshaped_array, axis=0)

# Create a mapping from unique colors to indices
color_to_index = {tuple(color): index for index, color in enumerate(unique_colors)}

# Find the index of each pixel's color combination
indices = np.array([color_to_index[tuple(color)] for color in reshaped_array])

# Reshape the indices back to the original image shape
index_mask = indices.reshape(image_array.shape[0], image_array.shape[1])
```

In this code, `image_array` is the NumPy array representation of the image; `reshaped_array` is a 2-D array where each row holds one pixel's R, G, and B values; `unique_colors` is a 2-D array containing the unique color combinations in the image; `color_to_index` is a dictionary that maps each unique color combination to a unique index; `indices` is a 1-D array containing the index of each pixel's color combination in `unique_colors`; and `index_mask` is a 2-D array with the same shape as the original image whose pixel values are those indices.

Note that this approach assumes the image has a limited number of unique colors. For images with many unique colors (e.g., high-resolution photographs), this method may not be efficient, and you might need alternative techniques such as color quantization or clustering.
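As a side note, the per-pixel dictionary lookup can be replaced by a single vectorized call: `np.unique` with `return_inverse=True` returns both the unique colors and each pixel's index in one pass. A sketch (the file name and colors are placeholders); saving in PIL's "P" mode with the original colors as the palette yields a PNG that PIL reads back as an index mask:

```python
import numpy as np
from PIL import Image

def rgb_to_index_mask(rgb: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each unique RGB color to an integer ID in one vectorized pass."""
    h, w, _ = rgb.shape
    colors, inverse = np.unique(rgb.reshape(-1, 3), axis=0, return_inverse=True)
    return inverse.reshape(h, w).astype(np.uint8), colors

# Example: a 2x2 image with a black background and one red object color.
# np.unique sorts colors, so black (0, 0, 0) conveniently becomes index 0.
rgb = np.array([[[0, 0, 0], [255, 0, 0]],
                [[255, 0, 0], [0, 0, 0]]], dtype=np.uint8)
index_mask, colors = rgb_to_index_mask(rgb)

# Save as a palettized PNG so PIL reads it back as an index mask
out = Image.fromarray(index_mask, mode="P")
out.putpalette(colors.astype(np.uint8).flatten().tolist())
out.save("index_mask.png")
```

If the background color is not the lexicographically smallest one, the IDs would need to be permuted afterwards so that background stays 0.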
Two observations:
- The cropped version is working better
- The input and output colors don't match
It seems to me that your mask input (or conversion) is still buggy.
Thank you for the update. It is possible that having many targets degrades the output (due to increased noise in memory matching), especially in out-of-domain cases like yours.
Is there a solution for this case?
> Thank you for the update. It is possible that having many targets degrades the output (due to increased noise in memory matching), especially in out-of-domain cases like yours.
- Can you provide all the output frames up to the point of failure so that we can have a closer look?
- What if you supply fewer objects but the image remains uncropped?
Hello, have you received the file I sent? Have you found the reason why DEVA can't efficiently identify my dataset?
No, I haven't received anything.
A U-Net is used first, and then the connected components are filtered by an area threshold to form the mask. Only seven images have been tested.
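The area-threshold filtering described here can be sketched with `scipy.ndimage` (the threshold value and the binary U-Net output below are assumed for illustration):

```python
import numpy as np
from scipy import ndimage

def filter_small_components(binary: np.ndarray, min_area: int) -> np.ndarray:
    """Label connected components and drop those below an area threshold,
    returning an index mask with one compact ID per surviving object."""
    labels, n = ndimage.label(binary)  # 4-connectivity by default
    areas = ndimage.sum_labels(binary, labels, index=np.arange(1, n + 1))
    keep = np.flatnonzero(areas >= min_area) + 1   # component labels to keep
    remap = np.zeros(n + 1, dtype=np.int32)
    remap[keep] = np.arange(1, len(keep) + 1)      # compact IDs 1..K, 0 = background
    return remap[labels]

# Example: two blobs, one of area 1 (dropped) and one of area 4 (kept)
binary = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]], dtype=np.uint8)
mask = filter_small_components(binary, min_area=2)
```

The resulting mask already gives each surviving component its own ID, which matches the per-object labeling DEVA expects.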
IMG.zip
Sorry I cannot check it right now. I'll get back at this later.
Can you check the "IMG.zip" file? If not, I'll send it to you by email.
Have you found the cause of the problem when DEVA tracks multiple targets?
I haven't had the time to test it yet.