tf2_embedding_vis_tensorboard

Visualizing embeddings of custom data features using tensorboard in tensorflow 2

Learn how to visualize your own image data or features on the TensorBoard embedding projector. A video tutorial is available at: https://youtu.be/ZMDRoscHe5o

Libraries:

TensorFlow 2.3.0
Python 3.7.5
tqdm 4.5.6.0
NumPy 1.18.5

Running the embedding visualization using the logs given in this repository

To view the embeddings already provided in embedding_logs/, download all the files and follow these steps:

1. Clone/download this repository

2. The logs and checkpoint are in embedding_logs/

3. Launch TensorBoard

  tensorboard --logdir=embedding_logs --port=6006

If you get an error like "'tensorboard' is not recognized as an internal or external command, operable program or batch file.", run instead:

   python -m tensorboard.main --logdir=embedding_logs

4. Open http://localhost:6006 in a browser

5. Go to the Projector tab in TensorBoard

(If the projector does not load, refresh the browser a couple of times.)


To regenerate the embedding logs for the feature vectors given in this repository

To regenerate the same embedding logs, use feature_vectors_400_samples.pkl, which contains the features extracted by the second-to-last layer of VGG16 for the sample data given in this repository.

If you want to generate the embedding visualization for the given feature vector data, you can look directly into the embedding_vis.py script to visualize your feature vectors in the embedding visualizer.

The code is described block by block in the section "Generating the embedding logs for your own feature vectors" below. Running the script will generate the embedding logs at the path you specify; you can then launch TensorBoard again, as shown below.
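
For example, assuming the default paths used in this repository, the whole round trip is just:

   python embedding_vis.py
   tensorboard --logdir=embedding_logs --port=6006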

Important point

# The name of the tensor will be suffixed by `/.ATTRIBUTES/VARIABLE_VALUE`
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
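
This suffix comes from TensorFlow 2's object-based checkpointing: the variable passed as the embedding= argument of tf.train.Checkpoint is stored under the key embedding/.ATTRIBUTES/VARIABLE_VALUE. If you are unsure of the exact name, a quick sanity check (assuming the checkpoint was saved as shown later in this README) is to list the variables stored in the checkpoint:

import tensorflow as tf

# 'embedding_logs/embedding.ckpt-1' is the prefix written by checkpoint.save();
# the trailing -1 is the save counter added automatically on the first save.
for name, shape in tf.train.list_variables('embedding_logs/embedding.ckpt-1'):
    print(name, shape)
# The output should include: embedding/.ATTRIBUTES/VARIABLE_VALUE [400, 4096]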

Data used in this Example

I have used 4 categories with 100 samples in each class: Cats, Dogs, Horses, and Humans (horse riders). The data is stored in data.zip. The pretrained VGG16 is used to obtain a feature vector of size 4096 from the penultimate layer of the network.

Using VGG16 model to obtain feature vectors

If you want to use VGG16 as a feature extractor for your own data, you can look into the save_features.py script.

The script will save your extracted features in a feature_vectors.pkl file. The shape of the saved feature array will be (num_samples, feature_vector_size).

num_samples = number of images (400 in this example)
feature_vector_size = size of the feature vector for each image (4096 in this example)
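
The script itself lives in the repository; the sketch below only illustrates the general idea, assuming the Keras VGG16 application and its fc2 layer (the 4096-dimensional penultimate fully connected layer). The variable names here are illustrative, not taken from save_features.py.

import pickle
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# Feature extractor that outputs the 4096-d 'fc2' activations instead of class scores.
base = VGG16(weights='imagenet', include_top=True)
extractor = tf.keras.Model(inputs=base.input, outputs=base.get_layer('fc2').output)

# 'images' stands for your loaded images resized to 224x224 (VGG16 input size);
# random data is used here only as a placeholder.
images = np.random.randint(0, 255, size=(4, 224, 224, 3)).astype('float32')
feature_vectors = extractor.predict(preprocess_input(images))  # shape: (num_samples, 4096)

with open('feature_vectors.pkl', 'wb') as f:
    pickle.dump(feature_vectors, f)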

Generating the embedding logs for your own feature vectors

If you want to generate the embedding visualization for your own feature vector data, you can look directly into the embedding_vis.py script to visualize your feature vectors in the embedding visualizer.

Define the log directory where the embedding logs will be stored

import os

LOG_DIR = 'embedding_logs'
if not os.path.exists(LOG_DIR):
    os.makedirs(LOG_DIR)

Prepare meta data file

import pandas as pd

data = pd.read_csv('data_annotations.csv', usecols=['img_names', 'labels', 'class_names'])

# Each row of the metadata file holds the class label and the class name for one image.
metadata_file = open(os.path.join(LOG_DIR, 'metadata_4_classes.tsv'), 'w')
metadata_file.write('Class\tName\n')

for label, name in zip(data.labels, data.class_names):
    metadata_file.write('{}\t{}\n'.format(label, name))
metadata_file.close()

Load the image data that you want to visualize, along with the label names, on TensorBoard

The shape of the image data array should be (num_samples, rows, cols, channels). In this example the images are resized to 128x128, so the array has shape (400, 128, 128, 3).

   import cv2
   import numpy as np

   # data_path is assumed to be the folder extracted from data.zip,
   # containing one subfolder per class; adjust it to your own layout.
   data_path = 'data'
   data_dir_list = os.listdir(data_path)

   img_data = []
   for dataset in data_dir_list:
       img_list = os.listdir(data_path + '/' + dataset)
       print('Loaded the images of dataset-' + '{}\n'.format(dataset))
       for img in img_list:
           input_img = cv2.imread(data_path + '/' + dataset + '/' + img)
           input_img_resize = cv2.resize(input_img, (128, 128))  # you can choose what size to resize your data to
           img_data.append(input_img_resize)
   img_data = np.array(img_data)

Define the function to generate the sprite image. A sprite image is needed if you want to see the thumbnail images along with the label names for the corresponding feature vectors.

   def images_to_sprite(data):
        """Creates the sprite image along with any necessary padding

        Args:
          data: NxHxW[x3] tensor containing the images.

        Returns:
          data: Properly shaped HxWx3 image with any necessary padding.
        """
        if len(data.shape) == 3:
            data = np.tile(data[...,np.newaxis], (1,1,1,3))
        data = data.astype(np.float32)
        min = np.min(data.reshape((data.shape[0], -1)), axis=1)
        data = (data.transpose(1,2,3,0) - min).transpose(3,0,1,2)
        max = np.max(data.reshape((data.shape[0], -1)), axis=1)
        data = (data.transpose(1,2,3,0) / max).transpose(3,0,1,2)
        # Inverting the colors seems to look better for MNIST
        #data = 1 - data

        n = int(np.ceil(np.sqrt(data.shape[0])))
        padding = ((0, n ** 2 - data.shape[0]), (0, 0),
                (0, 0)) + ((0, 0),) * (data.ndim - 3)
        data = np.pad(data, padding, mode='constant',
                constant_values=0)
        # Tile the individual thumbnails into an image.
        data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3)
                + tuple(range(4, data.ndim + 1)))
        data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
        data = (data * 255).astype(np.uint8)
        return data

Generate the sprite image for your dataset

sprite = images_to_sprite(img_data)
cv2.imwrite(os.path.join(LOG_DIR, 'sprite_4_classes.png'), sprite)

For this example, the generated sprite image is a 20x20 grid of thumbnails (see sprite_4_classes.png in embedding_logs/).

Load the feature vectors

import pickle
import tensorflow as tf

with open('feature_vectors_400_samples.pkl', 'rb') as f:
    feature_vectors = pickle.load(f)
#feature_vectors = np.loadtxt('feature_vectors_400_samples.txt')
print("feature_vectors_shape:", feature_vectors.shape)
print("num of images:", feature_vectors.shape[0])
print("size of individual feature vector:", feature_vectors.shape[1])

features = tf.Variable(feature_vectors, name='features')

Create a checkpoint from the embedding; the keyword argument name ("embedding") is what determines the tensor name in the checkpoint.

checkpoint = tf.train.Checkpoint(embedding=features)
checkpoint.save(os.path.join(LOG_DIR, "embedding.ckpt"))

Set up config

from tensorboard.plugins import projector

config = projector.ProjectorConfig()
embedding = config.embeddings.add()

#The name of the tensor will be suffixed by `/.ATTRIBUTES/VARIABLE_VALUE`
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
# Link this tensor to its metadata file (e.g. labels).
embedding.metadata_path =  'metadata_4_classes.tsv'
# Comment out if you don't want sprites
embedding.sprite.image_path =  'sprite_4_classes.png'
embedding.sprite.single_image_dim.extend([img_data.shape[1], img_data.shape[1]])
# Saves a config file that TensorBoard will read during startup.

projector.visualize_embeddings(LOG_DIR, config)
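
After running the script, the log directory should contain (assuming the file names used above) metadata_4_classes.tsv, sprite_4_classes.png, the embedding.ckpt-1.* checkpoint files together with a checkpoint bookkeeping file, and the projector_config.pbtxt written by visualize_embeddings. Launch TensorBoard on that directory again and open the Projector tab:

   tensorboard --logdir=embedding_logs --port=6006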