vahidk/tfrecord

Warning: The given NumPy array is not writeable

fisheggg opened this issue · 1 comments

Hi,

Thanks for making this tool!
I get a warning message from PyTorch when loading tfrecords using MultiTFRecordDataset:

UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:143.)
  return default_collate([torch.as_tensor(b) for b in batch])

Here's how I write and load the .tfrecord shards:

# writing to .tfrecord shards
import tensorflow as tf

out_f = tf.io.TFRecordWriter(output_path)

# feature_sliced is a 3-dim np.array with type np.float32
features = {
    "fbank": tf.train.Feature(bytes_list=tf.train.BytesList(value=[feature_sliced[:, :, slice_idx].tobytes()])),
}
example = tf.train.Example(features = tf.train.Features(feature=features))
out_f.write(example.SerializeToString())

# loading from .tfrecord shards
import numpy as np
from tfrecord.torch.dataset import MultiTFRecordDataset

description = {
    "fbank": "byte",
}

def transform(features):
    features["fbank"] = np.frombuffer(bytes(bytearray(features["fbank"])), dtype=np.float32).T.reshape(-1, 128)
    features["song_title"] = bytes(bytearray(features["song_title"])).decode("utf-8")
    return features

train_set = MultiTFRecordDataset(
    tfrecord_pattern, 
    index_pattern, 
    splits_train, 
    description=description,
    transform=transform,
    infinite=False
)

Package versions I'm using:

torch==1.8.1+cu101
numpy==1.21.0
tfrecord==1.14.1

Thanks for looking into this!

Best,
Arthur

I know this is a late response, but the warning comes from how np.frombuffer works: it returns a view over the buffer you pass in, and because a bytes object is immutable, the resulting NumPy array is read-only. PyTorch wants a writeable array to wrap the Tensor around, so you need to make a copy of the buffer first. Just add .copy() to the result of the np.frombuffer() call.
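
For reference, here is a minimal sketch of the fix applied to the transform from the question. The `record` dict is a hypothetical stand-in for what the dataset passes to `transform`, and the 128-column shape is taken from the original snippet. It only uses NumPy, so it can be checked without PyTorch or any tfrecord files.

```python
import numpy as np

def transform(features):
    # np.frombuffer over a bytes object gives a read-only view of that memory.
    raw = bytes(bytearray(features["fbank"]))
    view = np.frombuffer(raw, dtype=np.float32)
    # .copy() allocates a new array that owns its data and is writeable,
    # so converting it to a tensor no longer triggers the warning.
    features["fbank"] = view.reshape(-1, 128).copy()
    return features

# Hypothetical record: two frames of 128 float32 values, serialized with tobytes().
record = {"fbank": np.zeros((2, 128), dtype=np.float32).tobytes()}

read_only = np.frombuffer(record["fbank"], dtype=np.float32)
fixed = transform(dict(record))["fbank"]

print(read_only.flags.writeable)  # False
print(fixed.flags.writeable)      # True
```

The copy costs one extra allocation per example. Passing the bytearray straight to np.frombuffer (without wrapping it in bytes) also yields a writeable array, since a bytearray is mutable, but the explicit .copy() makes the intent clearer.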