bug(signsuisse): missing/invalid fields
J22Melody commented
I ran a more thorough check on the data and found some further missing/invalid fields.
Code:
import os

import pandas as pd

# `dataset` and `sign_language_lookup_table` are defined earlier (not shown in this issue).


def decode(tensor):
    # Convert a scalar string tensor to a Python str.
    return tensor.numpy().decode('utf-8')


data = []
for datum in dataset["train"]:
# for datum in itertools.islice(dataset["train"], 0, 10):  # debug: first 10 items only
    current = {
        'id': decode(datum['id']),
        'name': decode(datum['name']),
        'spokenLanguage': decode(datum['spokenLanguage']),
        'signedLanguage': sign_language_lookup_table[decode(datum['signedLanguage'])],
        'category': decode(datum['category']),
        'definition': decode(datum['definition']),
        'paraphrase': decode(datum['paraphrase']),
        'example': decode(datum['exampleText']),
        'url': decode(datum['url']),
        'video': decode(datum['video']),
        'poseMediapipe': decode(datum['pose']['path']),
        'exampleVideo': decode(datum['exampleVideo']),
        'examplePoseMediapipe': decode(datum['examplePose']['path']),
    }
    data.append(current)

df = pd.DataFrame.from_records(data, index='id')

print('Check fields in metafile:')
# reset_index() puts 'id' back into each record; to_dict('records') alone omits the index,
# so item['id'] below would otherwise raise a KeyError.
for item in df.reset_index().to_dict('records'):
    for key, value in item.items():
        # Text fields: report values that are empty or the literal placeholder 'empty'.
        if key not in ['video', 'poseMediapipe', 'exampleVideo', 'examplePoseMediapipe']:
            if not value or value == 'empty':
                print(f"id={item['id']}, name={item['name']} has empty {key}")
        # Main video and pose files should exist for every entry.
        if key in ['video', 'poseMediapipe']:
            if not os.path.exists(value):
                print(f"id={item['id']}, name={item['name']} has invalid {key} path (unexpected!)")
        # Example files may legitimately be missing when the entry has no example text.
        if key in ['exampleVideo', 'examplePoseMediapipe']:
            if not os.path.exists(value):
                if item['example']:
                    print(f"id={item['id']}, name={item['name']} has invalid {key} path (unexpected!)")
                else:
                    print(f"id={item['id']}, name={item['name']} has invalid {key} path (expected)")
print('---------------------------------------')
and the log file:
signsuisse.log
Looking at the log file:
- There are no `invalid video path` entries, but there are 1041 unexpected `invalid poseMediapipe path` entries.
- There are 16 `empty example` entries, but there are 1446 `invalid examplePoseMediapipe path` entries (only 16 of them are expected, because no example exists for those items).
- For example: the missing `examplePoseMediapipe` for LE HAVRE is expected, since it does not have an example, while the missing `examplePoseMediapipe` for TRIPLE is unexpected.
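
For anyone who wants to reproduce these counts, here is a minimal sketch that tallies the log lines by kind, field, and expected/unexpected status. It assumes the log contains exactly the line shapes produced by the `print` calls in the script above. `summarize_log` is a hypothetical helper, and the ids in `sample` are made up for illustration.

```python
import re
from collections import Counter

# Line shapes copied from the print() calls in the check script.
EMPTY_RE = re.compile(r"^id=(?P<id>.*?), name=(?P<name>.*) has empty (?P<key>\w+)$")
INVALID_RE = re.compile(
    r"^id=(?P<id>.*?), name=(?P<name>.*) has invalid (?P<key>\w+) path "
    r"\((?P<status>unexpected!|expected)\)$"
)


def summarize_log(lines):
    """Count log lines per (kind, field, status); unrecognized lines are skipped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        match = INVALID_RE.match(line)
        if match:
            status = match["status"].rstrip("!")  # "unexpected!" -> "unexpected"
            counts[("invalid", match["key"], status)] += 1
            continue
        match = EMPTY_RE.match(line)
        if match:
            counts[("empty", match["key"], "")] += 1
    return counts


# Hypothetical sample lines (ids invented); a real run would pass the open log file.
sample = [
    "id=1, name=LE HAVRE has empty example",
    "id=1, name=LE HAVRE has invalid examplePoseMediapipe path (expected)",
    "id=2, name=TRIPLE has invalid examplePoseMediapipe path (unexpected!)",
]
for key, count in sorted(summarize_log(sample).items()):
    print(key, count)
```

To run it on the attached log, pass the open file instead of `sample`, e.g. `with open("signsuisse.log", encoding="utf-8") as f: counts = summarize_log(f)`.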
So a lot of Mediapipe pose estimations are missing. @AmitMY please check whether it is the same on your side (not sure whether it's a downloading problem or not).