Repository for the development of methods for extracting object category information from infant egocentric videos.
Folder Structure:
- analysis: contains scripts for various analysis pipelines.
- basic_level_manual_labels
- full_goldset
- general_helper_scripts
- goldset_annotations
- mturk_pilot
- panoptic_segmentation_training
- vedi_pilot
- data: contains various data files from points in the processing pipeline (annotated image information, segmented images in COCO-JSON format, .manifest files with annotations).
- annotations: various annotated images from SAYCam set.
- basic_level_manual_labels: BL, GK, and NB labeled the most prominent object in each image using this Colab Notebook.
- broad_category_segmentations: annotators used a 10-category dictionary to label each image with the categories present in it.
- mturk_detections: pilot bounding box detections, with intermediate and final dataframes created using this Colab Notebook.
- faces_hands: annotation dataset from a previous project; bounding boxes around faces and hands in the dataset.
- panoptic_segmentations: panoptic segmentations, jsons created using this Colab Notebook.
- coco_json_format_files: output from reformatting raw segmentations into COCO JSON format.
- pilot_segmentation.json: first pilot, 9 images with segmentations.
- pilot_b_segmentations.json: second pilot, 90 images with segmentations.
- pilot_b_good_segmentations.json: subset of second pilot with confidence thresholded, 60 images with segmentations.
- pilot_big_segmentations.json: final pilot, 801 image subset of 984 images with segmentations.
- combined_segmentations.json: final image set (combines final pilot with another set of final images), 3365 images with segmentations.
- combined_good_segmentations.json: subset of final image set with confidence thresholded, 2215 images with segmentations.
- the remaining folders store the above data split into 80/20 training and testing sets.
- the training/testing split is created using this Colab Notebook and analyzed using this Colab Notebook; a minimal loading/splitting sketch is shown below.
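The notebooks linked above hold the canonical processing code; purely as an illustration, here is a minimal Python sketch of loading one of the COCO-JSON files listed above, keeping only high-confidence annotations, and making an 80/20 image split. The file path, the per-annotation `score` field, and the 0.5 cutoff are assumptions; check the actual files and notebooks for the keys and threshold used.

```python
import json
import random

# Load one of the COCO-JSON segmentation files (path is an example).
with open("data/coco_json_format_files/combined_segmentations.json") as f:
    coco = json.load(f)

# Keep only high-confidence annotations. The "score" key and the 0.5
# threshold are assumptions; adjust to whatever the files actually use.
good_annotations = [a for a in coco["annotations"] if a.get("score", 1.0) >= 0.5]

# Shuffle image ids and split them 80/20 into train and test.
image_ids = [img["id"] for img in coco["images"]]
random.seed(0)
random.shuffle(image_ids)
cutoff = int(0.8 * len(image_ids))
train_ids, test_ids = set(image_ids[:cutoff]), set(image_ids[cutoff:])

def subset(ids):
    """Build a COCO-style dict containing only the given image ids."""
    return {
        "images": [img for img in coco["images"] if img["id"] in ids],
        "annotations": [a for a in good_annotations if a["image_id"] in ids],
        "categories": coco["categories"],
    }

with open("combined_good_train.json", "w") as f:
    json.dump(subset(train_ids), f)
with open("combined_good_test.json", "w") as f:
    json.dump(subset(test_ids), f)
```

Splitting by image id (rather than by annotation) keeps all annotations for a given image in the same partition.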
- raw_manifest_files: raw SageMaker output (.manifest files); see the parsing sketch below.
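SageMaker Ground Truth writes its output .manifest as JSON Lines, one JSON object per labeled image. The sketch below shows one way to read such a file; the filename and the way label keys are picked out are assumptions, since the label key matches the labeling-job name and varies per job.

```python
import json

# Read a raw SageMaker Ground Truth output manifest (filename is an example).
with open("data/raw_manifest_files/output.manifest") as f:
    for line in f:
        record = json.loads(line)
        image_url = record["source-ref"]  # URL/S3 path of the labeled image
        # Everything except "source-ref" and "*-metadata" entries holds the
        # labels themselves; the exact key name depends on the labeling job.
        label_keys = [
            k for k in record
            if k != "source-ref" and not k.endswith("-metadata")
        ]
        for key in label_keys:
            print(image_url, key, record[key])
```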
- category_lists: lists of categories we used to label images.
- categories.txt: basic level category list used as dictionary in annotation tasks.
- object_list.txt: initial full category list used for basic level pilot MTurk and manual annotations.
- image_lists: various lists of video/image filenames and public urls (a download sketch follows this list).
- SAYCAM_allocentric_videos.csv: 1631 video filenames and whether or not each is allocentric; used for filtering out the associated images.
- child_hands.csv: list of 3050 public urls to images with child hands from dataset.
- goldset_to_annotate.csv: list of 16996 public urls to images, created by BL.
- hands_sample_annotate.csv: random sample list of 500 public urls to images with hands, subset from hands_to_annotate.csv.
- hands_to_annotate.csv: list of 11828 public urls to images with hands, subset from goldset_to_annotate.csv.
- interesting_image_list.txt: list of 1542 image filenames that NB compiled by sifting through a random subset of goldset_to_annotate.csv. For more information, see the notes on choosing interesting images.
- interesting_ims.csv: list of 1000 public urls to images, subset from interesting_image_list.txt.
- people_goldset.csv: list of 9616 public urls to images with people in frame, subset from goldset_to_annotate.csv.
- person_sample_annotate.csv: random sample list of 500 public urls to images with people in frame, subset from people_goldset.csv.
- pilotImageURLs.csv: list of 150 public urls to images chosen randomly from interesting_image_list.txt using this helper script.
- top_category_frames.csv: list of 984 public urls to images.
- top_frames.csv: list of 953 public urls to images.
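To pull the actual frames behind any of these URL lists, something like the following works; the CSV path and the `url` column name are assumptions, so check the header of the file you are using first.

```python
import csv
import os
import urllib.request

# Download every image referenced in one of the URL-list CSVs above.
# The path and the "url" column name are assumptions; inspect the CSV first.
os.makedirs("downloaded_images", exist_ok=True)
with open("data/image_lists/interesting_ims.csv", newline="") as f:
    for row in csv.DictReader(f):
        url = row["url"]
        filename = os.path.join("downloaded_images", os.path.basename(url))
        if not os.path.exists(filename):  # skip files already downloaded
            urllib.request.urlretrieve(url, filename)
```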
- preprocessed_data: output from processing data using R.
- saycam_images: includes a zip file of "interesting images" from image_lists/interesting_image_list.txt.
- vedi_pilot: TODO
- experiments: task paradigms.
- mturk_pilot: contains html code for MTurk pilot task collecting bounding box annotations.
- writing: workspace for papers associated with this project.
- cogsci-paper: contains materials for our 2021 paper in the Proceedings of the Annual Meeting of the Cognitive Science Society and the corresponding oral presentation.