This is a repository that collects TikTok data with TikTok Research API access and augments the data with Generative AI (Whisper and GPT-4)
pip install -r requirements.txt
Note: I did not us a virtual environment so the packages in the requirements.txt file are probably not reflective of all the packages used in this project. If some issues pop up please don't hesitate to email me at: gpinto@usc.edu
- Access to the TikTok Research API
- OpenAI key
augment_metadata/generate_vid_descriptions.py
- Generates the video descriptions with GPT-4 vision
augment_metadata/genreate_whisper_transcript.py
- Uses Whisper to generate the transcripts
augment_metadata/stance_detection.py
- Uses GPT-4 to classify the stance expressed in a TikTok video given the Whisper generated transcript, video description generated by GPT-4, and a frame from the video
collect_data/collect_metadata.py
- calls the TikTok API to collect the metadata
collect_data/download_videos.py
- downloads the TikTok videos,the video's ID and username is required
generate_token/generate_credentials.txt
- contains the command to generate the API token
generate_token/get_tiktok_token.sh
- make the script executable by running 'chmod +x get_tiktok_token.sh'
- execute the script by typing './get_tiktok_token.sh'
supplemental/check_if_public.py
- checks if a video is public via pyktok
supplemental/collect_metadata_by_id.py
- checks if a video is public via TikTok API and retrieves the video's metadata
supplemental/is_mp4_empty.py
- checks if a mp4 file is in an invalid format
Signup for access to the TikTok API here: https://developers.tiktok.com/products/research-api/
- Query Videos: https://developers.tiktok.com/doc/research-api-specs-query-videos/
- Query User Information: https://developers.tiktok.com/doc/research-api-specs-query-user-info/
- Query Video Comments: https://developers.tiktok.com/doc/research-api-specs-query-video-comments/