Scraper for www.createdebate.com
This scraper and the utility scripts here can be used to scrape debates and user information from CreateDebate.
get_all_debate_motion.py - We scrape the list of OPEN DEBATES from the ALL TOPICS tab of the website.
We assign a uuid to each debate and store the mapping in the dictionary 'dict_motion2uuid'
and the text file 'data_Motion2uuid.txt'.
get_all_motions_topicwise - We scrape the list of OPEN DEBATES from each topic tab, one tab at a time.
We assign a uuid to any debate motion that is not already present in 'dict_motion2uuid'.
We then create one text file per topic to store the motions topic-wise.
We also create the dictionaries 'dict_motion2topic' and 'dict_topic2motion'.
get_all_responses - We then build the debate URL for each debate motion, fetch the response for that page
and store it as a serialized binary file in 'Debate_Responses'.
parse_sidestances - We parse the response files to extract the L-R stances and store in 'dict_motion2sidestances'.
parse_arguments - We parse the response files to extract the argument parameters and store in 'Debate_Arguments'.
compose_user_data - We scan through all the files in 'Debate_Arguments' and compile all the arguments userwise.
Also assign a uuid to all the usernames and store in dictionary 'dict_user2uuid'
and text file 'data_User2uuid.txt'
get_all_user_profile - We then build the profile URL for each user name, fetch the response for that page and
store it as a serialized binary file in 'UserProfile_Responses'.
parse_user_params - We parse the response files to extract information and details of the user and
store in 'data_UserInformation.txt'.
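The uuid bookkeeping used by the first two steps can be sketched as follows. This is a minimal illustration, not the code from the scripts themselves: the function names assign_uuids and format_mapping are hypothetical, and the use of uuid4 is an assumption, since the description above does not say which uuid variant is generated.

```python
import uuid


def assign_uuids(motions, motion2uuid=None):
    """Return a copy of motion2uuid in which every motion has a uuid.

    Motions already present keep their uuid; only new motions get a
    fresh one. uuid4 is an assumption made for this sketch.
    """
    mapping = dict(motion2uuid or {})
    for motion in motions:
        if motion not in mapping:
            mapping[motion] = str(uuid.uuid4())
    return mapping


def format_mapping(mapping):
    """Render the mapping as tab-separated "key<TAB>uuid" lines."""
    return ["\t".join([key, value]) + "\n" for key, value in mapping.items()]


def write_mapping(path, mapping):
    """Write the tab-separated lines to a text file such as data_Motion2uuid.txt."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(format_mapping(mapping))
```

Calling assign_uuids a second time with the earlier dictionary mirrors the topic-wise step: motions seen before keep their uuid and only unseen motions receive a new one.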
motion2uuid => {motion : uuid }
motion2topic => {motion : [topics] }
topic2motion => {topic : [motions] }
motion2sidestances => {motion : [sideL, sideR] }
user2uuid => {user_name : uuid }
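The relationship between motion2topic and topic2motion listed above can be illustrated with a short sketch. The helper name invert_motion2topic is hypothetical, and filing motions with an empty topic list under "None" is an assumption about how untagged motions are represented in memory.

```python
from collections import defaultdict


def invert_motion2topic(motion2topic):
    """Build {topic: [motions]} from {motion: [topics]}.

    Motions with no topics are grouped under "None" (an assumption
    made for this sketch).
    """
    topic2motion = defaultdict(list)
    for motion, topics in motion2topic.items():
        for topic in topics or ["None"]:
            topic2motion[topic].append(motion)
    return dict(topic2motion)
```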
data_Motion2uuid.txt => "\t".join(["motion" , "uuid"])
data_User2uuid.txt => "\t".join(["user_name", "uuid"])
data_UserInformation.txt => "\t".join(["UserName", "Name", "Gender", "Age", "MaritalStatus",
"PoliticalParty", "Country", "Religion", "Education",
"Points", "Efficiency", "Arguments", "Debates", "Joined"])
Topicwise_Motions => ["Politics", "Entertainment", "World", "Religion", "Law", "Science",
"Technology", "Sports", "Comedy", "Business", "Travel", "Shopping",
"Health", "NSFW"] and "None" for motions without any assigned topic.
within each file => "\t".join(["motion", "uuid"])
Debate_Responses => ["DebateUUID" for UUID in motion2uuid.values()] binary files.
Debate_Arguments => ["DebateUUID.txt" for UUID in motion2uuid.values()] text files.
within each file => "\t".join(["DebateMotion", "ArgumentID", "PostSide", "ArgumentType",
"UserName", "Time", "ArgumentStance", "Votes", "Post"])
User_Arguments => ["UserUUID.txt" for UUID in user2uuid.values()] text files
within each file => "\t".join(["UserName", "DebateMotion", "ArgumentID", "PostSide",
"ArgumentType", "ArgumentStance", "Votes", "Time", "Post"])
UserProfile_Responses => ["UserUUID" for UUID in user2uuid.values()] binary files
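A reader for the Debate_Arguments layout, together with the column reordering needed for User_Arguments, might look like the sketch below. The names parse_argument_lines and to_user_row are hypothetical. The sketch assumes one record per line, tolerates an optional header row, and limits the split so that any tab inside the final Post column is kept; none of these details are confirmed by the descriptions above.

```python
ARGUMENT_COLUMNS = ["DebateMotion", "ArgumentID", "PostSide", "ArgumentType",
                    "UserName", "Time", "ArgumentStance", "Votes", "Post"]

USER_ARGUMENT_COLUMNS = ["UserName", "DebateMotion", "ArgumentID", "PostSide",
                         "ArgumentType", "ArgumentStance", "Votes", "Time", "Post"]


def parse_argument_lines(lines):
    """Turn tab-separated lines into dictionaries keyed by column name."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split at most 8 times so the Post column keeps any embedded tabs.
        fields = line.split("\t", len(ARGUMENT_COLUMNS) - 1)
        if len(fields) != len(ARGUMENT_COLUMNS) or fields == ARGUMENT_COLUMNS:
            continue  # skip malformed lines and an optional header row
        records.append(dict(zip(ARGUMENT_COLUMNS, fields)))
    return records


def to_user_row(record):
    """Render one record in the User_Arguments column order."""
    return "\t".join(record[column] for column in USER_ARGUMENT_COLUMNS)
```

Grouping the parsed records by their UserName value and writing to_user_row for each one would give the per-user files.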
motion - Any String joined using '_' available on the BROWSE DEBATES Website Tab
topic - Any String from the list mentioned in the Topicwise_Motions description
sideL, sideR - Any String which is the floated pair of FOR-AGAINST stances by the debate creator
UserName - Username of any user who has posted an argument in one of the debate responses collected
Name - Any String which is a valid text box input
Gender - ["Male", "Female", "Guy", "Girl", "Dude", "Lady", "Fellow", "grrrl",
"Chap", "Dame", "Transgender"]
Age - Approximate Age of Individual in Years
MaritalStatus - ["Single", "Married", "In a Relationship"]
PoliticalParty - ["Republican", "Democrat", "Libertarian", "Green Party", "Independent", "Other"]
Country - Country Name from a Drop-Down List of all Countries
Religion - ["No Answer", "Agnostic", "Atheist", "Buddhist", "Catholic", "Christian-other",
"Hindu", "Jewish", "Muslim", "Mormon", "Other", "Protestant", "Scientologist",
"Taoist", "Wiccan"]
Education - ["No Answer", "High School", "Some College", "In College", "College Grad",
"Masters", "Post Grad"]
Points - [0 - ) : Number of Points earned by the user on the Website
Efficiency - [0 - 100] : Measure of effectiveness of arguments -> % of upvotes a user has
Arguments - [0 - ) : Number of Arguments posted by the user on the Website
Debates - [0 - ) : Number of Debates the user has participated in on the Website
Joined - Approximate Date/Time of the user joining the Website
ArgumentID - "arg[0-9]+" assigned to each argument
PostSide - Left or Right, the side of the debate page on which the post appears
ArgumentType - ["Normal", "Supported", "Disputed", "Clarified"]
Time - Approximate Date/Time of Post
ArgumentStance - One of the two {sideL, sideR} for that debate
Votes - Total UpVotes - Total DownVotes
Post - Textual Data of the Post
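The argument field definitions above translate into simple consistency checks. The sketch below is illustrative only: is_valid_argument and net_votes are hypothetical helpers, and the record is assumed to be a dictionary keyed by the column names listed for Debate_Arguments.

```python
import re

# "arg[0-9]+" as given in the ArgumentID definition.
ARGUMENT_ID = re.compile(r"arg[0-9]+")
ARGUMENT_TYPES = {"Normal", "Supported", "Disputed", "Clarified"}


def is_valid_argument(record, side_l, side_r):
    """Check ArgumentID, ArgumentType and ArgumentStance against their definitions."""
    return (
        ARGUMENT_ID.fullmatch(record["ArgumentID"]) is not None
        and record["ArgumentType"] in ARGUMENT_TYPES
        and record["ArgumentStance"] in (side_l, side_r)
    )


def net_votes(upvotes, downvotes):
    """Votes as defined above: total upvotes minus total downvotes."""
    return upvotes - downvotes
```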