BITS-Research/Lab

Issues using open data

Why are you interested in this research project? (e.g., Do you have experience with this topic? Does it sound cool and you just want to find out more? etc.)

Every day, we come across open data that we share with our friends for various reasons. I'm very interested in researching this subject further to learn about the problems with open data. Some of the topics that come to mind immediately are:

  1. Personal information about an individual can likely be inferred from open data.
  2. Unless it comes from a reliable source, information may be inaccurate.
  3. Releasing open data may negatively affect a small number of people.

I'm sure there are more problems with open data, and despite my lack of research experience, I'd like to learn more about this subject. The last time I did research was in a business club called DECA, where I worked on operations research for a local franchise. Overall, I'm interested in learning how to analyze data and make predictions based on it. This research opportunity will allow me to look at both the advantages and the drawbacks of open data.

Spend 5 minutes googling some of the keywords. What did you find? What ideas or questions does this spark for you in working on this research project?

Questions that came up while I was researching these keywords:

  1. I've heard about open source, which a lot of people are adopting, and I'd like to learn more about the problems with it.
  2. There is a wealth of information available on demographics, the economy, education, the environment, transportation, and other topics. What issues might arise as a result of using this open data?
  3. Are there any problems with using open data related to a country or a city?

What is your availability for working on this research? (e.g. I can devote 5-10 hours per week for the next two quarters)
I can devote 5-7 hours per week for the next two quarters.

Suggest a time to meet with me in the next seven days to discuss, and leave an email where I can contact you.
email to contact: anay.r2@gmail.com
Available to meet: 05/26, 05/27, 05/28, 05/31 (anytime should work)

@anaydeshpande, thanks for meeting with me today and for your interest in my group's research!
Below are some notes about what we discussed.

I'd like to pick up the work on analyzing political candidates' websites for accessibility compliance. The previous repository for this work is here: https://github.com/BITS-Research/access-2020-localelections

To get started please do the following:

  1. Set up an environment to run an accessibility web-driver on your local machine. You can do this with the notebook in the repo (https://github.com/BITS-Research/access-2020-localelections/blob/main/.ipynb_checkpoints/access-localelections-axe-checkpoint.ipynb). If you run into any problems getting set up, follow the directions here: https://github.com/BITS-Research/access-2020-localelections/wiki/Instructions-from-Jackson
  2. This notebook is set up to use data from 2020 candidates (the dataset is here: https://github.com/BITS-Research/access-2020-localelections/blob/main/localcandidate-sample.csv). Just getting set up and seeing whether you can run the previous notebook over the previous data is a good first step.
  3. Next, we want to analyze a much smaller dataset (containing only Seattle 2021 candidates) that has a different structure from the previous dataset. The new Seattle 2021 dataset is here: https://github.com/BITS-Research/access-2020-localelections/blob/main/Seattle-2021ElectionWebsiteData%20-%20Sheet1.csv (Note that the variable Website contains links to websites that will be analyzed.)
  4. Modify the notebook so that you can create a report for each of these candidates' websites.
  5. Once you have done this successfully, upload the new notebook and new data (or, better yet, send a pull request to the repository with these new files).
  6. Then send me an email at nmweber@uw.edu, or schedule a time to meet with me to discuss next steps using Calendly: https://calendly.com/nmweber/30min
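Step 4 asks for a report on each candidate's website. As a minimal sketch, assuming each site has already been scanned with axe-core (however the existing notebook invokes it) and the result parsed into a Python dict, the violations could be flattened into one CSV row per candidate per failed rule. axe-core reports failures under a `violations` key, each entry carrying `id`, `impact`, `description`, and the failing `nodes`. The helper names `violations_to_rows` and `write_report` and the `candidate` and `node_count` columns are hypothetical, not taken from the previous notebook.

```python
import csv
from typing import Any, Dict, Iterable, List

# Columns of the hypothetical per-violation report.
FIELDS = ["candidate", "website", "rule_id", "impact", "description", "node_count"]


def violations_to_rows(candidate: str, website: str,
                       results: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one axe-core result dict into one row per violated rule.

    Assumes `results` follows axe-core's JSON shape, where
    results["violations"] is a list of rule dicts.
    """
    rows = []
    for violation in results.get("violations", []):
        rows.append({
            "candidate": candidate,
            "website": website,
            "rule_id": violation.get("id", ""),
            # impact can be null in axe output; store it as an empty string.
            "impact": violation.get("impact") or "",
            "description": violation.get("description", ""),
            # Number of page elements that failed this rule.
            "node_count": len(violation.get("nodes", [])),
        })
    return rows


def write_report(rows: Iterable[Dict[str, Any]], path: str) -> None:
    """Write the flattened rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Rows collected for every candidate could be concatenated into a single list and passed to `write_report(all_rows, "seattle_2021_candidate_website_violations.csv")`. Keeping one row per rule (rather than per element) keeps the report small while `node_count` still shows how widespread each failure is.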

If you run into any problems or have any questions feel free to reach out.

I have attached the notebook and CSV report below:
Seattle-2021Elections-reports.zip

@anaydeshpande Nice start - a few things to improve:

  1. In [7], the CSV you are reading in has websites that are not being recognized. This has to do with how each website's URL is entered in the CSV. Can you edit this field so that all of the sites get evaluated? (It's likely just a matter of adding http:// or https://.)
  2. The zip contains only the sheet of website data that was originally uploaded; it looks like your notebook should produce a file called seattle_2021_candidate_website_violations.csv.
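Point 1 suggests the unrecognized sites are missing a URL scheme. A minimal sketch of that fix, assuming the Website cells hold bare domains such as `www.example.com` and that HTTPS is an acceptable default (a few sites may only answer on plain HTTP), is to prefix a scheme only when one is missing. `normalize_url` is a hypothetical helper, not part of the existing notebook.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_url(raw: Optional[str], default_scheme: str = "https") -> str:
    """Return `raw` with a scheme prefixed if it lacks http/https.

    Blank or missing cells come back as "" so the caller can skip them.
    """
    url = (raw or "").strip()
    if not url:
        return ""
    # urlparse lowercases the scheme, so "HTTP://..." is also left alone.
    if urlparse(url).scheme in ("http", "https"):
        return url
    # Drop a leading "//" (scheme-relative URL) before adding the scheme.
    return f"{default_scheme}://{url.lstrip('/')}"
```

If the notebook loads the CSV with pandas, the column could be cleaned in place with `df["Website"] = df["Website"].fillna("").map(normalize_url)`; the `fillna("")` matters because pandas represents empty cells as NaN, which is not a string.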

@nniiicc
Below is the csv report:
seattle_2021_candidate_website_violations.csv

Sorry, I mistakenly attached the original sheet. I am working on fixing the issue you pointed out for In [7] and will get back to you.

@nniiicc

I made the changes you pointed out above and ran the report.
Please see the pull request below:
BITS-Research/campaign-access-eval#2