/open-source-words

Visualization of the most frequent words used in open source projects

Primary LanguagePythonApache License 2.0Apache-2.0

Intro

Read about this project on Medium: Open Source Words – Part 2

Open Source Words is a project that:

  • Uses Scrapy to collect repository information from GitHub Search results
  • Downloads README files from these repositories
  • Converts README files (md, rst, and html) to plaintext
  • Calculates unique and total word frequencies, filtering stop words and by parts of speech

Ironically, this project contains a README (this document you're reading), though it's unlikely to ever make the Top 2000 projects on GitHub by stars. It never had a change to scrape it's own README.

Results

Total frequencies

Word cloud

Unique frequencies

Word cloud

These word clouds were generated using d3-cloud using the code in wordcloud.js.

The top 10 words by total frequency were:

  • React
  • File
  • C
  • New
  • API
  • License
  • Server
  • Web
  • HTML
  • True

And by unique frequency:

  • New
  • License
  • File
  • Open
  • Want
  • Used
  • Available
  • API
  • First
  • Simple