/basic-web-scraping

Basic Web Scraping

A website that fetches data from another website, cleans the source code, and finds the most common words.

Project goal by mascam97

This project was developed for a course when I was a student. Later, as a more experienced developer, I refactored the whole project with better practices and unit testing.

Because this project is no longer deployed and serves no further purpose, it is no longer maintained.

Achievements 🌟

  • Teamwork in the first release.
  • Fetched the data with PHP and processed it with JavaScript.
  • Improved the HTML semantics, JavaScript code, and practices.
  • Used regular expressions to clean the data instead of for loops and if statements.
  • Implemented unit testing with Jest.

Getting Started 🚀

These instructions will get you a copy of the project up and running on your local machine.

Installing 🔧

You need PHP and Node.js (with npm) installed.

Install the JavaScript dependencies.

npm install

Finally, run the server:

php -S localhost:8080

Testing

Unit JavaScript testing

Unit tests verify the behavior of the functions and filters, and snapshots record the expected results of many of them. Run the tests with:

npm run test

Note: You can rerun the tests automatically on file changes with npm run test:watch.

Generate the coverage report for the functions and filters with:
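As a rough illustration of what such a test might look like, here is a minimal Jest sketch. The function removeShortWords is hypothetical and is not taken from this project; it only stands in for one of the filters. The first test checks an exact result, the second stores its result as a snapshot.

```javascript
// Hypothetical filter under test; the project's real function names may differ.
function removeShortWords(words) {
  // Keep only words longer than 2 characters.
  return words.filter((word) => word.length > 2);
}

// Jest injects `test` and `expect` as globals. The guard only lets this
// sketch load outside Jest (for example with plain node) without throwing.
if (typeof test === 'function') {
  test('removes words with 1 or 2 characters', () => {
    expect(removeShortWords(['a', 'to', 'web', 'scraping'])).toEqual(['web', 'scraping']);
  });

  test('matches the stored snapshot', () => {
    // The first run writes the snapshot file; later runs compare against it.
    expect(removeShortWords(['is', 'fun', 'data'])).toMatchSnapshot();
  });
}
```

A snapshot test is convenient when a filter returns a long result that would be tedious to write out by hand, at the cost of having to review the snapshot file whenever the output changes on purpose.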

npm run test:coverage

Functionality ⚙️

  • Paste a complete URL into the input and click on get information.
  • The program cleans the source code (removing HTML tags, JavaScript, and CSS) to keep only the meaningful text. Words with 1 or 2 characters, special characters, and some common words are also removed.
  • The program then calculates the most common words and tags.
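The steps above could be sketched roughly as follows. This is not the project's actual code: the function names (cleanHtml, mostCommon, mostCommonWords, mostCommonTags), the STOP_WORDS list, and the sample HTML are all assumptions made for illustration. It shows the general idea of cleaning with regular expressions and then counting occurrences.

```javascript
// Words dropped in addition to the length filter (an assumed, illustrative list).
const STOP_WORDS = new Set(['the', 'and', 'for', 'with', 'that', 'this']);

// Remove script and style blocks, then tags, entities, and special characters.
function cleanHtml(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, ' ') // JavaScript code
    .replace(/<style\b[\s\S]*?<\/style>/gi, ' ')   // CSS code
    .replace(/<[^>]+>/g, ' ')                      // remaining HTML tags
    .replace(/&[a-z#0-9]+;/gi, ' ')                // HTML entities such as &nbsp;
    .replace(/[^\p{L}\s]/gu, ' ')                  // digits and special characters
    .replace(/\s+/g, ' ')                          // collapse whitespace
    .trim()
    .toLowerCase();
}

// Count how often each item occurs and return the `limit` most frequent
// ones as [item, count] pairs, most frequent first.
function mostCommon(items, limit = 10) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item, (counts.get(item) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

// Most common words, skipping words of 1 or 2 characters and stop words.
function mostCommonWords(html, limit = 10) {
  const words = cleanHtml(html)
    .split(' ')
    .filter((word) => word.length > 2 && !STOP_WORDS.has(word));
  return mostCommon(words, limit);
}

// Most common tags, counting opening tags only.
function mostCommonTags(html, limit = 10) {
  const tags = [...html.matchAll(/<([a-z][a-z0-9]*)\b[^>]*>/gi)]
    .map((match) => match[1].toLowerCase());
  return mostCommon(tags, limit);
}

// Made-up sample page for demonstration.
const sample =
  '<html><head><style>p{color:red}</style></head><body>' +
  '<p>Scraping the web</p><p>Web scraping is fun</p>' +
  '<script>var x = 1;</script></body></html>';

console.log(mostCommonWords(sample, 2)); // [ [ 'scraping', 2 ], [ 'web', 2 ] ]
console.log(mostCommonTags(sample, 1));  // [ [ 'p', 2 ] ]
```

Regular expressions keep this short, but they are not a full HTML parser, so malformed or unusual markup may not be cleaned correctly.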

Note: At the moment this program does not work well with websites that render their content on the client side and/or have an unusual structure.


Authors

  • Martín S. Campos mascam97
  • Some classmates who do not use GitHub anymore :´(.

Contributing

You're free to contribute to this project by submitting issues and/or pull requests.

License

This personal project is licensed under the MIT License.
