/Caplin

Primary LanguageJava

Program Overview

This program is designed to download a webpage from https://www.gutenberg.org/cache/epub/29364/pg29364.txt, convert it to lowercase, and count the frequency of each letter that starts a word. It will then output the letter that begins the most words on the page.

Design Decisions

There were some decisions that I needed to make regarding how to interpret the requirements:

  • The requirements did not specify how to handle non-letter characters that start words, such as underscores or brackets. To keep it simple, I decided to include them in the count and treat them as letters.
  • Another decision was to treat uppercase and lowercase letters as the same, to simplify the counting process.
  • I considered using a more advanced natural language processing library to tokenize the text and extract the first letter of each word, but ultimately opted for a simpler approach.

Prerequisites

In order to run this program, you'll need to have Java version 8 or higher installed on your computer.

Running the Program

To run the program, follow these steps:

  1. Open a command prompt or terminal window.
  2. Navigate to the directory containing the program file, MostFrequentLetter.java.
  3. Compile the program by entering the command javac MostFrequentLetter.java.
  4. Run the program by entering the command java MostFrequentLetter.
  5. The program will download the webpage and output the letter that starts the most words.

Conclusion

Overall, this program provides a straightforward and efficient method for counting the frequency of letters that start words on a webpage. While there are certainly more advanced techniques that could be used for natural language processing, this simple approach should suffice for most purposes.