/Song-Search

Song-Search algorithm in C++

Primary LanguageC++

README.txt
Name: Harry O'Sullivan
Program: Song Search
Date: 4/27/2014

Purpose of the Program -------------------------------------------------------

Ever hear a song on the radio and wonder what it is? What is the song
title and the artist? You've heard a few of the lyrcs but how do you find the 
song? This program aims to solve this program by creating a search engine for
song lyrics. Given a word, my program will identify the songs that contain that
word. For each matching song, my program will print out the artist, title, and
context of the match. The context consists of a fragment of the lyrics contain-
ing the word. My program also ranks the matching songs and presents only the 
top 10 best matching songs. Matching songs are ranked by the number of times 
the word occurs in the lyrics of a particular song. 

List of Files ----------------------------------------------------------------

main.cpp: main uses data.h to run the program. 

data.h: this is the header file to data.cpp. The member functions of this 
class read in each title, artist, and lyric, and also pushes every 
single word into a hashtable. The search process commences with the 
commence_search(word) function in this file. 

data.cpp: the implementation of data.h.

hashtable.h: this is the header file to hashtable.cpp. The member functions of
this class have many purposes. In a nutshell, they store and keep 
track of how many and from which song each word came from.

hashtable.cpp: the implementation of hashtable.h. 

How to Compile ---------------------------------------------------------------

One compiles by simply typing 'make' in the command line. 

Outline of Data Structure ----------------------------------------------------

Step One: Storing Each Song
A vector of structs was used to store each of the songs in the file.

Step Two: Keeping track of every word and where it came from 
An unordered map of structs. Each bucket in the unordered map (hash
table) consists of a vector of the topten songs, a current count
and a current index. 


Outline of Algorithm ---------------------------------------------------------

Step One: Storing Each Song 
A vector of structs was used to store each of the songs in the given 
file. Each index of this vector represents a song, and within it, it 
has the title, the artist, and a vector of all of the lyrics. This 
vector will be used later on to print the context of a given search
word. 

Step Two: Keeping track of every word and where it came from 
Instead of using a dynamic array and a hashfunction, I opted to use an 
unordered_map. Thus, actions like expanding the array and dealing with 
collisions are already taken care of. Hashing a word into an unordered
map is very simple. One can access an element of the map by simply 
saying map[word]. 

Each bucket of this map consists of a vector of topten songs (based on
frequency), a current_index variable, and a count variable. Once a new
song is being entered, the original current_index and count are com-
piled into a struct and pushed into the vector. Then, current_index and
count are set to the incoming index and 1, respectively. However, this
means that the last occurence of a word in a particular song will not
be pushed into the topten vector (since, to tell if the same word is 
coming from a different song, the incoming index and the current_index
are compared). So, when the search commences, the final current_index
and count is compiled and pushed onto the topten vector if it needs to 
go inside. After each push, the vector is sorted in descending order 
by frequency. 


===================================END========================================