TeamRoot_CyberSecurityProject

Project Summary:

The customer for this project is Dr. Daniel Ragsdale, a professor in the Computer Science and Engineering department whose research focus is cyber security. In this project we have to design a web-based system that supports the storage and retrieval of research papers. A large amount of research is done at Texas A&M University, which leads to many papers being published in different areas. We are required to design a system that lets users search for papers using different search criteria. Some of these criteria are:

  • Search for papers similar to how you would search on Google.

  • OCR search on papers that are scanned copies of original work.

  • Search on different keywords (Networking, Cyber Security, etc.).

  • Search using author name.

The other requirement is to add papers using a static PDF link to the paper. The user provides the details of the paper, such as the title, the authors, and whether it is a thesis, journal paper, or conference paper. We will use the PDF link to parse the document and store the keywords from the paper in a text file. This text file will then be used by our search engine.
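The keyword step described above could look roughly like the following. This is only a minimal sketch: it assumes the text has already been extracted from the PDF, picks keywords by simple word frequency (our assumption, not a decided design), and the names `extract_keywords`, `save_keywords`, `load_keywords` and the stop-word list are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately short stop-word list; a real system would need a fuller one.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "for",
    "on", "with", "that", "this", "are", "be", "as", "by",
}


def extract_keywords(text, top_n=10):
    """Return the top_n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def save_keywords(path, keywords):
    """Write one keyword per line to the text file used by the search engine."""
    Path(path).write_text("\n".join(keywords), encoding="utf-8")


def load_keywords(path):
    """Read the keywords back from the text file."""
    return Path(path).read_text(encoding="utf-8").split()
```

Keeping one keyword per line keeps the file easy to inspect by hand and easy for the search engine to load.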

A user would be able to use our system to search for papers in different research areas. The search results will also display a list of papers that cite the paper being searched for. This functionality is similar to what Google Scholar provides. Our system will also feature an Advanced Search functionality similar to Google’s advanced search. Using this functionality, a user would be able to search for papers based on the different criteria that we have discussed above.
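The advanced search and the "cited by" list could be sketched as follows. This is an illustration under our own assumptions, not the final design: the `Paper` record, its fields, and the function names `advanced_search` and `cited_by` are hypothetical, and papers are held in a plain in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record layout; field names are assumptions for illustration.
    paper_id: str
    title: str
    authors: list
    keywords: list
    kind: str = "journal"  # e.g. "thesis", "journal", "conference"
    cites: list = field(default_factory=list)  # ids of papers this one cites


def advanced_search(papers, title=None, author=None, keyword=None, kind=None):
    """Return papers matching every criterion that was supplied."""
    results = []
    for p in papers:
        if title and title.lower() not in p.title.lower():
            continue
        if author and not any(author.lower() in a.lower() for a in p.authors):
            continue
        if keyword and keyword.lower() not in [k.lower() for k in p.keywords]:
            continue
        if kind and kind != p.kind:
            continue
        results.append(p)
    return results


def cited_by(papers, paper_id):
    """Return the papers whose citation list contains paper_id."""
    return [p for p in papers if paper_id in p.cites]
```

Criteria left as `None` are ignored, so the same function covers a single-field search (author only, keyword only) and a combined one.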

An important part of this system is Optical Character Recognition (OCR) functionality. Many papers available online are not in PDF format. They could be scanned copies of old papers, book chapters, etc. In such cases it is not possible to do a direct word search on those documents. An OCR program can recognize the characters in such a file. We can then use its output to parse the document and store the text in a text file. This text file will then be used for searching.

The main components of this system are the search engine and the functionality to add and remove papers from the database. The user interface is not an important part of this project, because our customer is going to use our backend code and combine it with a different system.
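The add/remove functionality could be sketched as below. The database engine has not been chosen in this summary, so SQLite (via Python's standard `sqlite3` module) is only an assumption here, and the table layout and the function names `open_db`, `add_paper`, `remove_paper` are hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (assumed) papers table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS papers (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               title TEXT NOT NULL,
               authors TEXT NOT NULL,
               kind TEXT NOT NULL,
               pdf_url TEXT NOT NULL UNIQUE
           )"""
    )
    return conn


def add_paper(conn, title, authors, kind, pdf_url):
    """Insert a paper and return its new id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO papers (title, authors, kind, pdf_url) VALUES (?, ?, ?, ?)",
            (title, ", ".join(authors), kind, pdf_url),
        )
    return cur.lastrowid


def remove_paper(conn, paper_id):
    """Delete a paper by id; return True if a row was removed."""
    with conn:
        cur = conn.execute("DELETE FROM papers WHERE id = ?", (paper_id,))
    return cur.rowcount == 1
```

Because these functions take only a connection and plain values, the customer's system could call them directly without any of our user interface code.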