Voice_Controlled_PDF

Under Development

Primary language: Python



Welcome to Voice control based PDFViewer

Voice control based PDFViewer is a speech-controlled GUI written in Python 3. It uses several modules but is built mainly on tkinter, for the interface that displays PDF and image files, and on the SpeechRecognition library, for all speech-based commands.

Several features are built into the application to give the user a pleasant experience. Every feature works both the traditional way, with mouse clicks, and through speech commands. Below is a short description of some of the tools PDFViewer provides.
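Since every feature is reachable both by clicking and by speaking, one natural design is to route both paths to the same callback. The sketch below uses a hypothetical `CommandDispatcher` class (the README does not show the project's actual code): voice transcripts are normalized and looked up in a table of handlers, and the same handlers can be passed to tkinter buttons through their `command=` option.

```python
from typing import Callable, Dict, Optional


class CommandDispatcher:
    """Hypothetical sketch: maps spoken phrases to the same callbacks the
    toolbar buttons use, so mouse and voice trigger identical behavior."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], None]] = {}

    @staticmethod
    def normalize(text: str) -> str:
        # Lower-case and collapse runs of whitespace, because recognizers
        # differ in capitalization and spacing.
        return " ".join(text.lower().split())

    def register(self, phrase: str, handler: Callable[[], None]) -> None:
        self._handlers[self.normalize(phrase)] = handler

    def dispatch(self, transcript: str) -> Optional[str]:
        """Run the handler for a recognized phrase.

        Returns the matched phrase, or None if the phrase is unknown.
        """
        key = self.normalize(transcript)
        handler = self._handlers.get(key)
        if handler is None:
            return None
        handler()
        return key
```

In a tkinter UI the same function would be wired twice, for example `tk.Button(root, text="Next Page", command=viewer.next_page)` and `dispatcher.register("Next Page", viewer.next_page)`, where `viewer.next_page` is a hypothetical method.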




Dark mode:


This feature lets the user switch to dark mode, or back to light mode, as they prefer. Dark mode uses a carefully chosen color palette for a comfortable reading experience. Like the other features, it can be controlled with the mouse or by speech.

In speech mode:

To activate it, say "Dark Mode". To return to light mode, say "Light Mode".
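A minimal sketch of the theme switch, assuming a hypothetical two-entry palette (the README does not list the actual colors). It uses tkinter's `keys()` to check which options a widget supports before calling `configure()`, because some widgets, such as `Frame`, accept `background` but not `foreground`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Theme:
    background: str
    foreground: str


# Hypothetical palette; replace with the project's own colors.
THEMES: Dict[str, Theme] = {
    "light": Theme(background="#ffffff", foreground="#000000"),
    "dark": Theme(background="#2b2b2b", foreground="#e0e0e0"),
}


def toggle(mode: str) -> str:
    """Return the other mode ("dark" <-> "light")."""
    return "light" if mode == "dark" else "dark"


def apply_theme(widgets: Iterable, mode: str) -> Theme:
    """Recolor tkinter widgets, setting only the options each one supports."""
    theme = THEMES[mode]
    for widget in widgets:
        supported = widget.keys()  # tkinter: names of the widget's options
        options = {}
        if "background" in supported:
            options["background"] = theme.background
        if "foreground" in supported:
            options["foreground"] = theme.foreground
        if options:
            widget.configure(**options)
    return theme
```

The "Dark Mode" voice command would then call `apply_theme(widgets, "dark")`, while a single toolbar button could flip between the two modes with `toggle()`.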


Speech mode:

Speech mode is enabled by default and can be disabled if the user does not want to use it. All of the tools listed below can be used in both speech mode and mouse mode. To leave speech mode, say "Exit".
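The speech-mode behavior described above (keep listening for commands until the user says "Exit" or disables speech mode) can be sketched as a loop. This is a hypothetical sketch: `run_speech_mode` and `make_listener` are illustrative names, not the project's code. `make_listener` calls the SpeechRecognition library's `Recognizer.listen()` and `Recognizer.recognize_google()`; the recognizer and microphone are passed in so the loop can be exercised without audio hardware. Note that `sr.Microphone` additionally requires PyAudio.

```python
from typing import Callable, Optional, Tuple, Type


def make_listener(recognizer, microphone,
                  errors: Tuple[Type[BaseException], ...] = ()
                  ) -> Callable[[], Optional[str]]:
    """Build a function that records one phrase and returns its transcript.

    With the real library: recognizer = sr.Recognizer(),
    microphone = sr.Microphone() and
    errors = (sr.UnknownValueError, sr.RequestError).
    """
    def listen() -> Optional[str]:
        with microphone as source:
            audio = recognizer.listen(source)
        try:
            return recognizer.recognize_google(audio)
        except errors:
            # Unintelligible speech or a failed API request: "nothing heard".
            return None
    return listen


def run_speech_mode(listen: Callable[[], Optional[str]],
                    dispatch: Callable[[str], object],
                    is_enabled: Callable[[], bool]) -> None:
    """Pass transcripts to `dispatch` until "Exit" is heard or speech mode is off."""
    while is_enabled():
        transcript = listen()
        if transcript is None:
            continue  # nothing recognized; listen again
        if transcript.strip().lower() == "exit":
            break
        dispatch(transcript)
```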


Open Files:

The 'Open Files' function lets you open one or more files in PDFViewer. A dialog box prompts you to select all the files you want to open. Once you are done, PDFViewer cycles through the documents in the list and lets you view each one in turn.

In speech mode:

If speech mode is enabled, just say "Open Files". To open a directory, say "Open Directory".
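A minimal sketch of the file-gathering step, assuming the set of supported extensions below (the README only says "PDF and image files"). `filter_supported`, `files_in_directory` and `ask_for_documents` are hypothetical names; `ask_for_documents` uses tkinter's `filedialog.askopenfilenames()` and `filedialog.askdirectory()` dialogs, which need a display, so tkinter is imported inside that function.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed extensions; adjust to whatever the viewer can actually render.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def filter_supported(paths: Iterable[str]) -> List[str]:
    """Keep only paths whose extension (case-insensitive) is supported."""
    return [p for p in paths if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS]


def files_in_directory(directory: str) -> List[str]:
    """List supported files directly inside `directory`, in sorted order."""
    entries = (str(p) for p in Path(directory).iterdir() if p.is_file())
    return sorted(filter_supported(entries))


def ask_for_documents(open_directory: bool = False) -> List[str]:
    """Show a file or directory picker and return the documents to view."""
    from tkinter import filedialog  # imported here: the dialogs need a display

    if open_directory:
        directory = filedialog.askdirectory()
        # askdirectory() returns an empty value when the dialog is cancelled.
        return files_in_directory(directory) if directory else []
    return filter_supported(filedialog.askopenfilenames())
```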


Previous / Next page:

This feature lets the user move between the pages of the document they are reading. It works in both speech mode and mouse mode.

In speech mode:

If speech mode is enabled, the user can say "Next Page" or "Previous Page", and the PDF viewer will open the corresponding page.
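Next/Previous Page, like the toolbar's First/Last Page buttons, reduces to moving a page index and clamping it to the document's range. A hypothetical sketch using zero-based indices (`PageNavigator` is an illustrative name, not the project's class):

```python
class PageNavigator:
    """Hypothetical helper that tracks the current page of one document."""

    def __init__(self, page_count: int) -> None:
        if page_count < 1:
            raise ValueError("a document needs at least one page")
        self.page_count = page_count
        self.current = 0  # zero-based page index

    def go_to(self, index: int) -> int:
        # Clamp so that "Previous Page" on the first page or "Next Page" on
        # the last page stays where it is instead of failing.
        self.current = max(0, min(index, self.page_count - 1))
        return self.current

    def next_page(self) -> int:
        return self.go_to(self.current + 1)

    def previous_page(self) -> int:
        return self.go_to(self.current - 1)

    def first_page(self) -> int:
        return self.go_to(0)

    def last_page(self) -> int:
        return self.go_to(self.page_count - 1)
```

With pdfplumber (listed in the dependencies), the page count could come from `len(pdf.pages)` after `pdf = pdfplumber.open(path)`.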


Toolbar:

The PDF toolbar provides functions for manipulating your PDF documents.

  • Use the 'Last Page' and 'First Page' buttons to jump directly to the last or first page.
  • Use the 'Zoom In', 'Zoom Out' and 'Fit-To-Screen' buttons to make the current page bigger or smaller.
  • Use the 'Rotate' button to rotate PDF pages.

In speech mode:

Say "Zoom In", "Zoom Out" or "Fit to Screen" to perform the corresponding action. Say "Open Help" to open the help section.
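The zoom, fit-to-screen and rotate actions can be modeled as changes to a small view state that the renderer reads before drawing the page. This is a sketch: `ViewState`, the zoom step and the zoom limits are assumptions, not values taken from the project. The resulting scale and angle could then be applied to the rendered page image, for instance with Pillow's `Image.resize()` and `Image.rotate(angle, expand=True)`.

```python
from dataclasses import dataclass

# Assumed values; the README does not specify them.
ZOOM_STEP = 1.25
MIN_ZOOM, MAX_ZOOM = 0.25, 4.0


@dataclass
class ViewState:
    zoom: float = 1.0
    rotation: int = 0  # degrees: 0, 90, 180 or 270

    def zoom_in(self) -> float:
        self.zoom = min(self.zoom * ZOOM_STEP, MAX_ZOOM)
        return self.zoom

    def zoom_out(self) -> float:
        self.zoom = max(self.zoom / ZOOM_STEP, MIN_ZOOM)
        return self.zoom

    def rotate(self) -> int:
        self.rotation = (self.rotation + 90) % 360
        return self.rotation

    def fit_to_screen(self, page_w: float, page_h: float,
                      view_w: float, view_h: float) -> float:
        """Largest zoom at which the whole (possibly rotated) page fits the view."""
        if self.rotation in (90, 270):
            page_w, page_h = page_h, page_w  # a quarter turn swaps width and height
        self.zoom = min(view_w / page_w, view_h / page_h)
        return self.zoom
```

Taking the minimum of the two ratios guarantees that neither dimension overflows the view, at the cost of leaving margins along the other axis.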



Roles and Responsibilities:

Each member played a key role throughout the project, from ideation through design to development. We used pair programming, with each pair working on the part it had taken on. All members contributed to various aspects of the project, working as one under the guidance of the team leader. The role of each member is listed below:

Yashdeep - Team lead, motivator, main developer, testing
Shantanu - Co-developer, debugger, testing
Sreehari - UI design, code cleaning, testing
Vedant - Design, testing, documentation



Challenges faced:

From ideation through development and deployment, we faced many problems. They can be categorized as follows:

Technical: During development and testing, we encountered many bugs before the software was stable enough to deploy. The biggest problem arose while integrating the speech recognition module with the UI. Another technical challenge was getting the speech recognition itself to work reliably. Many other small bugs came up during the UI design work, but these were quickly fixed.
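The README does not say how the speech module and the UI were finally integrated. A common difficulty is that `Recognizer.listen()` blocks, so calling it on tkinter's main thread freezes the window. One standard remedy, sketched here with hypothetical function names, is to listen on a background thread, pass transcripts through a `queue.Queue`, and let the UI thread drain the queue periodically with tkinter's `after()`.

```python
import queue
import threading
from typing import Callable, Optional


def start_listener_thread(listen: Callable[[], Optional[str]],
                          commands: queue.Queue,
                          stop: threading.Event) -> threading.Thread:
    """Run the blocking `listen` call in a daemon thread, queueing transcripts."""
    def worker() -> None:
        while not stop.is_set():
            text = listen()
            if text:  # skip None / empty results
                commands.put(text)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def drain_commands(commands: queue.Queue,
                   dispatch: Callable[[str], object]) -> int:
    """Handle every queued command on the calling (UI) thread; return how many."""
    handled = 0
    while True:
        try:
            text = commands.get_nowait()
        except queue.Empty:
            return handled
        dispatch(text)
        handled += 1


def schedule_polling(root, commands: queue.Queue,
                     dispatch: Callable[[str], object],
                     interval_ms: int = 100) -> None:
    """Poll the queue from tkinter's event loop via root.after()."""
    def poll() -> None:
        drain_commands(commands, dispatch)
        root.after(interval_ms, poll)

    root.after(interval_ms, poll)
```

Only the UI thread touches widgets here; the background thread never does anything except put strings on the queue.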

Abstract: The biggest abstract challenge we faced was selecting an idea. Among a swarm of ideas, we had to choose one that was unique, complex and able to benefit society. The next abstract challenge lay in the design of the project, as the UI had to be aesthetically pleasing yet simple.

These were some of the major challenges faced by our team.


Dependencies:

matplotlib==3.1.2
pdfplumber==0.5.23
SpeechRecognition==3.8.1
pyttsx3==2.90
Pillow==7.2.0
PyPDF2==1.26.0


Conclusion:

The main motivation for this project was to give people with amputations, who can control an application with just their voice, a good reading experience. This is the target use case for our project. The software is also aimed at tech-savvy users in general, or simply at people too lazy to use the mouse to control a PDF viewer :D