Voice editing made easy
A speech-based document editing tool intended for those who cannot use keyboards.
Writing and editing papers, documents, and emails is an essential task for any modern-day student. Yet the way we do so can be prohibitive for some. Keyboards and mice serve most people well, but for those who are missing limbs, hands, or digits, or who have conditions such as arthritis in the hand, Parkinson’s, carpal tunnel syndrome, or essential tremor, keyboards are extremely uncomfortable if not practically unusable. The number of Americans who belong to this group is estimated to be over 28 million.
This problem is amplified by remote and hybrid education. Prior to the pandemic, students had access to disability services, where they could take tests and write papers with the help of university transcribers. However, with the transition to remote learning, these students must now rely on imperfect hacks such as sending audio files, painfully using a keyboard, or avoiding typing altogether.
We wanted to take this opportunity to develop a tool that would make it easy for students with such conditions to participate in the classroom, and potentially for employees to participate in the workplace, without access to disability services.
Speechful is a document editing tool that uses your voice as the primary interface between you and your computer. From start to finish, you can create, edit, format, reorder, and export your documents just as you would in MS Word or Google Docs, without ever touching a keyboard or mouse.
Speechful is intended to be a desktop application that lets you write an essay or an email, or complete a written test, by converting your voice into context-aware instructions. Every paragraph and sentence is clearly indexed on screen, which makes it easy to refer to them in voice instructions. Once you open a document, you can simply say a command such as "start typing" or "delete this from paragraph 2", followed by what you would like to type or delete. The supported commands are described below. Once you finish typing, you can tell Speechful to add punctuation after a certain word, move your cursor to another paragraph, and, most importantly, change words that were misunderstood.
For HackThis, we made an MVP that runs in Chrome to serve as a proof of concept. Here are some screenshots of the MVP:
We also made a business pitch for HackThis, which can be found here: Slides & Transcript
Here is a demonstration of the current product: YouTube
Currently supported voice functionality:
- Create document - "Create new document"
- Save document - "Save"
- Open document - "Open document (document id)"
- Start typing - "Start typing"
- Stop typing - "Stop typing"
- Change Title - "Change title (new title)"
- Add paragraph - "Add paragraph"
- Real-time punctuation - "comma", "period", and "quotation" are mapped to , . and " respectively
- Remove paragraph - "Remove paragraph (index)"
- Remove word - "Remove (word) from paragraph (index)"
- Replace word - "Replace (old word) with (new word) in paragraph (index)"
- Bold word - "Bold (word) in paragraph (index)"
- Move cursor - "Move cursor to paragraph (index)"
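The command list above can be read as a small grammar. The following is a minimal sketch of how such phrases might be turned into structured commands, plus the spoken punctuation mapping. It is not taken from the Speechful source: `COMMAND_PATTERNS`, `parseCommand`, `applySpokenPunctuation`, and the shape of the returned objects are all hypothetical names chosen for illustration, and it assumes the recognizer returns paragraph indices as digits.

```javascript
// Hypothetical sketch: none of these names come from the Speechful source.

// Each entry pairs a pattern for one documented phrase with a builder
// that turns the regex match into a plain command object.
const COMMAND_PATTERNS = [
  [/^create new document$/i, () => ({ type: "createDocument" })],
  [/^save$/i, () => ({ type: "save" })],
  [/^open document (.+)$/i, (m) => ({ type: "openDocument", id: m[1] })],
  [/^start typing$/i, () => ({ type: "startTyping" })],
  [/^stop typing$/i, () => ({ type: "stopTyping" })],
  [/^change title (.+)$/i, (m) => ({ type: "changeTitle", title: m[1] })],
  [/^add paragraph$/i, () => ({ type: "addParagraph" })],
  [/^remove paragraph (\d+)$/i,
    (m) => ({ type: "removeParagraph", index: Number(m[1]) })],
  [/^remove (.+) from paragraph (\d+)$/i,
    (m) => ({ type: "removeWord", word: m[1], index: Number(m[2]) })],
  [/^replace (.+?) with (.+) in paragraph (\d+)$/i,
    (m) => ({ type: "replaceWord", oldWord: m[1], newWord: m[2], index: Number(m[3]) })],
  [/^bold (.+) in paragraph (\d+)$/i,
    (m) => ({ type: "boldWord", word: m[1], index: Number(m[2]) })],
  [/^move cursor to paragraph (\d+)$/i,
    (m) => ({ type: "moveCursor", index: Number(m[1]) })],
];

// Returns a command object, or null when the utterance is not a command.
function parseCommand(utterance) {
  // Tidy whitespace and drop trailing sentence punctuation from the transcript.
  const text = utterance.trim().replace(/\s+/g, " ").replace(/[.!?]+$/, "");
  for (const [pattern, build] of COMMAND_PATTERNS) {
    const match = text.match(pattern);
    if (match) return build(match);
  }
  return null;
}

// Spoken words that stand for punctuation marks while dictating.
const SPOKEN_PUNCTUATION = new Map([
  ["comma", ","],
  ["period", "."],
  ["quotation", '"'],
]);

// Replaces spoken punctuation words; commas and periods attach to the
// previous word. Quotation marks are inserted as separate tokens (naive spacing).
function applySpokenPunctuation(text) {
  let out = "";
  for (const token of text.split(/\s+/).filter(Boolean)) {
    const mark = SPOKEN_PUNCTUATION.get(token.toLowerCase());
    if (mark === "," || mark === ".") out += mark;
    else out += (out ? " " : "") + (mark || token);
  }
  return out;
}
```

For example, `parseCommand("Replace cat with dog in paragraph 3")` yields a `replaceWord` command with `index` 3, while ordinary dictation returns `null`. A recognizer may return "three" instead of "3", in which case number words would need extra handling that this sketch omits.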
Planned functionality:
- Change size - "Change size of paragraph (index)"
- Change color - "Change color of paragraph (index)"
- Make above paragraph functions into sentence functions
- Move Speechful to a container such as Electron
- Export a Speechful document into common file types
- Create a tutorial for new users
The front end of this application is built with React. For natural language processing, we are using Google Speech. Design element dependencies include Material-UI and FontAwesome.
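As a rough illustration of how speech input can reach a browser app, here is a minimal sketch that assumes the browser's Web Speech API (exposed in Chrome as `webkitSpeechRecognition`). This README does not say which Google Speech interface Speechful actually uses, so treat the whole block as an assumption. `startListening` and its parameters are hypothetical names; the optional constructor argument exists only so the function can be exercised outside a browser.

```javascript
// Sketch only: assumes the browser Web Speech API; Speechful's real wiring may differ.
function startListening(onFinalTranscript, RecognitionCtor) {
  // In Chrome the constructor is exposed as webkitSpeechRecognition.
  const Ctor =
    RecognitionCtor ||
    (typeof window !== "undefined" &&
      (window.SpeechRecognition || window.webkitSpeechRecognition));
  if (!Ctor) throw new Error("Speech recognition is not available here");

  const recognition = new Ctor();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = false; // only report finalised phrases
  recognition.lang = "en-US";

  recognition.onresult = (event) => {
    // event.results accumulates over the session; start at the first new entry.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) onFinalTranscript(result[0].transcript.trim());
    }
  };

  recognition.start();
  return recognition; // the caller can later call recognition.stop()
}
```

In a React component this would typically be started inside an effect and stopped in its cleanup function, with each final transcript passed on to whatever interprets commands.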
In order to set up the project for contribution, run:
- git clone https://github.com/virnarula/speechful.git to clone this repository
- cd speechful to enter the /speechful directory
- npm install to install all the dependencies of the project
- npm run start to launch the development server
- If it doesn't happen automatically, open localhost:3000 in your browser
- Voila!
This repo uses the Google JavaScript Style Guide.
This project is set up like a traditional React project. This is a high-level overview of the file structure. Trivial files and directories are omitted for simplicity.
speechful
├── src
│   ├── components   # Contains screens and their components
│   ├── data         # Where documents are saved
│   ├── IO           # Contains IO functionality
│   ├── model        # Document data model representations
│   ├── res          # Image resources
│   ├── speech       # Contains speech objects to decipher instructions
│   └── App.js       # Contains React Routes
└── public
    ├── index.html   # Bare-bones website
    └── main.js      # Contains start-up code
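To make the role of a document model concrete, here is a minimal sketch of one possible representation: a title plus an ordered list of paragraphs, with helpers for a few of the paragraph-level voice commands. The actual contents of the model directory are not shown in this README, so every name here (`createDocument`, `addParagraph`, `removeParagraph`, `replaceWord`, `paragraphAt`) is hypothetical, and the 1-based paragraph numbering is an assumption.

```javascript
// Hypothetical model: field and function names are illustrative, not Speechful's.
function createDocument(title = "Untitled") {
  return { title, paragraphs: [] };
}

// Assumes spoken paragraph numbers are 1-based; converts to an array index
// and rejects numbers that do not refer to an existing paragraph.
function paragraphAt(doc, index) {
  if (!Number.isInteger(index) || index < 1 || index > doc.paragraphs.length) {
    throw new RangeError(`No paragraph ${index}`);
  }
  return index - 1;
}

// Each helper returns a new document object instead of mutating the old one,
// which suits React state updates.
function addParagraph(doc, text = "") {
  return { ...doc, paragraphs: [...doc.paragraphs, text] };
}

function removeParagraph(doc, index) {
  const i = paragraphAt(doc, index);
  return { ...doc, paragraphs: doc.paragraphs.filter((_, j) => j !== i) };
}

// Replaces the first whole-word, case-insensitive occurrence of oldWord.
function replaceWord(doc, index, oldWord, newWord) {
  const i = paragraphAt(doc, index);
  // Escape regex metacharacters so the spoken word is matched literally.
  const escaped = oldWord.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`\\b${escaped}\\b`, "i");
  const paragraphs = doc.paragraphs.map((p, j) =>
    j === i ? p.replace(pattern, () => newWord) : p
  );
  return { ...doc, paragraphs };
}
```

With this shape, "Replace cat with dog in paragraph 1" would correspond to `replaceWord(doc, 1, "cat", "dog")`, and an out-of-range paragraph number surfaces as a `RangeError` that the interface could report back to the user.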
This project is under the MIT License.