This project takes a DOCX file and converts it to HTML.
Everything is self-contained in Docker containers, which are connected using docker-compose. Docker installs all dependencies, so all you need to do is download the app, build it, and run it.
You need to have docker installed on your machine.
https://docs.docker.com/install/
Download the app:
git clone https://github.com/podaac/docx-to-html/
Install the node modules for the dev build:
cd docx-to-html/frontend && npm install
Note: If you do not yet have node.js installed (it is required to run the "npm" command), it is highly recommended to first install the Node Version Manager (NVM), available here: https://github.com/nvm-sh/nvm#installing-and-updating
To install node.js using NVM:
nvm install node
Then return to the repository root and build the Docker containers:
cd .. && docker-compose up -d --build
This command builds each Docker container and installs all of the dependencies needed. The -d flag runs the app in detached mode, so you can exit your terminal session and the app keeps running.
The Frontend is exposed on port 8083. To access the app locally, go to http://localhost:8083/
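To confirm from a script that the frontend is reachable after startup, a small check like the one below can be used. This is a minimal sketch and not part of the project: the function name is_port_open is hypothetical, and 8083 is the frontend port stated above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8083 is the frontend port named in this README.
    print("frontend reachable:", is_port_open("localhost", 8083))
```

A successful TCP connection only shows that something is listening on the port; opening http://localhost:8083/ in a browser remains the real check.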
In production, point to port 8083 or change the exposed port in docker-compose.yml and frontend/prod.docx2html-react-frontend.Dockerfile.
To kill the app:
docker-compose down
To run the app without building again:
docker-compose up -d
The production build uses the prod.docker-compose.yml file and the production Dockerfiles for each container. If the backend is not at localhost on your machine, change the POST request URL in frontend/src/components/MainContent.js to point to the backend.
docker-compose -f docker-compose.yml -f prod.docker-compose.yml up -d --build
In production, point to port 8083 or change the exposed port in the prod.docker-compose.yml file and frontend/prod.docx2html-react-frontend.Dockerfile.
If you need to change ports, here is a list of the files involved:
- Change POST request URL in frontend/src/components/MainContent.js
- Dev Frontend Dockerfile
- Prod Frontend Dockerfile
- Dev Backend Dockerfile
- Backend app.ini
- Dev Nginx Conf
- Prod Nginx Conf
- Prod Backend Dockerfile
- Dev Docker Compose File
- Prod Docker Compose File
- Frontend npm installs
- Backend pip installs
- Frontend Node Docker Image
- Backend Ubuntu Docker Image
- Nginx Docker Image
The user uploads one or more files on the frontend. React uses Fetch to send a POST request to the Flask backend, where it is handled in app/api.py. The handler saves the .docx file to the server in /backend/ and passes it to converter/handle_input.py.
converter/convertDOCX2HTML.py converts the file to an HTML string, deletes the .docx file and returns the HTML string.
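The convert-then-delete step can be sketched as follows. This is an illustration under stated assumptions, not the code in converter/convertDOCX2HTML.py: convert_and_delete is a hypothetical name, and convert stands in for whatever function performs the actual DOCX-to-HTML conversion. Using try/finally is one way to make sure the uploaded file is removed even if the conversion raises.

```python
import os
from typing import Callable


def convert_and_delete(docx_path: str, convert: Callable[[str], str]) -> str:
    """Convert the file at docx_path to an HTML string, then delete the file.

    `convert` is a hypothetical stand-in for the real DOCX-to-HTML converter.
    """
    try:
        return convert(docx_path)
    finally:
        # Runs whether convert() returned or raised, so the upload never lingers.
        if os.path.exists(docx_path):
            os.remove(docx_path)
```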
converter/parse_html.py is where all of the parsing and HTML manipulation takes place. It converts the HTML to a Beautiful Soup Object and parses it.
If there are .tiff, .emf or .wmf images, they are converted to PNG in converter/image_converter.py. They are saved as temporary files during conversion, and all temporary files are deleted once the conversion is finished.
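The temporary-file handling can be illustrated with the standard library. This is a sketch under assumptions, not the contents of converter/image_converter.py: the names needs_png_conversion and convert_image are hypothetical, and to_png stands in for the real image conversion routine, which is not shown here.

```python
import os
import tempfile
from typing import Callable

# Extensions this README names as needing conversion to PNG.
CONVERTIBLE_EXTENSIONS = {".tiff", ".emf", ".wmf"}


def needs_png_conversion(filename: str) -> bool:
    """Return True if the file extension is one that gets converted to PNG."""
    return os.path.splitext(filename)[1].lower() in CONVERTIBLE_EXTENSIONS


def convert_image(data: bytes, filename: str,
                  to_png: Callable[[str, str], None]) -> bytes:
    """Write data to a temp file, convert it via to_png, and return the PNG bytes."""
    ext = os.path.splitext(filename)[1].lower()
    # TemporaryDirectory deletes the directory and its contents on exit,
    # including when to_png raises.
    with tempfile.TemporaryDirectory() as tmp_dir:
        src = os.path.join(tmp_dir, "input" + ext)
        dst = os.path.join(tmp_dir, "output.png")
        with open(src, "wb") as f:
            f.write(data)
        to_png(src, dst)  # hypothetical converter: reads src, writes dst
        with open(dst, "rb") as f:
            return f.read()
```

Keeping both the source and the output inside one temporary directory means a single cleanup step removes everything, which matches the behavior described above.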
converter/handle_input.py returns the parsed HTML to app/api.py to be returned as JSON to the frontend.
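The hand-off between these modules can be summarized in a few lines. This is only a sketch: handle_upload is a hypothetical name, convert and parse are stand-ins for the functions in converter/convertDOCX2HTML.py and converter/parse_html.py, and the JSON key "html" is an assumption rather than the project's actual response shape.

```python
import json
from typing import Callable


def handle_upload(docx_path: str,
                  convert: Callable[[str], str],
                  parse: Callable[[str], str]) -> str:
    """Chain conversion and parsing, then wrap the result as a JSON string."""
    raw_html = convert(docx_path)   # DOCX file -> HTML string
    parsed_html = parse(raw_html)   # HTML string -> cleaned-up HTML string
    # The "html" key is an assumed response shape for illustration.
    return json.dumps({"html": parsed_html})
```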
The frontend saves the HTML as a file to the user's computer.
All containers are set to always be restarted. No data is being saved.
In the dev environment the files are linked to the container using 'volumes' so you do not have to rebuild the container if changes are made. The React frontend hot reloads so you don't need to restart the server. If you change anything in the Flask backend, you need to reboot the server to see any changes.
In the production environment 'volumes' are removed for security purposes.
- Andrew Joseph - Initial work
This project is licensed under Apache 2.0 - see the LICENSE file for details
- JPL Mentors: David Moroni & Suresh Vannan
- Help and Support: Wilbert Veit, Ying Chen, Allan Yu, Sandra Cosic