Searches a set of organisms (given by NCBI names) for a given DNA sequence and returns the first match it finds, with information on what protein the sequence is found in (if available) and what position it starts at.
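The search described above can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the real backend works on GenBank files via Biopython, while this sketch assumes the genomes and protein annotations are already loaded into dictionaries. `Feature`, `Match`, and `find_first_match` are hypothetical names, and reverse-complement matches are ignored.

```python
from typing import Dict, List, NamedTuple, Optional


class Feature(NamedTuple):
    """A protein-coding region covering the half-open range [start, end)."""
    protein: str
    start: int
    end: int


class Match(NamedTuple):
    organism: str
    position: int            # 0-based start of the match in the genome
    protein: Optional[str]   # None when no annotated feature covers the match


def find_first_match(
    query: str,
    genomes: Dict[str, str],
    features: Dict[str, List[Feature]],
) -> Optional[Match]:
    """Return the first organism (in dict order) whose genome contains query."""
    query = query.upper()
    for organism, genome in genomes.items():
        position = genome.upper().find(query)
        if position == -1:
            continue
        # Report the first feature that contains the whole match, if any.
        end = position + len(query)
        protein = next(
            (f.protein for f in features.get(organism, [])
             if f.start <= position and end <= f.end),
            None,
        )
        return Match(organism, position, protein)
    return None
```

Stopping at the first hit mirrors the "returns the first match it finds" behavior; the protein field stays empty when no annotation covers the matched region.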
- docker
- npm
- python 3.5+
- pip
- bash
Run ./run_local.sh at the root of the directory. It will run the backend Django server at http://localhost:8000 and the React frontend at http://localhost:3000.
I got this working in AWS Elastic Beanstalk after a lot of grief. The deployment is almost automated, but I couldn't get it 100% of the way there, so a few steps are still manual.
Prerequisites:
- The database is already created in RDS.
  - It should be publicly accessible.
  - Fill in your own host in the GB_APP_DB_HOST environment variable.
  - Create a searchsequences database on the host.
  - When you create the Beanstalk app, its security group should be given access to the DB.
- The secret variables are saved as secure strings in the parameter store in SSM.
  - GB_APP_DB_PASSWORD is the DB password.
  - GB_APP_KEY is some string that should be hard to guess, which people use to create a user.
  - GB_DJANGO_SECRET_KEY is a long string used by Django (I don't think this app actually uses it, but it's good practice).
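As a rough sketch of how these variables could feed Django's database settings, assuming the SSM values have already been exported into the process environment. The PostgreSQL engine is inferred from the psycopg2 dependency; `GB_APP_DB_USER`, `GB_APP_DB_PORT`, and their defaults are hypothetical, since this README does not name the DB user or port.

```python
import os
from typing import Mapping


def database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Django DATABASES dict from the GB_APP_DB_* variables."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "searchsequences",
            "HOST": env["GB_APP_DB_HOST"],
            "PASSWORD": env["GB_APP_DB_PASSWORD"],
            # Assumptions: neither the user nor the port is given in the README.
            "USER": env.get("GB_APP_DB_USER", "postgres"),
            "PORT": env.get("GB_APP_DB_PORT", "5432"),
        }
    }
```

In a settings module this would be used as `DATABASES = database_settings()`. Indexing with `env[...]` makes a missing host or password fail loudly at startup rather than at the first query.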
This app runs on AWS's Python 3.6 Beanstalk platform (not the latest, 3.7, which wasn't working well with Django as of this writing).
I zipped up the contents of the server folder and uploaded the zip to deploy it.
The first deployment will fail because it won't be able to install psycopg2. SSH into the EC2 instance and run sudo yum install postgresql-devel (I could not find a better solution than this, alas), then redeploy. That should succeed.
You will also need to set the GB_DJANGO_ALLOWED_HOST environment variable to your backend's URL in Beanstalk.
You can then point the React code to it (using the APP_BASE_URL in webpage/constants.js) and either run the frontend locally using npm start, or deploy it into the world. (I hosted it statically in S3.) Be sure to tweak the GB_CORS_VALUE environment variable to match your host.
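A sketch of how the two host-related variables might be consumed on the Django side. It assumes CORS is handled by the django-cors-headers package (whose `CORS_ALLOWED_ORIGINS` setting takes full origins, scheme included) and that GB_CORS_VALUE holds the frontend's origin; neither is confirmed here, and the localhost defaults are assumptions too. Note that Django compares `ALLOWED_HOSTS` against the request's Host header, so that value generally needs to be a bare host name rather than a URL with a scheme.

```python
import os
from typing import Mapping


def host_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Derive host and CORS settings from the GB_* environment variables."""
    # Defaults are assumptions that match the local setup in this README.
    allowed_host = env.get("GB_DJANGO_ALLOWED_HOST", "localhost")
    cors_value = env.get("GB_CORS_VALUE", "http://localhost:3000")
    return {
        "ALLOWED_HOSTS": [allowed_host],
        "CORS_ALLOWED_ORIGINS": [cors_value],
    }
```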
On the front page, you can choose a user name. The key is the same across users. (With more time, I might have explored Django's prepackaged user management, but this suits the need.) Locally, the key is kk. Remotely, it is the value of GB_APP_KEY.
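The shared-key check could look roughly like this. It is a sketch only: `key_is_valid` is a hypothetical helper, not the app's actual function, and falling back to the local key when GB_APP_KEY is unset is an assumption about how the local and remote cases are told apart.

```python
import hmac
import os
from typing import Mapping

LOCAL_KEY = "kk"  # local default named in this README


def key_is_valid(submitted: str, env: Mapping[str, str] = os.environ) -> bool:
    """Compare the submitted key against GB_APP_KEY (or the local default)."""
    expected = env.get("GB_APP_KEY", LOCAL_KEY)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(submitted.encode("utf-8"), expected.encode("utf-8"))
```

Because every user shares one key, this only gates access to the app; it does not authenticate individual users.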
Once you submit a user name with the correct key, you will see any previous sequence queries, their status, and results. To search for a new sequence, enter it in the text area at the bottom and click "New Search".
The new search will immediately appear in the table with a "pending" status. If this is your first query against the given backend, it may take 30ish seconds to finish (since the server will be downloading genbank files to search). Otherwise, it shouldn't take more than a few seconds (longer when searching for a longer sequence).
The status does not update automatically when it completes. You have to click the "Reload" button at the bottom to see the results.
You can also delete any of your queries by clicking the "Delete" button.
Click "Change user" at the top to "log out".
- Frontend React code is in webpage.
- Backend server code is in server. It is Django.
- Deployment files are in .ebextensions (and server/aws_startup.py, which worked better for certain steps).
This is my first time writing React, writing a Django app (and deploying it to Elastic Beanstalk), and using Biopython. I probably violated quite a few conventions along the way, but hopefully I'll get better in the future.
I used pipenv for dependency management since it seemed like a better system (more similar to npm) than having to remember to do pip freeze > requirements.txt before every deploy. But Elastic Beanstalk seems to need requirements.txt, so that's there, too.