A step-by-step guide and scripts to build a MySQL Docker image with UMLS, UMLS-Interface and UMLS-Similarity.
- Install Docker Engine and Docker Compose
- Clone this repo:
git clone https://github.com/cbadenes/umls-docker.git
- Move into the root directory:
cd umls-docker
- Download the UMLS Metathesaurus Full Subset (compressed: 5 GB, uncompressed: 32.8 GB) and extract the contents into this directory. You will probably have to use MetamorphoSys to generate the database dumps. Move the folders
META/
and
NET/
into this directory.
- Download the UMLS-Interface API and extract the contents into this directory. The folder
UMLS-Interface-1.51
is created.
- Download the UMLS-Similarity API and extract the contents into this directory. The folder
UMLS-Similarity-1.47
is created.
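Before building, it can help to confirm that all four folders from the steps above are in place. The following is a minimal sketch, not part of this repo: `check_umls_inputs` is a hypothetical helper name, and the folder names are taken from the versions listed above (adjust them if you downloaded different releases).

```shell
# Hypothetical helper (not shipped with this repo): verify that the folders
# named in the steps above exist under the given directory.
# Defaults to the current directory when no argument is passed.
check_umls_inputs() {
  base="${1:-.}"
  missing=0
  for dir in META NET UMLS-Interface-1.51 UMLS-Similarity-1.47; do
    if [ ! -d "$base/$dir" ]; then
      echo "missing folder: $base/$dir" >&2
      missing=1
    fi
  done
  # Exit status 0 when everything is present, 1 otherwise.
  return "$missing"
}

# Usage (run from the root of the cloned repo):
#   check_umls_inputs . && echo "all inputs present"
```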
- With the Docker daemon running, execute the command:
docker build -t umls-mysql -f Dockerfile .
If you have an ARM architecture (e.g. Mac M1), use the
Dockerfile-arm
descriptor:
docker build -t umls-mysql -f Dockerfile-arm .
It will take a few minutes to create the image.
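The choice between the two descriptors can be scripted. This is a rough sketch under the assumption that `uname -m` reports `arm64` or `aarch64` on ARM machines; `select_dockerfile` is a hypothetical helper, not something this repo provides.

```shell
# Hypothetical helper: print the descriptor to use for a given architecture.
# With no argument it inspects the current machine via `uname -m`.
select_dockerfile() {
  case "${1:-$(uname -m)}" in
    arm64|aarch64) echo "Dockerfile-arm" ;;
    *)             echo "Dockerfile" ;;
  esac
}

# Usage:
#   docker build -t umls-mysql -f "$(select_dockerfile)" .
```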
- Deploy the image via docker compose:
docker compose up -d
- The UMLS database will not be available instantly. Once the container is running, the database will load and MySQL will be restarted. To check whether the database is available, search for the log trace
mysqld: ready for connections
in the output of:
docker compose logs
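The log check can be turned into a small polling loop. The sketch below assumes the trace appears verbatim in the `docker compose logs` output; `wait_for_umls_db` and its default of 60 attempts, 10 seconds apart, are illustrative choices. Because MySQL is restarted after the load, the first match may come before the restart, so treat this as a rough check only.

```shell
# Hypothetical helper: poll the compose logs until the readiness trace
# shows up, or give up after a number of attempts.
#   $1 = maximum attempts (default 60), $2 = seconds between attempts (default 10)
wait_for_umls_db() {
  attempts="${1:-60}"
  delay="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if docker compose logs 2>/dev/null | grep -q 'mysqld: ready for connections'; then
      echo "database is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for the database" >&2
  return 1
}

# Usage:
#   docker compose up -d && wait_for_umls_db
```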
- Edit the file
META/populate_mysql_db.sh
and replace the corresponding lines with the following:
  - MYSQL_HOME=/usr
  - user=root
  - password=
  - db_name=umls
  - In each line starting with $MYSQL_HOME/bin/mysql, replace -vvv -u $user -p $password with -vvv --local-infile
- Edit the file
NET/populate_netmysql_db.sh
and replace the corresponding lines with the following:
  - MYSQL_HOME=/usr
  - user=root
  - password=
  - db_name=umls
  - In each line starting with $MYSQL_HOME/bin/mysql, replace -vvv -u $user -p $password with -vvv --local-infile
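Both edits above follow the same pattern, so they can be applied with `sed` instead of by hand. This is a sketch under two assumptions about the generated scripts: the four variables are assigned at the start of a line, and the client options appear as `-vvv -u $user -p $password` (with or without a space after `-p`). `patch_populate_script` is a hypothetical helper. It keeps a `.bak` copy of the original so the result can be reviewed.

```shell
# Hypothetical helper: apply the replacements listed above to one script.
# A backup of the original is written next to it with a .bak suffix.
patch_populate_script() {
  script="$1"
  sed -i.bak \
    -e 's|^MYSQL_HOME=.*|MYSQL_HOME=/usr|' \
    -e 's|^user=.*|user=root|' \
    -e 's|^password=.*|password=|' \
    -e 's|^db_name=.*|db_name=umls|' \
    -e '/^\$MYSQL_HOME\/bin\/mysql/ s|-vvv -u \$user -p *\$password|-vvv --local-infile|' \
    "$script"
}

# Usage:
#   patch_populate_script META/populate_mysql_db.sh
#   patch_populate_script NET/populate_netmysql_db.sh
#   diff META/populate_mysql_db.sh.bak META/populate_mysql_db.sh
```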
- Once the files are updated, you can connect to the container via:
docker exec -it umlsdb /bin/bash
- Edit the file
/etc/mysql/my.cnf
and add the following properties:
[mysqld]
key_buffer=600M
table_cache=300
query_cache_limit=3M
query_cache_size=100M
read_buffer_size=200M
myisam_sort_buffer_size=200M
bulk_insert_buffer_size=100M
join_buffer_size=100M
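If no editor is available inside the container, the same properties can be appended from the shell. This is only a sketch: `append_mysqld_settings` is a hypothetical helper, and it assumes the file does not already set conflicting values under `[mysqld]`. Check the file afterwards.

```shell
# Hypothetical helper: append the [mysqld] properties listed above to a
# MySQL option file (defaults to /etc/mysql/my.cnf).
append_mysqld_settings() {
  cnf="${1:-/etc/mysql/my.cnf}"
  cat >> "$cnf" <<'EOF'

[mysqld]
key_buffer=600M
table_cache=300
query_cache_limit=3M
query_cache_size=100M
read_buffer_size=200M
myisam_sort_buffer_size=200M
bulk_insert_buffer_size=100M
join_buffer_size=100M
EOF
}

# Usage (inside the container):
#   append_mysqld_settings /etc/mysql/my.cnf
```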
- Enable execution permissions for
/umls_init.sh
by means of:
chmod +x umls_init.sh
- And finally load the UMLS information via:
./umls_init.sh
This process can take hours.
- Enter the container by means of:
docker exec -it umlsdb /bin/bash
- Calculate the similarity between terms (e.g.
paracetamol
and
diarrhea
):
umls-similarity.pl paracetamol diarrhea
It returns:
0.0833<>paracetamol(C0000970)<>diarrhea(C0011991)
- Or calculate the similarity between CUI codes (e.g.
C0000970
and
C0011991
):
umls-similarity.pl C0000970 C0011991
It returns:
0.0833<>C0000970(paracetamol)<>C0011991(Diarrhea NOS)
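The output uses `<>` as a field separator, which makes it easy to post-process. The sketch below assumes every result line has exactly the three fields shown above (score, first concept, second concept); `parse_similarity` is a hypothetical helper that converts such lines to tab-separated values.

```shell
# Hypothetical helper: read similarity output on stdin and print
# score, first concept and second concept separated by tabs.
# Lines that do not have exactly three <>-separated fields are skipped.
parse_similarity() {
  awk -F'<>' 'NF == 3 { printf "%s\t%s\t%s\n", $1, $2, $3 }'
}

# Usage (inside the container), keeping only the score:
#   umls-similarity.pl C0000970 C0011991 | parse_similarity | cut -f1
```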