/Y-Chrom-database-and-compare

Final project for CS50 Python 2022


Y Chrom Database & Compare

Description: This program extracts data (Y chromosome marker patterns) from tables in .docx files and saves it as a .csv file. The user can then enter a Y chromosome pattern, and the program checks whether it matches any pattern in the generated .csv file. In a future version of this project, the .csv file could be used to build a database for managing the data.


Technologies used:

  • Python
  • PySimpleGUI for user interface

Program Workflow

  1. Data is extracted from tables in .docx files and saved as .csv files.
  2. The user enters a Y chromosome pattern for comparison.
  3. The program checks if the input pattern matches any pattern in the .csv file.
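Step 3 of the workflow can be sketched roughly as follows. This is a minimal illustration using only the standard `csv` module, not the project's actual code: the function names `normalize` and `find_matches` are hypothetical, the sample values are made up, and it assumes that a "match" means every marker the user entered has exactly the same value in a row of the .csv file.

```python
import csv
import io


def normalize(value):
    """Trim whitespace and upper-case a cell so comparisons ignore case."""
    return value.strip().upper()


def find_matches(csv_file, query):
    """Return the rows that agree with the query on every queried marker.

    csv_file: an open text file (or file-like object) with a header row.
    query: dict mapping marker name -> value entered by the user.
    Assumption: a match requires exact equality on all queried markers.
    """
    wanted = {normalize(k): normalize(v) for k, v in query.items()}
    matches = []
    for row in csv.DictReader(csv_file):
        # Skip malformed cells (missing values or extra unnamed columns).
        cells = {
            normalize(k): normalize(v)
            for k, v in row.items()
            if k is not None and isinstance(v, str)
        }
        if all(cells.get(marker) == value for marker, value in wanted.items()):
            matches.append(row)
    return matches


# In-memory stand-in for the generated .csv file (made-up values).
sample_csv = io.StringIO(
    "MUESTRA,DYS576,DYS389I,DYS448\n"
    "S-001,17,13,19\n"
    "S-002,18,13,20\n"
)
hits = find_matches(sample_csv, {"dys576": "17", "DYS389I": "13", "DYS448": "19"})
print([row["MUESTRA"] for row in hits])  # prints ['S-001']
```

With a real file, `sample_csv` would be replaced by something like `open("patterns.csv", newline="")`, where the file name is again only a placeholder.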

Considerations:

I wrote this program with a real-life case from my former job in mind. To extract the data from the tables in the .docx files, I had to consider how the data was presented in those tables.

The tables came in two formats, with the pattern spread over two or three rows. This was because the files were meant for printing, and the whole pattern was too wide to fit in a single row.

[Figure: table format]

I designed the program to extract the information from both formats. It can also handle a Y chromosome pattern displayed in more or fewer rows, or with the markers in a different order. For the comparison step to work, however, the table headers must be (case-insensitive): 'MUESTRA', 'DYS576', 'DYS389I', 'DYS448', 'DYS389II', 'DYS19', 'DYS391', 'DYS481', 'DYS549', 'DYS533', 'DYS438', 'DYS437', 'DYS570', 'DYS635', 'DYS390', 'DYS439', 'DYS392', 'DYS643', 'DYS393', 'DYS458', 'DYS385A/B', 'DYS456' and 'Y-GATA-H4'. Once the data is normalized into .csv files, it can be stored in a database in a later stage of this project.
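The normalization described above could look something like the sketch below. It is only an illustration under stated assumptions, not the project's actual implementation: it assumes each table has already been read from the .docx file into a list of rows (lists of cell strings), that rows alternate between a header row and a value row, and that each table holds a single sample. The names `merge_split_table`, `missing_headers` and `write_csv` are hypothetical, and the sample values are made up.

```python
import csv
import io

# Headers required for the comparison step, as listed above.
EXPECTED_HEADERS = [
    "MUESTRA", "DYS576", "DYS389I", "DYS448", "DYS389II", "DYS19",
    "DYS391", "DYS481", "DYS549", "DYS533", "DYS438", "DYS437",
    "DYS570", "DYS635", "DYS390", "DYS439", "DYS392", "DYS643",
    "DYS393", "DYS458", "DYS385A/B", "DYS456", "Y-GATA-H4",
]


def merge_split_table(rows):
    """Merge alternating header/value rows into one {header: value} record.

    Assumption: rows come in pairs (header row, then value row), so the
    number of pairs and the order of the markers do not matter.
    """
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows (header/value pairs)")
    record = {}
    for headers, values in zip(rows[0::2], rows[1::2]):
        for header, value in zip(headers, values):
            name = header.strip().upper()  # case-insensitive header matching
            if name:  # skip empty padding cells
                record[name] = value.strip()
    return record


def missing_headers(record):
    """List the expected headers that the record does not contain."""
    return [h for h in EXPECTED_HEADERS if h not in record]


def write_csv(records, out_file):
    """Write records with a fixed column order; absent markers stay empty."""
    writer = csv.DictWriter(
        out_file, fieldnames=EXPECTED_HEADERS, extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(records)


# A shortened, made-up table split over two header/value pairs.
table = [
    ["Muestra", "DYS576", "DYS389I"],
    ["S-001", "17", "13"],
    ["dys448", "DYS19", ""],
    ["19", "14", ""],
]
record = merge_split_table(table)
print(record)
print(missing_headers(record)[:3])

buffer = io.StringIO()
write_csv([record], buffer)
print(buffer.getvalue().splitlines()[1])
```

Because every record is written with the same column order, tables whose markers appear in a different order or across a different number of rows end up in the same .csv layout, which is what makes the later comparison straightforward.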