Excel-Sheets-To-One-Column-Txt

Turns the contents of spreadsheets into a single column, separated by category (column), with a header category (index).

Primary Language: Python

Admittedly, this is my first GitHub foray. Sorry for any errors :)
Turns the contents of the spreadsheets within a folder into text files (one per worksheet) in which every column is stacked into 'one big column': column names become categories, and one column serves as a master category (termed: index).

The purpose of this script is to make qualitative data analysis coding easier (that is, the process of highlighting qualitative data for later analysis, not programming).

It provides some very basic formatting based upon the category (column).

Below is an example table, shown in CSV format for readability; at present the script only supports .xlsx.
Beneath the sample data is an example of how each row will be represented (for example, if the next row were Germany with all its respective data, it would be formatted in the same way).

example.xlsx with sheetname monacostat

country,region,continent,population,gdp,gdppc
Monaco,Western Europe,Europe,39242,6468000877,164823

will turn into example_monacostat.txt, which would look like this:

####################
Monaco
####################

----------region:
Western Europe

----------continent:
Europe

----------population:
39242

----------gdp:
6468000877

----------gdppc:
164823

(example end)
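
As a rough illustration of the layout above, here is a minimal sketch of how one row could be rendered. The function name `format_record` and its parameters are hypothetical and not taken from the script; the banner width (20 `#`) and the 10-dash prefix are copied from the example output.

```python
def format_record(header, row, index_col=0):
    """Render one spreadsheet row as an index banner plus one block per column."""
    banner = "#" * 20
    # The index value sits between two banner lines, followed by a blank line.
    lines = [banner, str(row[index_col]), banner, ""]
    for position, (name, value) in enumerate(zip(header, row)):
        if position == index_col:
            continue  # the index column is already shown in the banner
        # Each remaining column becomes: sub-header, value, blank line.
        lines += ["-" * 10 + f"{name}:", str(value), ""]
    return "\n".join(lines)


header = ["country", "region", "continent", "population", "gdp", "gdppc"]
row = ["Monaco", "Western Europe", "Europe", 39242, 6468000877, 164823]
print(format_record(header, row))
```

Calling the function once per row and concatenating the results would give a file shaped like the example; the real script may differ in details such as trailing blank lines.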

There are two options when running the script:

You can go fast, which processes every single .xlsx file within a folder.
This opens each file and iterates over each sheet: whatever is in column A functions as the main header 'index' (in the above example the index column was country, with the value Monaco), and all the other columns within the sheet (in the above example, region, continent, etc.) are output as subheaders.
It outputs a .txt file named xlsxname_sheetname.txt, where xlsxname is the name of the workbook and sheetname is the name of the worksheet.
Each worksheet gets its own text file.
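
The fast option could be sketched roughly as follows. This is only an assumption-laden outline: `output_path` and `iter_sheets` are hypothetical names, and reading workbooks with the third-party `openpyxl` package is an assumption, since the README does not say which library the script uses.

```python
from pathlib import Path


def output_path(workbook_path, sheet_name):
    """Build the xlsxname_sheetname.txt path next to the workbook."""
    workbook_path = Path(workbook_path)
    return workbook_path.with_name(f"{workbook_path.stem}_{sheet_name}.txt")


def iter_sheets(folder):
    """Yield (workbook path, sheet name, rows) for every sheet of every .xlsx in folder."""
    # Assumption: openpyxl is installed; imported here so the helper above works without it.
    from openpyxl import load_workbook

    for path in sorted(Path(folder).glob("*.xlsx")):
        workbook = load_workbook(path, read_only=True, data_only=True)
        for sheet in workbook.worksheets:
            # values_only=True returns plain cell values instead of cell objects.
            yield path, sheet.title, list(sheet.iter_rows(values_only=True))
        workbook.close()


print(output_path("example.xlsx", "monacostat"))
```

In this sketch the first row of each sheet would be treated as the header and the first column (column A) as the index.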

The second option is to go slow: the script still iterates over every .xlsx in a folder, but asks the user to select:

  1. The sheets to keep,
  2. The columns within the kept sheets to use, and
  3. The index column (which can be any column, not just column A), which ultimately will be the primary category (in the above example, country)
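
Once the user has made these choices, applying them to a sheet amounts to reordering and filtering columns. Below is a minimal sketch under that assumption; `select_columns` is a hypothetical helper, not a function from the script, and it works on plain lists rather than on a workbook.

```python
def select_columns(header, rows, keep, index_name):
    """Keep only the chosen columns and move the chosen index column to the front."""
    if index_name not in header:
        raise ValueError(f"unknown index column: {index_name!r}")
    # The index column always comes first, followed by the kept columns in the order given.
    order = [index_name] + [name for name in keep if name != index_name]
    positions = [header.index(name) for name in order]
    new_rows = [[row[p] for p in positions] for row in rows]
    return order, new_rows


header = ["country", "region", "continent", "population", "gdp", "gdppc"]
rows = [["Monaco", "Western Europe", "Europe", 39242, 6468000877, 164823]]

# Example choice: keep country and gdp, and use region as the index.
new_header, new_rows = select_columns(header, rows, ["country", "gdp"], "region")
print(new_header)
print(new_rows)
```

With the index in the first position, the rest of the processing can treat the result exactly like a sheet handled by the fast option.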

For some context, one 465 KB .xlsx workbook with a single sheet of 10,000 rows and 6 columns takes approximately 10 seconds to process on a mid-spec laptop and will generate a 2.1 MB text file with 189,982 lines.
This means the output file size will be approx 4.5x greater and the number of lines will be approx 19x the number of rows.
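
The line count can be estimated from the layout in the example: each row produces 4 lines for the index banner (two banner lines, the value and a blank line) plus 3 lines for every other column (sub-header, value, blank line). The sketch below assumes exactly that layout; the helper names are hypothetical.

```python
def lines_per_record(n_columns):
    """Lines written for one row: 4 for the index banner, 3 per remaining column."""
    return 4 + 3 * (n_columns - 1)


def estimated_lines(n_rows, n_columns):
    """Approximate number of lines in the output file for one sheet."""
    return n_rows * lines_per_record(n_columns)


print(lines_per_record(6))        # lines for one 6-column row
print(estimated_lines(10000, 6))  # estimate for the 10,000-row sheet
```

For 6 columns this gives 19 lines per row and an estimate of 190,000 lines for 10,000 rows, which is close to the 189,982 lines reported.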