samukweku/data-wrangling-blog

Access Excel Tables with Python | Samuel Oranyeli

Opened this issue ยท 18 comments

Access Excel Tables with Python | Samuel Oranyeli

Extract data from a defined table in a spreadsheet

https://samukweku.github.io/data-wrangling-blog/spreadsheet/python/pandas/openpyxl/2020/05/19/Access-Tables-In-Excel.html

hello! thank you for posting this code....was the only article i could find that shows how to bring in a table from an excel sheet with multiple tables!

i tried using your code and got an error:

AttributeError Traceback (most recent call last)
in
6 ws = wb['Sheet1']
7
----> 8 {key : value for key, value in ws.tables.items()}
9
10 for table in ws._tables:

AttributeError: 'Worksheet' object has no attribute 'tables'

what am i doing wrong?(file exists and there is 2 tables on it--)

from openpyxl import Workbook
from openpyxl import load_workbook
from openpyxl.worksheet.table import Table, TableStyleInfo

wb = load_workbook(filename=r'C:\Users\Keith\Downloads\20220123_pythondlparameters.xlsx')
ws = wb['Sheet1']

{key : value for key, value in ws.tables.items()}

Hi, @brianblakeley. Apologies for replying this late. I just saw your message. Have you resolved the issue? I have wrapped this into a function, that should make it easier to use. Let me know if you still have this issue. Again my sincere apologies for the late reply.

This is brilliant! Thank you very much!

Hello @samukweku. Im trying to use your xlsx.table function from janitor to accomplish the above, however Im unable to import the xlsx.table function. I have successfully downloaded / installed pyjanitor and import the package into the jupyter notebook, however I can failures when I try to import xlsx.table from janitor. Do you have any advice?

@DanielPerkins472 thanks for getting in touch.

can you share your script?

import janitor as in
jn.xlsx_table

Below is the script that I had used

!pip install pyjanitor
from janitor import xlsx_table

returns error

ImportError Traceback (most recent call last)
in
----> 1 from janitor import xlsx_table

ImportError: cannot import name 'xlsx_table' from 'janitor' (C:\Users\perkind2\Anaconda3\lib\site-packages\janitor_init_.py)

I did more searching and found another thread where it looks like someone else asked a similar question (link below). Your response there was that the version of pyjanitor was outdated and to uninstall and then reinstall the latest version from github. I looked into the io.py file downloaded and it does in fact look like there is no xlsx_table function.

I tried using the pip common to download / install pyjanitor from your below post but was met with the below error

Script
!pip install git+https://github.com/pyjanitor-devs/pyjanitor.git

Returns error

Collecting git+https://github.com/pyjanitor-devs/pyjanitor.git
Cloning https://github.com/pyjanitor-devs/pyjanitor.git to c:\users\perkind2\appdata\local\temp\pip-req-build-ffliyi_i
Running command git clone -q https://github.com/pyjanitor-devs/pyjanitor.git 'C:\Users\perkind2\AppData\Local\Temp\pip-req-build-ffliyi_i'
ERROR: Error [WinError 2] The system cannot find the file specified while executing command git clone -q https://github.com/pyjanitor-devs/pyjanitor.git 'C:\Users\perkind2\AppData\Local\Temp\pip-req-build-ffliyi_i'
ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

https://samukweku.github.io/data-wrangling/python_pandas_access_excel_tables_with_python.html

@DanielPerkins472 Ahhhh ... You are on windows it seems. I think you should install git on your machine then retry the installation. Hopefully soon we will do a release on pyjanitor and you do not have to go through this. Lemme know how it goes

With this method, my excel formulas are being imported instead of the resultant values. Is there a way to fix this?

Hi @YehudaColton can you share a sample of the file, and the code used?

Hi @samukweku, I was able to get your dev version of janitor installed. I am now getting a new error when I try to import the xlsx_table function though. Any thoughts?

from janitor import xlsx_table

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-e5fafa9ca390> in <module>
----> 1 from janitor import xlsx_table

~\Github\pyjanitor\janitor\__init__.py in <module>
      5     pass
      6 
----> 7 from .functions import *  # noqa: F403, F401
      8 from .io import *  # noqa: F403, F401
      9 from .math import *  # noqa: F403, F401

~\Github\pyjanitor\janitor\functions\__init__.py in <module>
     17 
     18 
---> 19 from .add_columns import add_columns
     20 from .also import also
     21 from .bin_numeric import bin_numeric

~\Github\pyjanitor\janitor\functions\add_columns.py in <module>
      1 import pandas_flavor as pf
      2 
----> 3 from janitor.utils import check, deprecated_alias
      4 import pandas as pd
      5 from typing import Union, List, Any, Tuple

~\Github\pyjanitor\janitor\utils.py in <module>
    209         "The keyword argument '{argument}' of '{func_name}' is deprecated."
    210     ),
--> 211     error: bool = True,
    212 ) -> Callable:
    213     """

TypeError: 'type' object is not subscriptable

Hi @DanielPerkins472 let me have a look. This is on a windows machine right? If you have access to a Unix machine, could you try and see if it gives the same error?

Hi @samukweku that's correct this is on a windows machine. I don`t have any Unix experience but will see what I can find out.

@DanielPerkins472 I just tested it on my Unix machine and works fine. unfortunately I do not have access to a windows machine to test it out.

Hi @YehudaColton can you share a sample of the file, and the code used?

wb = load_workbook(file_name, data_only=True)

I modified the code slightly by utilizing this 'data_only' tag and it fixed the issue. Thank you for the response however, as well as the wonderful code.

@DanielPerkins472 if you can, create a virtual environment, install only pyjanitor and see if you still get this error message. Let's isolate it and see if it is an OS issue.

I just tested on a windows VM, in a virtual environment and it works fine.

@samukweku I ended up just using the openpyxl method above in this thread and was successful. I had an outdated version of anaconda / python that was causing issues so I had to completely uninstall / reinstall and then the openpyxl method worked for me.

hepbc commented

@samukweku This is a great resource! One query: these solutions require the tables to be "named". Is there some way to pick up contiguous cells as a table from a sheet where they have not been named as a specific table. Hoping I have explained the issue well!

Hi @hepbc , excel tables require names. If they do not have names, then you can just use pd.read_excel, as they are just cells. If it is a bit more complex, have a look at janitor's xlsx_cells. Do you have sample data of your problem, with expected output? I can have a look