chezou/tabula-py

Add header param for remote requests

dss010101 opened this issue · 2 comments

Is your feature request related to a problem? Please describe.
Tabula supports fetching remote PDFs by URL. However, we often need to specify request headers, such as the user agent, for such requests. There doesn't seem to be a way to do this using read_pdf.

Describe the solution you'd like
Provide a header param that can be used for remote requests.

Describe alternatives you've considered
The alternative for now is to fetch the PDF remotely by other means, such as the Requests library, which supports setting request headers.
Then save it to a local store, and finally read it using tabula.
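
For reference, the workaround described above might look roughly like the sketch below. It uses the standard library's urllib.request rather than Requests so that it has no extra dependencies. The helper names download_pdf and read_remote_pdf are made up for illustration and are not part of tabula-py; the only tabula call used is read_pdf with a local path.

```python
import os
import shutil
import tempfile
import urllib.request


def download_pdf(url, headers, dest_path):
    """Fetch `url` with custom request headers and save the body to `dest_path`.

    Hypothetical helper, not part of tabula-py.
    """
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


def read_remote_pdf(url, headers, **tabula_kwargs):
    """Download the PDF to a temporary file, then pass the local path to tabula.

    Hypothetical helper, not part of tabula-py.
    """
    # Imported lazily so download_pdf can be used without tabula-py installed.
    import tabula

    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = download_pdf(url, headers, os.path.join(tmp_dir, "remote.pdf"))
        # read_pdf parses the file before the temporary directory is removed.
        return tabula.read_pdf(local_path, **tabula_kwargs)
```

With a placeholder URL, this would be called as read_remote_pdf("https://example.com/report.pdf", {"User-Agent": "Mozilla/5.0"}, pages="all"). Any other header can be added to the same dict.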

Additional context

The read_pdf function has a user_agent option.
https://tabula-py.readthedocs.io/en/latest/tabula.html#tabula.io.read_pdf

Is that enough for your use case? Otherwise, please let me know what other types of headers you want.

Thanks

That does indeed seem to do the trick, thanks! Not sure why I didn't see that in the docs before. Perhaps because I was searching for 'header'.
tabula.read_pdf(url, pages='all', user_agent='Mozilla/5.0')