pandas-dev/pandas

BUG: reading CSV from online taking a long time

Closed this issue · 3 comments

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

url = 'https://raw.githubusercontent.com/tdwg/rs.tdwg.org/master/terms/terms.csv'
df = pd.read_csv(url)

I have also tried:

import pandas as pd
import requests
import io

url = 'https://raw.githubusercontent.com/tdwg/rs.tdwg.org/master/terms/terms.csv'
response = requests.get(url)
df = pd.read_csv(io.StringIO(response.content.decode('utf-8')))

In this case, the line response = requests.get(url) is what is taking all the time. I am using requests version 2.28.2.
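One way to confirm where the time goes is to time the download and the parse separately. The following is only a minimal sketch: the helper names timed and fetch_and_parse are hypothetical (not part of pandas or requests), and the 30-second timeout is an arbitrary assumption.

```python
import io
import time


def timed(label, func, *args, **kwargs):
    """Call func(*args, **kwargs), print the elapsed wall-clock time,
    and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


def fetch_and_parse(url, timeout=30):
    """Download a CSV and parse it, timing each step on its own.

    The timeout value is an assumption; without one, requests.get
    can wait indefinitely on a stalled connection.
    """
    # Imported here so that timed() stays usable without these packages.
    import pandas as pd
    import requests

    response, download_s = timed("download", requests.get, url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    df, parse_s = timed("parse", pd.read_csv, io.StringIO(response.text))
    return df, download_s, parse_s
```

Calling fetch_and_parse(url) several times with the URL above should show whether the variation comes from the download step or from pd.read_csv.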

Issue Description

When I run this simple script, the run time can vary dramatically. Sometimes it takes minutes to run, sometimes seconds.

Expected Behavior

This should run in much less than a second.

Installed Versions

pandas 1.5.2

phofl commented

Hi, thanks for your report. If requests takes all the time in the second example, then this looks like a problem on your side, such as your internet connection. Your script runs in under a second on my machine.

Hi @phofl, I don't think this is an internet connection issue, but I have been unable to reproduce it on another person's machine, so perhaps it is something specific to my setup.

Thanks for the report

In this case, the line response = requests.get(url) is what is taking all the time

This should run in much less than a second.

If you think requests.get should take less than a second, please report this to the requests project.

Closing for now then