Error when building index from tesla_2021_10k.htm
shenghu opened this issue · 0 comments
shenghu commented
I'm running this on a Mac and get the following error:
ValueError: 3 columns passed, passed data had 5 columns
The error is thrown by the following function:
def html_to_df(html_str: str) -> pd.DataFrame:
    """Convert HTML to dataframe."""
    from lxml import html

    tree = html.fromstring(html_str)
    table_element = tree.xpath("//table")[0]
    rows = table_element.xpath(".//tr")

    data = []
    for row in rows:
        cols = row.xpath(".//td")
        cols = [c.text.strip() if c.text is not None else "" for c in cols]
        data.append(cols)

    return pd.DataFrame(data[1:], columns=data[0])
Where
- html_str is "☒ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934" (table markup not shown)
- data[0] is "['', '', '']"
- data[1] is "['', '☒', '', 'ANNUAL REPORT PURSUA...CT OF 1934', '']"
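
From the values above, the first row has 3 cells and the second has 5, so `data[0]` cannot serve as the column list for `data[1:]`. One possible workaround, sketched here under my own assumptions, is to pad every row to the widest row before building the dataframe. `pad_rows` is a hypothetical helper, not part of the library, and padding with empty strings is just one choice:

```python
from typing import List, Tuple


def pad_rows(data: List[List[str]], fill: str = "") -> Tuple[List[str], List[List[str]]]:
    """Pad every row to the width of the widest row.

    Hypothetical helper (not part of the library). Returns (header, body),
    where header is the first padded row and body is the remaining rows.
    """
    if not data:
        return [], []
    width = max(len(row) for row in data)
    padded = [row + [fill] * (width - len(row)) for row in data]
    return padded[0], padded[1:]


# The two rows reported in this issue: 3 cells, then 5 cells.
data = [
    ["", "", ""],
    ["", "☒", "", "ANNUAL REPORT PURSUA...CT OF 1934", ""],
]
header, body = pad_rows(data)
print(len(header), [len(row) for row in body])
```

Inside `html_to_df`, the last line could then become `pd.DataFrame(body, columns=header)`. Note that the header would still consist of empty, duplicated column names for this table, so treating the first row as a header may not be appropriate for layout tables like this one.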