run-llama/create_llama_projects

Error when building index from tesla_2021_10k.htm

shenghu opened this issue · 0 comments

I'm running this on a Mac and get the following error:

ValueError: 3 columns passed, passed data had 5 columns

The error is thrown from the following function:

import pandas as pd


def html_to_df(html_str: str) -> pd.DataFrame:
    """Convert HTML to dataframe."""
    from lxml import html

    tree = html.fromstring(html_str)
    table_element = tree.xpath("//table")[0]
    rows = table_element.xpath(".//tr")

    data = []
    for row in rows:
        cols = row.xpath(".//td")
        cols = [c.text.strip() if c.text is not None else "" for c in cols]
        data.append(cols)

    return pd.DataFrame(data[1:], columns=data[0])

Where

  • html_str is "ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934"
  • data[0] is "['', '', '']"
  • data[1] is "['', '☒', '', 'ANNUAL REPORT PURSUA...CT OF 1934', '']"
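
The values above show why the call fails: data[0] has 3 entries, while data[1] has 5, so pd.DataFrame(data[1:], columns=data[0]) is given 3 column names for 5-column rows. One possible workaround, sketched here as an assumption and not as the project's actual fix, is to pad every row to the width of the widest row before building the dataframe. The names pad_rows and html_to_df_padded are hypothetical and do not exist in the repository.

```python
from typing import List


def pad_rows(rows: List[List[str]]) -> List[List[str]]:
    """Right-pad each row with "" so every row is as long as the widest one."""
    width = max((len(r) for r in rows), default=0)
    return [r + [""] * (width - len(r)) for r in rows]


def html_to_df_padded(html_str: str):
    """Hypothetical variant of html_to_df that tolerates ragged rows."""
    # Imported lazily, mirroring the lxml import inside the original function.
    import pandas as pd
    from lxml import html

    tree = html.fromstring(html_str)
    table_element = tree.xpath("//table")[0]

    data = []
    for row in table_element.xpath(".//tr"):
        cells = row.xpath(".//td")
        data.append([(c.text or "").strip() for c in cells])

    # Equalize row lengths so the header row and the body rows always agree.
    data = pad_rows(data)

    # Same convention as the original: the first row supplies the column names.
    return pd.DataFrame(data[1:], columns=data[0])
```

With the rows reported above, pad_rows extends the 3-entry first row to 5 entries, so the header and the body have the same width. Whether padding on the right is the correct alignment for this particular table is not something the traceback can confirm. Cells that use colspan, or nested tables, may need different handling.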