matthewwithanm/python-markdownify

ValueError: invalid literal for int() with base 10: 'undefined' (table colspan)

daviddwlee84 opened this issue · 1 comments

For example, given the following HTML:

The euro-are economy will expand more quickly than previously thought this year as the bloc’s biggest member exits more than a year of near-stagnation, a 
Bloomberg poll of analysts showed.<figure><img alt="Economists Are More Upbeat on Euro-Area Growth This Year |" src="https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iUePgj7QeYTw/v0/-1x-1.png" style="display: block; margin-left: auto; margin-right: auto;" />  </figure><div></div><p>The euro-are economy will expand more quickly than previously thought this year as the bloc’s biggest member exits more than a year of near-stagnation, a Bloomberg poll of analysts showed.</p><p>Output in the 20-nation currency union will rise by 0.7% in 2024 — more than the 0.5% advance that was forecast in the last monthly survey. Gross domestic product in Germany is now seen increasing by 0.2% compared with 0.1% before.</p><figure>  Economists Are More Upbeat on Euro-Area Growth This 
Year   <noscript><img alt="" src="https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iUePgj7QeYTw/v0/pidjEfPlU1QWZop3vfGKsrX.ke8XuWirGYh1PKgEw44kE/-1x-1.png" style="display: block; margin-left: auto; margin-right: auto;" /></noscript>  <figcaption><div class="source">Source: Bloomberg survey of economists conducted May 3-8</div>  <p>Note: Prior forecast conducted April 5-12</p>  </figcaption>  </figure><p>The results, which also include upgrades to the outlooks in France, Italy and Spain, capture the improving mood in the region. First-quarter GDP readings <a href="https://www.bloomberg.com/news/articles/2024-04-30/europe-gdp-latest-france-grows-in-hope-region-out-of-recession" target="_blank">surprised to the upside</a>, inflation is receding toward 2% and the European Central Bank is gearing up to start lowering interest rates.</p><p>Respondents in the survey predict three quarter-point reductions this year in the deposit rate, which currently stands at 4%. That’s about in line with the view of money-market investors.</p><figure>  Forecasts 2024 GDP Also Raised for Major Euro-Area Economies   <noscript><img alt="" src="https://assets.bwbx.io/images/users/iqjWHBFdfxIU/i1g7fOVEtG60/v0/pidjEfPlU1QWZop3vfGKsrX.ke8XuWirGYh1PKgEw44kE/-1x-1.png" style="display: block; margin-left: auto; margin-right: auto;" /></noscript>  <figcaption><div class="source">Source: Source: Bloomberg 
survey of economists conducted May 3-8\n\n</div>  <p>Note: Prior forecast conducted April 5-12</p>  </figcaption>  </figure><p>ECB President Christine Lagarde said last month that the euro zone economy is “recovering and we are clearly seeing signs of <a href="https://www.bloomberg.com/news/articles/2024-04-17/lagarde-says-euro-zone-economy-clearly-showing-signs-of-recovery" target="_blank">recovery</a>.”</p><table><tbody><tr><th colspan="1">Read More on the Euro-Zone Economy: </th></tr><tr><td colspan="undefined"><p><a href="https://www.bloomberg.com/news/articles/2024-05-06/euro-zone-economy-needs-consumers-to-get-out-and-spend" target="_blank">Euro Zone at Turning Point Needs Consumers to Get Out, Spend </a></p><p><a href="https://www.bloomberg.com/news/articles/2024-04-30/europe-gdp-latest-france-grows-in-hope-region-out-of-recession" target="_blank">Euro Zone Speeds Out of Recession But Inflation Is Sticky</a></p><p><a href="https://www.bloomberg.com/news/articles/2024-05-02/euro-zone-pay-growth-stays-firm-in-first-quarter-citigroup-says" target="_blank">Euro-Zone Pay Growth Stays Firm in First Quarter, Citigroup Says</a></p></td></tr></tbody></table><div></div>

Converting it raises ValueError: invalid literal for int() with base 10: 'undefined'.

After some investigation, this appears to be caused by convert_td and convert_th assuming that colspan is always an integer, whereas here the cell is <td colspan="undefined">:

def convert_td(self, el, text, convert_as_inline):
    colspan = 1
    if 'colspan' in el.attrs:
        colspan = int(el['colspan'])
    return ' ' + text.strip().replace("\n", " ") + ' |' * colspan

def convert_th(self, el, text, convert_as_inline):
    colspan = 1
    if 'colspan' in el.attrs:
        colspan = int(el['colspan'])
    return ' ' + text.strip().replace("\n", " ") + ' |' * colspan
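
The failing step can be isolated without the library. The sketch below mirrors only the int() call from the excerpt above; cell_separators is a hypothetical stand-in name, not part of markdownify.

```python
def cell_separators(colspan_attr):
    # Hypothetical stand-in for the tail of convert_td/convert_th:
    # one " |" separator per spanned column.
    return " |" * int(colspan_attr)


# A well-formed attribute value works as expected.
two_columns = cell_separators("2")

# A non-numeric value such as "undefined" makes int() raise ValueError.
try:
    cell_separators("undefined")
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

The printed message matches the error reported in this issue, which suggests the unguarded int() conversion is the only thing that needs to change.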

Quick fix (fall back to a colspan of 1 when the value is not an integer):

from markdownify import MarkdownConverter


class ModifiedMarkdownConverter(MarkdownConverter):
    def convert_td(self, el, text, convert_as_inline):
        colspan = 1
        if "colspan" in el.attrs:
            try:
                colspan = int(el["colspan"])
            except ValueError:
                colspan = 1  # Default to 1 if conversion fails
        return " " + text.strip().replace("\n", " ") + " |" * colspan

    def convert_th(self, el, text, convert_as_inline):
        colspan = 1
        if "colspan" in el.attrs:
            try:
                colspan = int(el["colspan"])
            except ValueError:
                colspan = 1  # Default to 1 if conversion fails
        return " " + text.strip().replace("\n", " ") + " |" * colspan


def markdownify(
    html: str, **options
) -> str:
    return ModifiedMarkdownConverter(**options).convert(html)
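
Since convert_td and convert_th repeat the same try/except, the parsing could be factored into one helper. This is only a sketch: parse_colspan is a hypothetical name, and treating zero or negative values as 1 is my assumption about the desired behavior rather than something the library defines.

```python
def parse_colspan(value, default=1):
    """Return a positive column span, or `default` when `value` is unusable."""
    try:
        span = int(value)
    except (TypeError, ValueError):
        # TypeError covers a missing attribute (None);
        # ValueError covers strings such as "undefined" or "2.5".
        return default
    # Assumption: a span of zero or less is treated as invalid input.
    return span if span > 0 else default


print(parse_colspan("3"))          # well-formed value
print(parse_colspan("undefined"))  # the value from this issue
print(parse_colspan(None))         # attribute absent
```

Inside the two converter methods, the try/except could then be replaced by colspan = parse_colspan(el.get("colspan")), which also handles cells without a colspan attribute.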