Generated table
cqray1990 commented
["", "<td", " colspan="2"", ">", "", "<td", " colspan="2"", ">", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "<td", " rowspan="2"", ">", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "<td", " rowspan="2"", ">", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""]
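The table structure I generated contains no thead or tbody tokens. Are these tokens strictly required? Without them, the following function breaks: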
def get_headbody(html_str):
    """Calculating number of bboxes belonging to "t-head" and "t-body" respectively
    Args:
        html_str(str): html representing table structure
    Returns:
        int: number of bboxes belonging to "t-head"
        int: number of bboxes belonging to "t-body"
    """
    # html_code = ''.join(html_str)
    # html_str = list('''<html><body><table>%s</table></body></html>''' % html_code)
    s_h, e_h = html_str.index('<thead>'), html_str.index('</thead>')
    s_b, e_b = html_str.index('<tbody>'), html_str.index('</tbody>')
    num_h = html_str[s_h + 1:e_h].count('</td>')
    num_b = html_str[s_b + 1:e_b].count('</td>')
    return num_h, num_b
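This function fails on my token list: `index()` raises a ValueError when the list contains no '<thead>' or '<tbody>' token.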
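
A possible workaround, sketched here only as an assumption about what a sensible fallback would be: treat a missing thead section as "zero head cells", and when tbody is missing, count every cell after the head as a body cell. The name `get_headbody_safe` is hypothetical and not part of the original code, and whether the rest of the pipeline accepts a head count of 0 is not confirmed.

```python
def get_headbody_safe(html_tokens):
    """Count '</td>' tokens in the head and body of a tokenized table.

    Hypothetical variant of get_headbody (assumed fallback behaviour):
    - no '<thead>' ... '</thead>' pair: head count is 0
    - no '<tbody>' ... '</tbody>' pair: all tokens after the head are body
    """
    tokens = list(html_tokens)
    num_h, body_start = 0, 0

    # Head section is optional; only slice it when both tokens exist.
    if '<thead>' in tokens and '</thead>' in tokens:
        s_h, e_h = tokens.index('<thead>'), tokens.index('</thead>')
        num_h = tokens[s_h + 1:e_h].count('</td>')
        body_start = e_h + 1

    # Body section is optional too; fall back to everything after the head.
    if '<tbody>' in tokens and '</tbody>' in tokens:
        s_b, e_b = tokens.index('<tbody>'), tokens.index('</tbody>')
        body = tokens[s_b + 1:e_b]
    else:
        body = tokens[body_start:]

    return num_h, body.count('</td>')


# Token list without thead/tbody, similar in shape to the one above.
flat = ['<tr>', '<td>', '</td>', '<td', ' colspan="2"', '>', '</td>', '</tr>']
print(get_headbody_safe(flat))  # (0, 2)

# Token list with both sections keeps the original behaviour.
full = ['<thead>', '<tr>', '<td>', '</td>', '</tr>', '</thead>',
        '<tbody>', '<tr>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</tbody>']
print(get_headbody_safe(full))  # (1, 2)
```

The alternative would be to wrap the generated tokens in '<thead>'/'<tbody>' before calling the original function, but that requires knowing which rows are header rows, which my data does not mark.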