frictionlessdata/tabulator-py

Wrong CSV separators when starting from an HTML table that has commas inside cells

aborruso opened this issue · 5 comments

Hi,
if I run tabulator input.html on the HTML table below, I get

RNDFNC60E16,RIPACANDIDA,85020,POTENZA,250,00
RNDFNC60E16,,,POTENZA,250,00

and not

RNDFNC60E16,RIPACANDIDA,85020,POTENZA,"250,00"
RNDFNC60E16,,,POTENZA,"250,00"

Thank you

<!DOCTYPE html>
<html>
<body>
<table id="results" border="0" class="regpub_dati c35">
		<tbody>
			<tr class="c28">
				<th class="c27">Beneficiario</th>
				<th class="c27">Comune</th>
				<th class="c27">CAP</th>
				<th class="c27">Provincia </th>
				<th class="c27">Importo</th>
			</tr>
			
			<tr>
				<td class="c31">RNDFNC60E16</td>
				<td class="c31">RIPACANDIDA</td>
				<td class="c31">85020</td>
				<td class="c31">POTENZA</td>
				<td class="c34">250,00</td>
			</tr>
			
			<tr>
				<td class="c31">RNDFNC60E16</td>
				<td class="c31"></td>
				<td class="c31"></td>
				<td class="c31">POTENZA</td>
				<td class="c34">250,00</td>
			</tr>
		</tbody>
		</table>
		</body>
</html>

roll commented

Hi @aborruso,

That's only because the CLI just prints the rows to the console, so the values are not quoted as they would be in a CSV file.

from tabulator import Stream

with Stream('tmp/issue324.html') as stream:
    stream.save('tmp/issue324.csv')

This will give you a proper CSV file:

RNDFNC60E16,RIPACANDIDA,85020,POTENZA,"250,00"
RNDFNC60E16,,,POTENZA,"250,00"
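
The difference between the two outputs comes down to quoting. The following is a minimal sketch using only the Python standard library; it is an illustration, not tabulator's actual code, and the helper names `naive_line` and `csv_line` are made up for this example. Joining cells with a comma loses the cell boundary when a cell itself contains a comma, while `csv.writer` quotes such cells.

```python
import csv
import io


def naive_line(row):
    # Plain comma join: a cell such as "250,00" becomes
    # indistinguishable from two separate cells.
    return ",".join(row)


def csv_line(row):
    # csv.writer quotes only the cells that need it (default QUOTE_MINIMAL),
    # so "250,00" is written as "250,00" with surrounding double quotes.
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue().rstrip("\n")


row = ["RNDFNC60E16", "RIPACANDIDA", "85020", "POTENZA", "250,00"]
print(naive_line(row))
print(csv_line(row))
```

The first line printed matches the console output reported above, and the second matches the saved CSV.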

aborruso commented

Hi @roll, how can I export to CSV using the CLI?

Thank you

roll commented

It's not supported yet.

Would you like to create a feature request?
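
Until the CLI supports this, one possible workaround is a small script that writes properly quoted CSV to standard output. This is only a sketch: `write_rows` and `html_to_stdout` are hypothetical helper names, not part of tabulator, and it assumes that `Stream.iter()` yields each row as a list of cell values.

```python
import csv
import sys


def write_rows(rows, out):
    # Write every row as a quoted CSV line and return the number of rows written.
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def html_to_stdout(path):
    # Imported here so that write_rows stays usable without tabulator installed.
    from tabulator import Stream

    # Assumption: iterating the stream yields plain lists of cell values.
    with Stream(path) as stream:
        return write_rows(stream.iter(), sys.stdout)
```

Calling `html_to_stdout('input.html')` from a script and redirecting its output to a file should then give the same result as `stream.save`.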

aborruso commented

Hi @roll, I have done so.

What is the current console output format?

Thank you

roll commented

It's kind of mixed: it uses bold for headers and simple comma-delimited output for rows.