Scraping: Remove most HTML tags from description
muety opened this issue · 0 comments
muety commented
Create a whitelist of HTML tags to keep when crawling event descriptions, and discard all other tags.
Keep: <b>, <i>, <strong>, <p>, <br>, <ul>, <ol>, <li>, <h{1..6}>, <table>, <tr>, <td>, <th>, <tbody>, <thead>, <section>, <a>
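
A minimal sketch of how such a whitelist filter could look, using only Python's standard-library `html.parser`. The names `ALLOWED_TAGS`, `WhitelistSanitizer` and `sanitize_description` are hypothetical and not taken from the project. The issue does not say what should happen to attributes or to the text inside discarded tags, so the sketch makes these assumptions: text inside a discarded tag is kept, the contents of `<script>` and `<style>` are dropped entirely, and all attributes are removed except `href` on `<a>`.

```python
from html import escape
from html.parser import HTMLParser

# Tags from the issue's "Keep" list; <h{1..6}> is expanded to h1..h6.
ALLOWED_TAGS = {
    "b", "i", "strong", "p", "br", "ul", "ol", "li",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "table", "tr", "td", "th", "tbody", "thead", "section", "a",
}
# Assumption (not in the issue): drop these elements including their contents.
SKIP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    """Re-emits only whitelisted tags; text of other tags is kept."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        attr_text = ""
        if tag == "a":
            # Assumption: keep only href, discard every other attribute.
            for name, value in attrs:
                if name == "href" and value:
                    attr_text = f' href="{escape(value, quote=True)}"'
                    break
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        # <br> is a void element, so no closing tag is emitted for it.
        if self.skip_depth or tag not in ALLOWED_TAGS or tag == "br":
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references were decoded by the parser; re-escape them.
            self.out.append(escape(data, quote=False))


def sanitize_description(html_text):
    """Return html_text with every non-whitelisted tag removed."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize_description('<div><p>Hello <span>world</span></p></div>')` returns `<p>Hello world</p>`. The sketch does not validate `href` values (for instance `javascript:` URLs) and does not repair unbalanced markup, so a real crawler would likely need additional checks or a dedicated sanitizing library.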