muety/kitsquid

Scraping: Remove most HTML tags from description

muety opened this issue · 0 comments

muety commented

Create a whitelist of HTML tags to keep when crawling event descriptions, and discard all other tags.
Keep: <b>, <i>, <strong>, <p>, <br>, <ul>, <ol>, <li>, <h1> to <h6>, <table>, <tr>, <td>, <th>, <tbody>, <thead>, <section>, <a>
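
A minimal sketch of how such a whitelist filter could look, written in Python with only the standard library's `html.parser`. It is an illustration, not the project's implementation. The names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `DROP_CONTENT`, `WhitelistSanitizer` and `sanitize_description` are hypothetical. The issue does not say what should happen to attributes or to the text inside discarded tags, so the sketch assumes: text inside discarded tags is kept, only `href` survives on `<a>`, and the contents of `<script>` and `<style>` are dropped entirely.

```python
from html import escape
from html.parser import HTMLParser

# Tags from the "Keep" list above; the heading entry is expanded to h1..h6.
ALLOWED_TAGS = {
    "b", "i", "strong", "p", "br", "ul", "ol", "li",
    "table", "tr", "td", "th", "tbody", "thead", "section", "a",
} | {f"h{n}" for n in range(1, 7)}

# Assumption (not in the issue): keep only href on <a>, drop all other attributes.
ALLOWED_ATTRS = {"a": {"href"}}

# Assumption (not in the issue): drop these elements together with their content.
DROP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    """Re-emits whitelisted tags and text; silently skips every other tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # discard the tag itself, keep whatever text follows
        allowed = ALLOWED_ATTRS.get(tag, set())
        attr_text = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in allowed and value is not None
        )
        self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag != "br":  # <br> has no closing tag
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Entities were decoded by the parser, so escape the text again.
            self.parts.append(escape(data, quote=False))


def sanitize_description(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

For example, `sanitize_description('<div><p>Hello <span>world</span></p></div>')` returns `<p>Hello world</p>`. The sketch does not validate `href` values (a `javascript:` URL would pass through) and does not repair unbalanced markup, so a real crawler would probably need a URL scheme check or a dedicated sanitizer library on top.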