decrypto-org/spider

Remove subdomain from baseUrls and add as separate table

Closed this issue · 1 comment

The current solution has a problem: we store the subdomain as a denormalized column directly in the baseUrls table, which means we now have multiple entries per baseUrl.
This will be solved by moving the subdomains into a simple separate table, which can then be joined with the baseUrls table.
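
A rough sketch of what the separate table could look like, using an in-memory SQLite database in Python purely for illustration. The column names (`baseUrlId`, `subdomainId`, `subdomain`), the `subdomains` table name, and the `example.onion` host are assumptions and are not taken from the actual schema.

```python
import sqlite3

# In-memory database, only for illustration; the real project may use another DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE baseUrls (
        baseUrlId INTEGER PRIMARY KEY,
        baseUrl   TEXT NOT NULL UNIQUE      -- exactly one row per base URL
    );
    -- Hypothetical separate table: one row per (base URL, subdomain) pair.
    CREATE TABLE subdomains (
        subdomainId INTEGER PRIMARY KEY,
        baseUrlId   INTEGER NOT NULL REFERENCES baseUrls(baseUrlId),
        subdomain   TEXT NOT NULL,
        UNIQUE (baseUrlId, subdomain)
    );
""")

conn.execute("INSERT INTO baseUrls (baseUrlId, baseUrl) VALUES (1, 'example.onion')")
conn.executemany(
    "INSERT INTO subdomains (baseUrlId, subdomain) VALUES (?, ?)",
    [(1, "www"), (1, "blog")],
)


def hosts(conn):
    """Join subdomains with baseUrls to rebuild the full host names."""
    rows = conn.execute("""
        SELECT s.subdomain || '.' || b.baseUrl
        FROM subdomains AS s
        JOIN baseUrls AS b ON b.baseUrlId = s.baseUrlId
        ORDER BY s.subdomainId
    """).fetchall()
    return [row[0] for row in rows]
```

With this layout the baseUrls table keeps a single row per base URL, at the cost of one extra join whenever a host name has to be rebuilt.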

As discussed today, we should move the subdomain to the paths table instead: storing the denormalized data there does no harm, and we do not need another join to compose the URLs to download.
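
For reference, a minimal sketch of the paths-table variant, again with in-memory SQLite in Python. All column names, the `http://` scheme, the empty string for "no subdomain", and the `example.onion` host are assumptions made for the example, not the real schema.

```python
import sqlite3

# In-memory database, only for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE baseUrls (
        baseUrlId INTEGER PRIMARY KEY,
        baseUrl   TEXT NOT NULL UNIQUE      -- stays at one row per base URL
    );
    CREATE TABLE paths (
        pathId    INTEGER PRIMARY KEY,
        baseUrlId INTEGER NOT NULL REFERENCES baseUrls(baseUrlId),
        subdomain TEXT NOT NULL DEFAULT '', -- denormalized; '' means no subdomain
        path      TEXT NOT NULL
    );
""")

db.execute("INSERT INTO baseUrls (baseUrlId, baseUrl) VALUES (1, 'example.onion')")
db.executemany(
    "INSERT INTO paths (baseUrlId, subdomain, path) VALUES (?, ?, ?)",
    [(1, "www", "/index.html"), (1, "", "/about"), (1, "www", "/news")],
)


def download_urls(db):
    """Compose the URLs to download with the single paths/baseUrls join."""
    rows = db.execute("""
        SELECT p.subdomain, b.baseUrl, p.path
        FROM paths AS p
        JOIN baseUrls AS b ON b.baseUrlId = p.baseUrlId
        ORDER BY p.pathId
    """)
    urls = []
    for subdomain, base_url, path in rows:
        host = f"{subdomain}.{base_url}" if subdomain else base_url
        urls.append(f"http://{host}{path}")  # scheme is an assumption
    return urls
```

The subdomain string is repeated on every path row, but the join that is already needed to fetch paths is enough to build the full URL.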