Pinned Repositories
coronavirus
DATASET_C
This dataset is a real-world web page collection comprising 18 websites with entity-pages about council members. We analyzed the official websites of the city councils of the 26 Brazilian state capitals. We excluded four websites because they did not have an entity-page for each council member, three because all their entity-pages were internal frames of a single page, and one because it did not allow its pages to be crawled.
doutorado04
Domain-centric data extraction
Fazenda
Fazenda
fazenda2.0
Python implementation of Fazenda
ifrs
intrasite
Output of intrasite
Orion
secondstring
A bunch of fancy soft string matching routines, with some accompanying datasets
ssup
SSUP is a method for entity-page discovery. Given one entity-page of a website, SSUP finds the set of entity-pages on that website. For example, given the page describing Fernando Alonso on the official Formula 1 website, SSUP finds the pages that describe the drivers on that website.
edimarmanica's Repositories
edimarmanica/intrasite
Output of intrasite
edimarmanica/coronavirus
edimarmanica/DATASET_C
This dataset is a real-world web page collection comprising 18 websites with entity-pages about council members. We analyzed the official websites of the city councils of the 26 Brazilian state capitals. We excluded four websites because they did not have an entity-page for each council member, three because all their entity-pages were internal frames of a single page, and one because it did not allow its pages to be crawled.
edimarmanica/doutorado04
Domain-centric data extraction
edimarmanica/Fazenda
Fazenda
edimarmanica/fazenda2.0
Python implementation of Fazenda
edimarmanica/ifrs
edimarmanica/Orion
edimarmanica/secondstring
A bunch of fancy soft string matching routines, with some accompanying datasets
edimarmanica/ssup
SSUP is a method for entity-page discovery. Given one entity-page of a website, SSUP finds the set of entity-pages on that website. For example, given the page describing Fernando Alonso on the official Formula 1 website, SSUP finds the pages that describe the drivers on that website.
edimarmanica/Testando
Testing
edimarmanica/trabalhoBD
edimarmanica/WebExtractionDatasets
Contains the corrections made to the ground truth and the assignment of identifiers to the entities described in the pages.
edimarmanica/WebExtractionImpl
Implementations related to Web Data Extraction