
Pay-As-You-Go Quality Evaluation of the DBpedia Resources



Extracting triples from unstructured sources causes several quality problems, and these problems lead to wrong results in information retrieval systems. Quality, however, has both subjective and objective aspects. While some quality dimensions can be assessed with generic tools (objective), others require crowd-sourced evaluation of the resource (subjective). This project aims at pay-as-you-go quality evaluation of DBpedia resources in an information retrieval setting, taking both subjective and objective quality dimensions into account. The user submits a query together with a quality threshold for a specific dimension (trust, freshness, etc.). The results for the query are shown (black box) and the user's feedback on them is collected. Based on this feedback, the quality graph of the resource is updated with respect to the given quality dimension.


The goals of the candidate are as follows:

  • a web interface that lets the user submit a query with her quality preferences and get her results.
  • an algorithm that computes the quality of a resource incrementally from user feedback.
  • a system that maps the quality results of the resources to the quality graph of each source and updates the quality graph with the given feedback.
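The incremental algorithm in the second goal could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's design: the class name QualityTracker, the binary (positive/negative) feedback model, and the Laplace-smoothed score are all hypothetical choices made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a per-resource, per-dimension quality score updated from user feedback. */
class QualityTracker {
    /** Running counts of positive and total feedback for one (resource, dimension) pair. */
    private static final class Counts {
        int positive;
        int total;
    }

    private final Map<String, Counts> counts = new HashMap<>();

    // Assumption: '|' does not occur in resource IRIs or dimension names.
    private static String key(String resource, String dimension) {
        return resource + "|" + dimension;
    }

    /** Records one feedback item in constant time; earlier feedback is never reprocessed. */
    void addFeedback(String resource, String dimension, boolean positive) {
        Counts c = counts.computeIfAbsent(key(resource, dimension), k -> new Counts());
        c.total++;
        if (positive) {
            c.positive++;
        }
    }

    /** Laplace-smoothed share of positive feedback; 0.5 while no feedback exists. */
    double score(String resource, String dimension) {
        Counts c = counts.get(key(resource, dimension));
        if (c == null) {
            return 0.5;
        }
        return (c.positive + 1.0) / (c.total + 2.0);
    }

    /** True if the current score reaches the threshold the user asked for. */
    boolean meetsThreshold(String resource, String dimension, double threshold) {
        return score(resource, dimension) >= threshold;
    }

    public static void main(String[] args) {
        QualityTracker tracker = new QualityTracker();
        String berlin = "http://dbpedia.org/resource/Berlin";
        tracker.addFeedback(berlin, "trust", true);
        tracker.addFeedback(berlin, "trust", true);
        tracker.addFeedback(berlin, "trust", false);
        // Two positive items out of three: (2 + 1) / (3 + 2)
        System.out.println(tracker.score(berlin, "trust"));
        System.out.println(tracker.meetsThreshold(berlin, "trust", 0.5));
    }
}
```

Keeping only two counters per (resource, dimension) pair is what makes the update pay-as-you-go: each feedback item costs one map lookup. A real system would probably also weight feedback by user reliability or age, which this sketch leaves out.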


The project will provide a system that computes data quality in a pay-as-you-go manner and provides structured data graphs for the resources.
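To make the idea of a structured quality graph more concrete, here is a minimal sketch that writes one quality measurement as N-Triples lines. The issue does not say which vocabulary the quality graph should use; the W3C Data Quality Vocabulary (DQV) is assumed here purely for illustration, and the class name QualityGraphWriter as well as the example.org measurement and metric IRIs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: serialize one quality measurement as N-Triples using DQV terms. */
class QualityGraphWriter {
    static final String DQV = "http://www.w3.org/ns/dqv#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double";

    /**
     * Builds the four triples that describe a single measurement.
     * Assumption: all IRIs are already valid and the value is finite.
     */
    static List<String> measurementTriples(String measurementIri, String resourceIri,
                                           String metricIri, double value) {
        String m = "<" + measurementIri + ">";
        List<String> triples = new ArrayList<>();
        triples.add(m + " <" + RDF_TYPE + "> <" + DQV + "QualityMeasurement> .");
        triples.add(m + " <" + DQV + "computedOn> <" + resourceIri + "> .");
        triples.add(m + " <" + DQV + "isMeasurementOf> <" + metricIri + "> .");
        triples.add(m + " <" + DQV + "value> \"" + value + "\"^^<" + XSD_DOUBLE + "> .");
        return triples;
    }

    public static void main(String[] args) {
        // The measurement and metric IRIs below are placeholders, not real identifiers.
        List<String> triples = measurementTriples(
                "http://example.org/measurement/1",
                "http://dbpedia.org/resource/Berlin",
                "http://example.org/metric/trust",
                0.6);
        triples.forEach(System.out::println);
    }
}
```

When new feedback changes a score, the system would replace the old dqv:value triple for that measurement (or add a new, dated measurement). In practice an RDF library would handle escaping and storage; plain string building is used here only to keep the sketch self-contained.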

Warm up tasks


TBD (possible names: Beyza Yaman).


Quality, Feedback, Crowd-source, Information retrieval

I've started reading Quality Assessment Methodologies for Linked Open Data and working through the code base. I've always wanted to work on a project with real-world applications, and I would like to proceed with this one. Can I ask my questions and doubts here as I progress?

Yes, please do as long as you have questions!

Hello, I am a postgraduate student from India and this is my first time at GSOC. I am a qualified web developer and contribute to PHP and Java based projects, and I would love to be part of your organization for GSOC19. I can see you have listed some warm-up tasks, and I would like you to know that I am working on them.

Hi @mrinal1209 and @maykillmore. I suggest you contact me whenever you have questions so that you can make progress faster. If you try some of the warm-up tasks, please share what you find with us (for example, errors you discovered); it would be good to see it.

@beyzayaman what sort of technology can we use for this project?

@mrinal1209 It is up to you, but TripleCheckMate is written in Java, so it may be useful to proceed with that. What was your idea?

@beyzayaman Luckily, I was thinking of Java too. I am now reading the papers provided above.

Do you have any background in OWL and RDF technologies? Also, please take into account that you need to write a proposal by the 9th of April, so it would be a good idea to keep the reading period as short as possible and start writing down some ideas so that we can improve them together.

@beyzayaman I have found a small bug on the DBpedia website itself and have attached a POC link for it ( https://drive.google.com/open?id=1eQJazwOWgdtWQ7RUmmmV4Xx1tO_GdHBX ). Also, can you tell me whether there is a bug portal (like JIRA) where we can start contributing?

@beyzayaman I don't have any background in OWL and RDF technologies, but I assure you that I have no other commitments this summer and can learn as quickly as possible while working.

@beyzayaman @maykillmore I am using this blog to understand OWL and RDF:


@beyzayaman Hi, I am thinking of starting on the proposal for this project. So far I have read the paper Quality Assessment Methodologies for Linked Open Data, gone through the code base of the TripleCheckMate tool, and done some hands-on tutorials on RDF with Java. Can you suggest anything else I can do to make a good GSOC proposal?

Hi! @mommi84 shared a successful project proposal, you can check that:
In the meantime, please share your ideas. I guess the quality issues should be clear by now, so you can also share what you have found in the data. You can create documents and share them with me via the email address given on the Mentors page so that I can review them.