Improve Data Retriever efficiency for out-of-memory scale datasets
Harshish opened this issue · 2 comments
Greetings everyone,
I am Harsis Yadav, a pre-final year student at Manipal Institute of Technology, Manipal. I wish to work on the project mentioned in the title. I have knowledge of SQL, as it was part of my curriculum. Recently I completed a project in data mining (my program elective) in Python, implementing the Affinity Propagation algorithm. My knowledge of Python includes matplotlib, numpy, pandas, mechanize, and BeautifulSoup.
I would like to participate in GSoC 2017 under your project; please guide me through it.
Thank You,
Harsis Yadav
@ethanwhite @henrykironde #DataRetriever
Hi @Harshish - thanks for your interest. This project would involve fixing known issues that make the code slower/more memory intensive than it could be and using profilers to identify other areas of the code that can be improved. We have several open issues that lay out some of the tasks that we know would be helpful:
weecology/retriever#715
weecology/retriever#168
weecology/retriever#95
weecology/retriever#440
The first step, though, is to have a look at the issues page of the Data Retriever and make one or more contributions. This is a required step, and the results will be part of your application. We have a number of issues tagged as getting-started that are a good place to start engaging with the codebase.
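As a rough illustration of the kind of change these efficiency issues aim for (this is a hedged sketch, not code from the Retriever itself), one common fix for out-of-memory scale datasets is to process a file in fixed-size batches instead of reading it all at once. The `insert_rows` callback and `chunk_size` parameter below are hypothetical names for illustration:

```python
import csv
import io

def insert_in_chunks(reader, insert_rows, chunk_size=1000):
    """Flush rows in fixed-size batches so memory use stays bounded
    no matter how large the input file is.

    `insert_rows` is a hypothetical callback that would perform a
    bulk insert into the target database."""
    chunk = []
    total = 0
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            insert_rows(chunk)
            total += len(chunk)
            chunk = []
    if chunk:  # flush the final partial batch
        insert_rows(chunk)
        total += len(chunk)
    return total

# Tiny usage example with an in-memory "file" of 10 data rows
data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * i}" for i in range(10)))
reader = csv.reader(data)
next(reader)  # skip the header row
batches = []
n = insert_in_chunks(reader, batches.append, chunk_size=4)
# 10 rows arrive as batches of 4, 4, and 2
```

Profiling (for example with Python's built-in `cProfile`) is then used to confirm where the remaining time and memory hotspots are before optimizing further.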
Hi @Harshish, This is a reminder that the student application period is closing on April 3 16:00 UTC.
If you need help with your application, do not hesitate to let us know. We will need to review your application, so the earlier you work on it, the better. Thanks.