numfocus/gsoc

Participation in GSOC 2017

kvnamipara opened this issue · 10 comments

Hello!!

I am Kevin Amipara, a sophomore studying at NIT Surat, India. My research interests include machine learning and artificial intelligence. I am proficient in Python, R, Octave/MATLAB, and SQL, and at an intermediate level with TensorFlow. My projects include building machine learning models and recognizing digits in images using neural networks. I also build things using web stacks.

I would love to contribute to the Data Retriever on the project "Improve Data Retriever efficiency for out-of-memory scale datasets". I find this project very interesting. Please guide me on how to proceed.

Thank you,
Kevin.

@ethanwhite @henrykironde #Data_Retriever

@kvnamipara, thank you for your interest in contributing to the Data Retriever for GSoC 2017.
As part of the application process, we ask students to go through the Data Retriever issues and make some contributions.
Feel free to ask for clarification on the issues or to suggest a new issue.

Additionally, we recommend that students read the [student contribution page for GSoC](https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-students.md) carefully.

Hi @kvnamipara - thanks for your interest in helping us improve our performance on large datasets. In general, this project would involve both fixing known issues that make the code slower than it could be and using profilers to identify other areas of the code that can be improved. For the known issues, we have several open issues that lay out tasks we know would help with both compute and memory efficiency and could serve as components of this project. These include:

@ethanwhite @henrykironde
Thanks for suggesting issues. I would love to work on them. Can you point me to some sources where I can read about improving efficiency for out-of-memory scale datasets?

In terms of general reading, you'd want to look into profilers for both compute and memory (good starting points for both are provided in the links). Beyond that, the core material is understanding how databases work with large data and how to interact with them in Python, in particular parameter binding in Python-SQL libraries, and indexes and bulk inserts for MySQL, PostgreSQL, and SQLite.
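As a rough illustration of the database side of this, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. It is not code from the Data Retriever itself: the table name `observations`, the helpers `generate_rows` and `bulk_load`, and the chunk size are all made up for the example. It shows parameter binding (`?` placeholders), bulk inserts in fixed-size chunks so the whole dataset never has to be in memory, and creating an index after the load.

```python
import sqlite3
from itertools import islice


def generate_rows(n):
    """Yield (id, species, n) tuples lazily; stands in for reading a large file."""
    for i in range(n):
        yield (i, f"species_{i % 10}", i % 7)


def bulk_load(conn, rows, chunk_size=1000):
    """Insert rows chunk by chunk with bound parameters; return the number inserted."""
    cur = conn.cursor()
    rows = iter(rows)
    total = 0
    while True:
        # Pull at most chunk_size rows so memory use stays bounded.
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break
        # The ? placeholders are bound by the driver; no string formatting of values.
        cur.executemany(
            "INSERT INTO observations (id, species, n) VALUES (?, ?, ?)", chunk
        )
        total += len(chunk)
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (id INTEGER PRIMARY KEY, species TEXT, n INTEGER)"
)
inserted = bulk_load(conn, generate_rows(5000))

# Build the index after the bulk load rather than maintaining it on every insert.
conn.execute("CREATE INDEX idx_species ON observations (species)")

# Queries use parameter binding as well.
count = conn.execute(
    "SELECT COUNT(*) FROM observations WHERE species = ?", ("species_3",)
).fetchone()[0]
print(inserted, count)
```

The same ideas carry over to MySQL and PostgreSQL, although the placeholder style and the fastest bulk-loading mechanism differ between drivers, so treat this only as a starting point for experiments with a profiler.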

Hi @kvnamipara, this is a reminder that the student application period closes on April 3 at 16:00 UTC.
If you need help with your application, do not hesitate to let us know. We will need to review the application, so the earlier you work on it, the better. Thanks.

Hello @henrykironde, thanks for the reminder. I am working on the proposal and will send it by tonight. Thank you.

@henrykironde
Hello,
Should I submit the draft through the GSoC portal, or do I have to submit it here for review? Also, there is no mention of Data Retriever among the tags that we have to include in the proposal details on the NumFOCUS organisation's proposal submission page in the GSoC portal.
[screenshot, 2017-03-28, 1:31:22 PM]

Only the following tags are listed:
[screenshot, 2017-03-28, 1:31:02 PM]

Thanks.

Please submit to the GSoC website. Only applications submitted there count with Google. Start your application with the name of your organization so that we can still sort it.

@kvnamipara you can submit the draft on the website and still edit it.
Let me know how it goes.
Thanks @kain88-de

@henrykironde I have submitted it through the GSoC portal. Thank you.