chaoss/augur

Google Summer of Code & Outreachy Project Questions

Closed this issue · 43 comments

Google Summer of Code and Outreachy: Augur, 2020

Idea: Machine Learning for Anomaly Detection in Open Source Communities

Micro-tasks and place for questions

Augur is an open source platform that systematically integrates data from several open source repositories, issue trackers, mailing lists, and other communication systems that open source projects rely on to create a highly structured (relational and graph databases), consistent, and validated collection of open source health and sustainability data. Hundreds of highly specialized data requests are implemented in Augur's API, data and visualizations are pushed to Augur users, and the results of one user's request benefit the whole community.

The volume of activity across all dimensions of open source makes the identification of significant changes both labor intensive and impractical. By connecting Augur's "insight worker" to its "push notification" architecture, and related pages that allow exploration of identified anomalies, open source companies, community managers, and contributors will be in a better position to identify community or technology issues quickly.

The aims of the project are as follows:

  • Understand the core augur engine, database, dashboard, and push notifier.

  • Understand the types of anomalies that are both detectable from trace data, and provide useful signals.

  • Design an approach that enables user friendly, easy tuning of notification volume, urgency, and utility that is personalized for each user.

  • Implement the software with data from the approximately 100,000 open source software repositories currently analyzed using Augur.

  • Difficulty: Medium

  • Requirements: Python programming. Interest in machine learning. Willingness to understand Augur's internals.

  • Recommended: Experience with Flask, scikit-learn, and PyTorch is 'nice to have', but could also be learned in the execution of the project.

  • Mentors: Sean Goggins, Matt Germonprez
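
To make the idea of "detectable anomalies" with tunable notification volume more concrete, here is a minimal sketch of one possible baseline: a trailing-window z-score over a weekly activity count. This is not how Augur's insight worker is implemented; the function name `flag_anomalies`, the `window` and `threshold` parameters, and the sample data are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=8, threshold=3.0):
    """Return the indices of values that deviate from the mean of the
    preceding `window` values by more than `threshold` standard deviations.

    Hypothetical helper for illustration; not part of Augur.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if counts[i] != mu:
                flagged.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up weekly commit counts with one obvious spike at index 8.
weekly_commits = [20, 22, 19, 21, 23, 20, 22, 21, 95, 20, 21]
print(flag_anomalies(weekly_commits))
```

In a sketch like this, `threshold` is the knob a user could turn to trade notification volume against sensitivity: a higher value flags fewer, more extreme changes. A real solution would likely need something more robust than a plain z-score, since a single spike inflates the standard deviation of the following windows.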

Idea: Implementation of GitLab Data Collection Workers

Micro-tasks and place for questions

Augur is an open source platform that systematically integrates data from several open source repositories, issue trackers, mailing lists, and other communication systems that open source projects rely on to create a highly structured (relational and graph databases), consistent, and validated collection of open source health and sustainability data. Hundreds of highly specialized data requests are implemented in Augur's API, data and visualizations are pushed to Augur users, and the results of one user's request benefit the whole community.

One of Augur's greatest strengths is its highly structured and unified ecosystem data model. This data drives all of the metrics and visualizations that are provided, and is of vital importance to the people maintaining open source projects. Of course, that data has to be gathered somehow, which is where the data collection workers come in. Each worker is responsible for gathering, transforming, and storing data related to a particular project from a particular data source. Building a GitLab data collection worker will enable Augur to collect data about commits, issues, contributors, and PRs from a large number of open source projects that live on GitLab.

The aims of the project are as follows:

  • Understand the core augur engine, database, and data collection process.
  • Understand GitLab's internal data model in order to extract the necessary data.
  • Understand best practices for collecting data reliably at scale.
  • Implement the software with data from the approximately 100,000 open source software repositories currently analyzed using Augur.

Difficulty: Medium

  • Requirements: Some Python programming experience, an interest in data science, willingness to understand Augur's internals
  • Recommended: Experience with Flask, requests, and PostgreSQL is 'nice to have', but could also be learned in the execution of the project
  • Mentors: Sean Goggins, Matt Germonprez
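
As a rough illustration of the "gather, transform, store" responsibilities described above, the sketch below pages through a project's issues using GitLab's REST API (v4) and maps each issue to a flat row. It is only a sketch under stated assumptions: `fetch_page`, `collect`, and `to_row` are hypothetical names, the row keys are not Augur's actual schema, and authentication, rate limiting, and error handling are left out. It does not reflect how Augur's existing workers are written.

```python
import json
import urllib.parse
import urllib.request

GITLAB_API = "https://gitlab.com/api/v4"


def fetch_page(project, resource, page, per_page=100):
    """Fetch one page of a project resource (e.g. "issues").

    Returns (items, next_page); next_page is an empty string on the last
    page, mirroring GitLab's X-Next-Page pagination header.
    """
    # A project can be addressed by numeric ID or URL-encoded path.
    project_id = urllib.parse.quote(str(project), safe="")
    url = (f"{GITLAB_API}/projects/{project_id}/{resource}"
           f"?page={page}&per_page={per_page}")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp), resp.headers.get("X-Next-Page", "")


def collect(project, resource, fetch=fetch_page):
    """Yield every item of a resource, following pagination to the end."""
    page = "1"
    while page:
        items, page = fetch(project, resource, int(page))
        yield from items


def to_row(issue):
    """Transform a GitLab issue into a flat row.

    The keys are hypothetical, not Augur's database columns.
    """
    return {
        "gl_issue_id": issue["id"],
        "title": issue["title"],
        "state": issue["state"],
    }
```

Passing `fetch` as a parameter keeps the pagination logic testable without network access; a call such as `[to_row(i) for i in collect("some-group/some-project", "issues")]` (with a placeholder project path) would then produce rows ready to be stored.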

Idea: (Blockchain) : Open Source Health and Sustainability SSO Implementation with Hyperledger/Indy and OAUTH

Micro-tasks and place for questions

Augur is an open source platform that systematically integrates data from several open source repositories, issue trackers, mailing lists, and other communication systems that open source projects rely on to create a highly structured (relational and graph databases), consistent, and validated collection of open source health and sustainability data. Hundreds of highly specialized data requests are implemented in Augur's API, data and visualizations are pushed to Augur users, and the results of one user's request benefit the whole community.

As the size and scope of projects with rich analytical data grows, the need to protect the privacy and anonymity of individuals working in open source software is a rising concern. Implementation of a blockchain technology for single sign-on (SSO) for different collections of data is one mechanism for enabling comparisons, analysis, and typologies for open source projects, making these growing, rich data sets more useful for developers, community managers, open source program officers, industry leaders, and other stakeholders. This project promises close collaboration with individuals in open source journalism, open data efforts, and others with an interest in protecting individual privacy rights. It's also a unique and exciting path to work with blockchain technology on a team focused on its use for SSO.

The aims of the project are as follows:

  • Understand the core augur access and entitlements structure.
  • Understand blockchain technology generally.
  • Work with leading blockchain developers at the Linux Foundation
  • Implement a blockchain SSO at scale, working with a team.

Difficulty: Medium

  • Requirements: Some Python programming experience, an interest in blockchain technology, SSO, a willingness to understand Augur's internals, and a willingness to learn about HyperLedger/Indy
  • Recommended: Experience with blockchain technologies, Flask, requests, and PostgreSQL is 'nice to have', but could also be learned in the execution of the project
  • Mentors: Sean Goggins, Matt Germonprez

I would like to contribute to the project "Open Source Health and Sustainability SSO Implementation with Hyperledger/Indy and OAUTH". What is the micro-task for this project?

I would like to contribute to project "Open Source Health and Sustainability SSO Implementation with Hyperledger/Indy and OAUTH" . Please assign me the micro task for this project.

I am interested in contributing to the project "Machine Learning for Anomaly Detection in Open Source Communities". How do I get started with micro-tasks?

I find the project 'Implementation of GitLab Data Collection Workers' interesting enough. I've had some prior experience working with data scraping, so I think this should be something I can quickly get started with. I would really appreciate it if you could point me to a beginner issue for this project. @sgoggins @germonprez

Hello, I am interested in 'Implementation of GitLab Data Collection Workers'. I would really appreciate it if anybody could help me get started with this or point me to a beginner issue for the project.

I would like to contribute to project "Machine Learning for Anomaly Detection in Open Source Communities". What are the initial tasks for this project?

I would like to contribute to the project "Open Source Health and Sustainability SSO Implementation with Hyperledger/Indy and OAUTH". I have prior experience in creating blockchain using python and deploying using flask. Please assign me the initial task. @sgoggins @germonprez

Hi, I'd like to contribute to the project : "Machine Learning for Anomaly Detection in Open Source Communities". Waiting on the micro-tasks and questions for the project :)

Hi, I'm Siddharth Jain. I'd like to contribute to the project "Machine Learning for Anomaly Detection in Open Source Communities". I have prior experience working with machine learning in Python.
Waiting on the micro-tasks and questions for the project :)

Hello, I am interested in 'Implementation of GitLab Data Collection Workers'

Hi, I am interested in Implementation of GitLab Data Collection Workers. Can you suggest micro-tasks and questions for the project?

Hi, I'm Saicharan and I'd like to contribute to the project : "Implementation of GitLab Data Collection Workers". Would appreciate it if you could assign a micro-task to me.
Thanks

@sgoggins do you have the microtasks for these?

Really excited to see so many people interested in these GSoC Projects. For everyone interested in these projects I suggest setting up Augur locally as the first microtask. Go through the documentation to set up Augur locally and get a basic understanding of Augur's architecture, how it works, etc. There might be certain parts in the documentation that are incomplete or missing or need improvement, so feel free to ask questions here and help us improve the documentation (feel free to submit PRs 😃)

For people interested in implementing the "GitLab Data Collection Workers": after setting up a local instance of Augur, I'd suggest running a few workers to collect data and trying to understand how they work. Also, take a look at Augur's unified database schema to get a sense of all the data the different workers collect, and look at the implementation of the workers currently available to see how they are built.

Hi, I'm Akshara, interested in working with machine learning and would like to contribute to the project "Machine Learning for Anomaly Detection in Open Source Communities". Awaiting the micro-tasks and questions for this project.

Hi, I am Pratik Mishra, a machine learning enthusiast. I found the project "Machine Learning for Anomaly Detection in Open Source Communities" interesting. Looking forward to contributing in this field.

Microtasks are coming! Had Norovirus and then a trip. Within 18 hours!

Hi, I am Nitin Bhandari, interested in working on the project "Machine Learning for Anomaly Detection". I would like to know more about how to begin and start contributing to this project. Awaiting further instructions and microtasks.

@sgoggins @parthsharma2 @Nebrethar I'm done with my first microtask and I've created a repo for it. https://github.com/mrsaicharan1/chaoss-microtasks/.

Would appreciate it if you could assign some more tasks related to metrics and data collection :D

Microtask ideas for SSO now posted in link. @KritikGarg1 @PiyushSharma99 @Chinmay4400

Microtask ideas for Machine learning now posted in link of this issue @bnitin92 @pratikmishra356 @aksh555 @siddharthjain1611 @mhash1m @chinmay81098 @ankitkumarsamota121

The microtask for the GitLab worker is now posted in the link of this issue: @mrsaicharan1 @isaeef @KIRA009 @kartik1000 @saphal1998

I've completed my microtasks. Additionally, I've also made a pull request for a new metric addition. I would really appreciate it if you could review it. Thanks!
#556
#560

Hello! I'm Jessica Dong, and I'm interested in the project "Implementation of GitLab Data Collection Workers." Looking forward to contributing and getting to know more about the project!

Hi, I am interested in contributing to the project 'Machine Learning for Anomaly Detection in Open Source Communities'. Looking forward to getting to know more about the project!

Hi, I am an Outreachy applicant interested in contributing to 'Machine Learning for Anomaly Detection in Open Source Communities' project. Kindly assign me a task to get started.

Hello, I am an Outreachy applicant. I am interested in contributing to the 'Machine Learning for Anomaly Detection in Open Source Communities' project. I have prior experience working with machine learning in Python. Awaiting further microtasks and instructions @germonprez @sgoggins

Hi, I got into the Outreachy program and would like to contribute to the Anomaly detection project. I am willing to learn more about machine learning and practice my Python skills. Looking forward to something to start with!

Hi! I am an Outreachy applicant. I am interested in working on 'Machine Learning for Anomaly Detection in Open Source Communities'. I have experience working with machine learning algorithms using Python and scikit-learn. Looking forward to getting started with tasks. @germonprez @sgoggins

Hello! I am an ML/DL enthusiast. I am interested in working on the project "Anomaly detection". Can I get started with microtasks?

Hey everyone, I am Rishil.

I am an undergraduate student in electrical engineering with a background in applied machine learning. I have intermediate experience using machine learning and visualization libraries in Python and hope to expand my skills through a collaborative rather than competitive approach to learning.

I am a first-time Outreachy applicant and am both intimidated and very excited to join. I hope to learn and make meaningful contributions to the project on Machine Learning for Anomaly Detection in Open Source Communities

Thank you

Hi everyone,

If you are interested in this project for either GSoC or Outreachy, please get started on the microtasks as mentioned here: #558

Hi, I'm Puneet, and I have the relevant knowledge of machine learning and Flask. I am interested in working on the project "Machine Learning for Anomaly Detection in Open Source Communities". Looking forward to contributing to this project. Thanks.

Hi!
My name is Kadukuntla Poornima and I am a B.Tech Computer Science student at the Indian Institute of Technology, Bhubaneswar. I am an Outreachy 2020 applicant and am looking forward to contributing to the project 'Machine Learning for Anomaly Detection in Open Source Communities'. I have relevant prior experience with machine learning, deep learning, and scikit-learn and am eager to learn more and dive in!

Hi, I am Jigyasa, and I am a student at the Indian Institute of Technology, Roorkee. I am an Outreachy 2020 applicant, and I would be interested in contributing to 'Machine Learning for Anomaly Detection in Open Source Communities'. I would love to get started!

Hello,
I am Namrata Valecha and I'm a GSoC aspirant from Delhi, India. I am currently pursuing a B.Tech in Computer Science and Engineering at Guru Gobind Singh Indraprastha University.
I have experience in Python/Django web development and have completed a couple of internships in it. I also have good knowledge of frontend development, SQL and NoSQL databases, the Linux environment, Git, Docker, and Flask backends with RESTful APIs, third-party authentication, and payment gateway integrations.
I went through CHAOSS's project ideas and found the Implementation of GitLab Data Collection Workers project interesting. Looking forward to contributing to this project. Thanks.

Hi, I am Abhishek, currently pursuing a B.Tech in Computer Science and Engineering at the Indian Institute of Information Technology Trichy. I have experience with backend development using Node.js and I am willing to learn development using Flask. I am also comfortable with frontend development and SQL and NoSQL databases. I have experience with machine learning using scikit-learn and deep learning using fastai and PyTorch.
I am interested in the Machine Learning for Anomaly Detection in Open Source Communities project and look forward to contributing to it.

@puneet29 @KPoornima @Jigyasa-Kumari @namratavalecha @abhhii hello all and welcome to Augur! We are excited you're interested in helping us and we would love to help you make your first contribution to our codebase. Please see the relevant issue links below for detailed information and first steps for our microtasks:

Machine Learning: #558
Single-Sign-On (SSO): #557
GitLab Worker Implementation: #559

We look forward to seeing what you do!!

Hi, I am Aparna Sakshi, currently pursuing Mathematics and Computing at the Indian Institute of Technology, Kharagpur. I am really interested in contributing to the project Machine Learning for Anomaly Detection.

Hello, I am Sanu Soumya, pursuing Mathematics & Computer Science at Miranda House, DU. I am an Outreachy 2020 applicant and am looking forward to contributing to the project 'Machine Learning for Anomaly Detection in Open Source Communities'. I have relevant prior experience with machine learning and Python. Excited to learn and to contribute!

Hello everyone! I have just finished setting up our Slack workspace for applicants, and am happy to invite anyone who is applying for Augur for GSoC 2020. If you would like to be added to the channel, please send me an email at c@carterlandis.com with your name and the email address you would like me to send the invite to. If you already got an invite from me, you can use it, or if you like I can send it to a different email address. 😊

Hi, I am Snehal and I am an Outreachy applicant. I am looking forward to contributing to the project "Machine Learning for Anomaly Detection in Open Source Communities". I've spent some time learning more about CHAOSS and Augur. Really excited to start contributing to the project!

GSoC submissions are over and applicants have been selected. Thank you to everyone who submitted!