diff --git a/GSoC-Ideas.md b/GSoC-Ideas.md index 187792a0..c751ae9a 100644 --- a/GSoC-Ideas.md +++ b/GSoC-Ideas.md @@ -1,42 +1,30 @@ # Ideas for Google Summer of Code projects -## Idea: Machine Learning for Anomaly Detection in Open Source Communities - -[ Micro-tasks and place for questions ](https://github.com/chaoss/augur/issues/545) -Augur is an open source platform that systematically integrates data from several open source repositories, issue trackers, mailing lists, and other communication systems that open source projects rely on to create a highly structured (relational and graph databases), consistent, and validated collection of open source health and sustainability data. Hundreds of highly specialized data requests are implemented in Augur's API, data and visualizations are pushed to Augur users, and the results of one user request benefits the whole community. - -The volume of activity across all dimensions of open source makes the identification of significant changes both labor intensive and impractical. By connecting Augur's "insight worker" to its "push notification" architecture, and related pages that allow exploration of identified anomalies, open source companies, community managers, and contributors will be in a better position to identify community or technology issues quickly. - -The aims of the project are as follows: -* Understand the core augur engine, database, dashboard, and push notifier. -* Understand the types of anomalies that are both detectable from trace data, and provide useful signals. -* Design an approach that enables user friendly, easy tuning of notification volume, urgency, and utility that is personalized for each user. 
-* Implementing the software with data from the approximately 100,000 open source software repositories currently analyzed using Augur +## Idea: Creating Quality models using GrimoireLab and CHAOSS metrics +[ Micro-tasks and place for questions ](https://github.com/chaoss/grimoirelab/issues/287) -_Difficulty:_ Medium -* _Requirements:_ Python programming. Interest in machine learning. Willingness to understand Augur's internals. -* _Recommended:_ Experience with Flask, Scikitlearn, and Pytorch are 'nice to have', but also could be learned in the execution of the project. -* _Mentors:_ Sean Goggins, Matt Germonprez +GrimoireLab is a powerful open source platform that provides support for monitoring and in-depth analysis of software projects. It produces a rich set of dashboards, which can be easily inspected by decision makers to help them understand the evolution and health of their projects. Despite the large set of dashboards available in GrimoireLab, comparing projects with each other is not straightforward, since it requires navigating and drilling down into the data across different dashboards. -## Idea: Implementation of GitLab Data Collection Workers -[ Micro-tasks and place for questions ](https://github.com/chaoss/augur/issues/545) +The GrimoireLab module Prosoul is a web application that empowers decision makers with the means to create and manage their own quality models, which are a useful means of evaluating and comparing software projects. This project idea is about supporting the definition of Quality Models using GrimoireLab data and Prosoul. You will work with +Python, Django and ElasticSearch. -Augur is an open source platform that systematically integrates data from several open source repositories, issue trackers, mailing lists, and other communication systems that open source projects rely on to create a highly structured (relational and graph databases), consistent, and validated collection of open source health and sustainability data.
Hundreds of highly specialized data requests are implemented in Augur's API, data and visualizations are pushed to Augur users, and the results of one user request benefits the whole community. +The aims of the project are as follows: +* Learning about data analytics in the context of open source communities. +* Understanding the GrimoireLab components (Perceval, ELK, Mordred) and the corresponding tool-chain. +* Understanding the overall approach of Prosoul. +* Designing an approach to shape GrimoireLab data into a format that can be easily consumed by Prosoul. +* Implementing the approach with GrimoireLab data obtained from git, GitHub, and mailing list repositories to obtain simple quality models. +* Improving the Prosoul UI to simplify the management of quality models. -One of Augur's greatest strengths is its highly structured and unified ecosystem data model. This data drives all of the metrics and visualizations that are provided, and is of vital importance to the people maintaining open source projects. Of course, that data has to be gathered somehow, which is where the data collection workers come in. Each worker is responsible for gathering, transforming, and storing data related to a particular project from a particular data source. Building a GitLab data collection worker will enable Augur to collect data about commits, issues, contributors, and PRs from a large number of open source projects that live on GitLab. -The aims of the project are as follows: -* Understand the core augur engine, database, and data collection process. -* Understand GitLab's internal data model in order to extract the necessary data. -* Understand best practices for collecting data reliably at scale. -* Implementing the software with data from the approximately 100,000 open source software repositories currently analyzed using Augur +* _Difficulty:_ Medium +* _Requirements:_ Python programming.
Interest in software analytics. Willingness to understand GrimoireLab internals. +* _Recommended:_ Experience with ElasticSearch and Django would be convenient, but can be learned during the project. +* _Mentors:_ @dlumbrer, @Polaris000, @sduenas, @valeriocos -_Difficulty:_ Medium -* _Requirements:_ Some Python programming experience, an interest in data science, willingness to understand Augur's internals -* _Recommended:_ Experience with Flask, requests, and PostgreSQL are 'nice to have', but also could be learned in the execution of the project -* _Mentors:_ Sean Goggins, Matt Germonprez ## Idea: (Blockchain) : Open Source Health and Sustainability SSO Implementation with Hyperledger/Indy and OAUTH [ Micro-tasks and place for questions ](https://github.com/chaoss/augur/issues/545) @@ -57,7 +45,6 @@ _Difficulty:_ Medium * _Mentors:_ Sean Goggins, Matt Germonprez - ## Idea: Boosting data processing in GrimoieLab [ Micro-tasks and place for questions ](https://github.com/chaoss/grimoirelab/issues/285) @@ -82,52 +69,50 @@ The aims will require working with Python, ELK and the ElasticSearch database. * _Mentors:_ @Polaris000, @sduenas, @valeriocos, @zhquan -## Idea: Packaging and Sharing CHAOSS metrics using GrimoireLab dashboards - -[ Micro-tasks and place for questions ](https://github.com/chaoss/grimoirelab/issues/286) +## Idea: Implementation of GitLab Data Collection Workers +[ Micro-tasks and place for questions ](https://github.com/chaoss/augur/issues/545) -GrimoireLab is a powerful toolset for software development analytics. It is able to collect, process and visualize data from a large plethora of tools and platforms used in software development. The obtained data is stored in ElasticSearch and shown via web-based dashboards built on top of Kibana. Predefined dashboards are provided by GrimoireLab, however each user can easily create their own ones to address specific needs, such as the implementation of CHAOSS metrics. 
+Augur is an open source platform that systematically integrates data from several open source repositories, issue trackers, mailing lists, and other communication systems that open source projects rely on to create a highly structured (relational and graph databases), consistent, and validated collection of open source health and sustainability data. Hundreds of highly specialized data requests are implemented in Augur's API, data and visualizations are pushed to Augur users, and the results of one user's request benefit the whole community. -In the current stage, GrimoireLab doesn't provide an approach to share custom dashboards, thus limiting the end-user capabilities. This project idea is about implementing such an approach leveraging on Python, the Kibana API, ElasticSearch and OpenDistro for ElasticSearch (ODFE). +One of Augur's greatest strengths is its highly structured and unified ecosystem data model. This data drives all of the metrics and visualizations that are provided and is of vital importance to the people maintaining open source projects. Of course, that data has to be gathered somehow, which is where the data collection workers come in. Each worker is responsible for gathering, transforming, and storing data related to a particular project from a particular data source. Building a GitLab data collection worker will enable Augur to collect data about commits, issues, contributors, and PRs from a large number of open source projects that live on GitLab. The aims of the project are as follows: -* Learning about building ecosystems around a software and providing functions to encourage growth of user base. -* Understanding the GrimoireLab components (Perceval, ELK, Mordred, Sigils and Kidash) and the corresponding tool-chain. -* Understanding the Kibana API to be able to download and upload visualizations and dashboards. -* Exploring the option of using ODFE instead of/in addition to ElasticSearch.
Ideally, the implementation should be compatible with both of them. -* Implementing an approach to simplify the management of visualizations and dashboards. -* Refactoring ELK and Mordred to remove the the logic currently used to manage the dashboards. +* Understand the core Augur engine, database, and data collection process. +* Understand GitLab's internal data model in order to extract the necessary data. +* Understand best practices for collecting data reliably at scale. +* Implement the software with data from the approximately 100,000 open source software repositories currently analyzed using Augur. -Other aims, such as enhancing Kidash or other components to support the implementation of the approach are completely within scope. +* _Difficulty:_ Medium +* _Requirements:_ Some Python programming experience, an interest in data science, willingness to understand Augur's internals +* _Recommended:_ Experience with Flask, requests, and PostgreSQL is 'nice to have', but these could also be learned in the execution of the project +* _Mentors:_ Sean Goggins, Matt Germonprez -* _Difficulty:_ Medium -* _Requirements:_ Python programming. Interest in software analytics. Willingness to understand GrimoireLab internals. -* _Recommended:_ Experience with ElasticSearch and Kibana would be convenient, but can be learned during the project. -* _Mentors:_ @alpgarcia, @sduenas, @valeriocos -## Idea: Creating Quality models using GrimoireLab and CHAOSS metrics +## Idea: Implement the Social Currency Metrics System in GrimoireLab -[ Micro-tasks and place for questions ](https://github.com/chaoss/grimoirelab/issues/287) +[ Micro-tasks and place for questions ](https://github.com/chaoss/grimoirelab/issues/288) -GrimoireLab is a powerful open source platform that provides support for monitoring and in-depth analysis of software projects.
It produces a rich set of dashboards, which can be easily inspected by decision makers to help them understanding the evolution and health of their projects. Despite the large set of dashboards available in GrimoireLab, comparing projects between each others is not straightforward since it requires navigating and drilling down the data in different dashboards. +The Social Currency Metrics System (SCMS) is a qualitative data collection, processing, and measurement system that improves on the data already available in [GrimoireLab](https://chaoss.github.io/grimoirelab/). Implementing the SCMS will help community leaders and other stakeholders leverage qualitative data (e.g., IRC messages or mailing list conversations) for social listening so that they can rely less on simple metrics and more on community sentiment. The SCMS empowers community leaders to make decisions based on what community members freely share about their opinions, wants, and needs. -The GrimoireLab module Prosoul is a web application that empowers decision makers with the means to create and manage their own quality models, which are useful means to evaluate and compare software projects. This project idea is about supporting the definition of Quality Models using GrimoireLab data and Prosoul. You will work with -Python, Django and ElasticSearch. +The SCMS shows why trends occur and identifies pitfalls missed by conclusions drawn from quantitative data. With an SCMS platform built natively into CHAOSS GrimoireLab, open source communities can facilitate members’ input into decisions essential to community health. -The aims of the project are as follows: -* Learning about data analytics in the context of open source communities. -* Understanding the GrimoireLab components (Perceval, ELK, Mordred) and the corresponding tool-chain. -* Understanding the overall approach of Prosoul. -* Designing an approach to shape GrimoireLab data in a format that can be easily consumed by Prosoul.
-* Implementing the approach with GrimoireLab data obtained from git, github, mailing lists repositories to obtain simple quality models. -* Improving the Prosoul UI to simplify the management of quality models. +The purpose of this project is to: +Build the SCMS in [GrimoireLab](http://chaoss.github.io/grimoirelab), one of the CHAOSS project’s systems, which collects qualitative data from several channels. The final solution should display that information for tagging and output metrics in a dashboard similar to the screenshots [found here](https://chaoss.community/metric-social-currency-metric-system/). -The aims will require working with Python, Django and ElasticSearch. +The aims of this project are to: +* Create an API on top of GrimoireLab that provides qualitative data collected from a community of the GSoC participants’ choice, via any combination of channels desired (email, social media, survey data, etc.). +* Gain familiarity with that qualitative data and create a way to tag and process it. +* Develop creative ways to display the dataset in GrimoireLab / Kibana by reading and writing data to the Elasticsearch database. +* Investigate ways to process qualitative data at scale using AI or similar technology. Implementation is optional. + +_Difficulty:_ Medium to Hard, depending on the level of implementation (e.g., machine learning). + +_Requirements:_ Python programming. API development. Some understanding of the social scientific process and qualitative data analysis. Willingness to learn CHAOSS GrimoireLab tools. + +_Recommended:_ Knowledge of several APIs. Interest in the science of community management and anthropological studies of online worlds. Interest in machine learning. + + _Mentors:_ Dylan Marcy (SociallyConstructed.Online), @samanthavenialogan (SociallyConstructed.Online), @valeriocos (GrimoireLab), @GeorgLink (Advising only) -* _Difficulty:_ Medium -* _Requirements:_ Python programming. Interest in software analytics. Willingness to understand GrimoireLab internals.
-* _Recommended:_ Experience with ElasticSearch and Django would be convenient, but can be learned during the project. -* _Mentors:_ @dlumbrer, @Polaris000, @sduenas, @valeriocos ## Idea: Build Workflow Process for CHAOSS Diversity & Inclusion Badging @@ -148,27 +133,47 @@ The aim of this project regards all of these goals, and the work will help the e * _Recommended:_ Experience in the open source community would be a positive, but it is not required. * _Mentors:_ Matt Snell, Matt Germonprez, Saleh Abdel Motaal -## Idea: Implement the Social Currency Metrics System in GrimoireLabs -[ Micro-tasks and place for questions ](https://github.com/chaoss/grimoirelab/issues/288) +## Idea: Machine Learning for Anomaly Detection in Open Source Communities -The Social Currency Metrics System (SCMS) is a qualitative data collection, processing, and measurement system to improve on data already available in [GrimoireLab](https://chaoss.github.io/grimoirelab/). Implementing the SCMS will help community leaders and other stakeholders leverage qualitative data (e.g., IRC messages or mailing list conversations) for social listening so that they can rely less on simple metrics and more on community sentiment. The SCMS empowers community leaders to make decisions based on what community members freely share about their opinions, wants, and needs. +[ Micro-tasks and place for questions ](https://github.com/chaoss/augur/issues/545) -The SCMS shows why trends occur and identifies missed pitfalls in conclusions taken from quantitative data. With an SCMS platform built natively for the CHAOSS GrimoireLab tool, open source communities can use it to facilitate members’ input in decisions essential to community health. 
+Augur is an open source platform that systematically integrates data from several open source repositories, issue trackers, mailing lists, and other communication systems that open source projects rely on to create a highly structured (relational and graph databases), consistent, and validated collection of open source health and sustainability data. Hundreds of highly specialized data requests are implemented in Augur's API, data and visualizations are pushed to Augur users, and the results of one user's request benefit the whole community. -The purpose of this project is to: -Build the SCMS in [GrimoireLab](http://chaoss.github.io/grimoirelab), one of CHAOSS project’s systems that collects qualitative data from several channels. The final solution should display that information for tagging and output metrics in a dashboard similar to the screenshots [found here](https://chaoss.community/metric-social-currency-metric-system/). +The volume of activity across all dimensions of open source makes the identification of significant changes both labor-intensive and impractical. Connecting Augur's "insight worker" to its "push notification" architecture, along with related pages that allow exploration of identified anomalies, will put open source companies, community managers, and contributors in a better position to identify community or technology issues quickly. -The aims of this project are to: -* Create an API on top of GrimoireLab that provides qualiative data collected from a community of the GSoC participants’ choice, via any combination of channels desired (email, social media, survey data, etc). -* Gain familiarity with and create a way to tag and process that qualitative data. -* Develop creative ways to display the dataset in GrimoireLab / Kibana by reading and writing data to Elasticsearch database. -* Investigate ways to process qualitative data at scale using AI or similar technology. Implementation is optional.
+The aims of the project are as follows: +* Understand the core Augur engine, database, dashboard, and push notifier. +* Understand the types of anomalies that are both detectable from trace data and provide useful signals. +* Design an approach that enables user-friendly, easy tuning of notification volume, urgency, and utility that is personalized for each user. +* Implement the software with data from the approximately 100,000 open source software repositories currently analyzed using Augur. -_Difficulty:_ Medium to Hard based on the level of implementation (machine learning). +* _Difficulty:_ Medium +* _Requirements:_ Python programming. Interest in machine learning. Willingness to understand Augur's internals. +* _Recommended:_ Experience with Flask, scikit-learn, and PyTorch is 'nice to have', but these could also be learned in the execution of the project. +* _Mentors:_ Sean Goggins, Matt Germonprez -_Requirements:_ Python programming. API Development. Some understanding of the social scientific process and qualitative data analysis. Willingness to learn CHAOSS GrimoireLab tools. -_Recommended:_ Knowledge in several APIs. Interest in science of community management and anthropological studies of online worlds. Interest in machine learning. +## Idea: Packaging and Sharing CHAOSS metrics using GrimoireLab dashboards + +[ Micro-tasks and place for questions ](https://github.com/chaoss/grimoirelab/issues/286) + +GrimoireLab is a powerful toolset for software development analytics. It is able to collect, process and visualize data from a wide range of tools and platforms used in software development. The obtained data is stored in ElasticSearch and shown via web-based dashboards built on top of Kibana. GrimoireLab provides predefined dashboards; however, each user can easily create their own to address specific needs, such as the implementation of CHAOSS metrics.
+ +Currently, GrimoireLab doesn't provide a way to share custom dashboards, which limits what end users can do. This project idea is about implementing such an approach leveraging Python, the Kibana API, ElasticSearch and OpenDistro for ElasticSearch (ODFE). + +The aims of the project are as follows: +* Learning about building ecosystems around a piece of software and providing functions to encourage growth of its user base. +* Understanding the GrimoireLab components (Perceval, ELK, Mordred, Sigils and Kidash) and the corresponding tool-chain. +* Understanding the Kibana API to be able to download and upload visualizations and dashboards. +* Exploring the option of using ODFE instead of/in addition to ElasticSearch. Ideally, the implementation should be compatible with both of them. +* Implementing an approach to simplify the management of visualizations and dashboards. +* Refactoring ELK and Mordred to remove the logic currently used to manage the dashboards. + +Other aims, such as enhancing Kidash or other components to support the implementation of the approach, are also within scope. + +* _Difficulty:_ Medium +* _Requirements:_ Python programming. Interest in software analytics. Willingness to understand GrimoireLab internals. +* _Recommended:_ Experience with ElasticSearch and Kibana would be convenient, but can be learned during the project. +* _Mentors:_ @alpgarcia, @sduenas, @valeriocos - _Mentors:_ Dylan Marcy (SociallyConstructed.Online), @samanthavenialogan (SociallyConstructed.Online), @valeriocos (GrimoireLabs), @GeorgLink (Advising only)