chaoss/grimoirelab

No Data Available For GitHub Comments Dashboard

maltif opened this issue · 17 comments

I've been following this documentation to set up separate GitHub Comments dashboards for PRs and issues on our existing, working GrimoireLab deployment.
I've made the required changes in setup.cfg and projects.json as described in the grimoirelab-sirmordred github2 documentation.

  • This is the setup.cfg configuration which is being used by grimoirelab.
[general]
short_name = TivoInc
update = true
min_update_delay = 60
debug = false
logs_dir = /home/bitergia/logs
aliases_file = /home/bitergia/conf/aliases.json

[projects]
projects_file = /home/bitergia/conf/projects.json

[es_collection]
url = http://elasticsearch:9200

[es_enrichment]
url = http://elasticsearch:9200
autorefresh = true

[sortinghat]
host = mariadb
user = root
password =
database = demo_sh
load_orgs = true
orgs_file = /home/bitergia/conf/organizations.json
autoprofile = [github, pipermail, git]
matching = [email]
sleep_for = 100
unaffiliated_group = Unknown
affiliate = true
strict_mapping = false
reset_on_load = false
identities_file = [/home/bitergia/conf/identities.yml]
identities_format = grimoirelab

[panels]
kibiter_time_from = now-1y
kibiter_default_index = git
kibiter_url = http://kibiter:5601
kibiter_version = 6.1.4-1
#gitlab-issues = true
#gitlab-merges = true
github-comments = true

[phases]
collection = true
identities = true
enrichment = true
panels = true



[git]
raw_index = git_raw
enriched_index = git_enriched
latest-items = true
studies = [enrich_demography:git, enrich_git_branches:git, enrich_areas_of_code:git, enrich_onion:git]

[github]
raw_index = github_raw
enriched_index = github_enriched
api-token = ghp_XXX
category = issue
sleep-for-rate = true
no-archive = true
studies = [enrich_onion:github, enrich_geolocation:user, enrich_geolocation:assignee, enrich_extra_data:github, enrich_backlog_analysis, enrich_demography:github]

[github:pull]
raw_index = github_pull_raw
enriched_index = github_pull_enriched
api-token = ghp_XXX
category = pull_request
sleep-for-rate = true
no-archive = true
studies = [enrich_geolocation:user, enrich_geolocation:assignee, enrich_extra_data:github, enrich_demography:github]


[github2:issue]
api-token = ghp_XXX
raw_index = github2-issues_raw
enriched_index = github2-issues_enriched
sleep-for-rate = true
category = issue
no-archive = true
studies = [enrich_geolocation:user, enrich_geolocation:assignee, enrich_extra_data:github2, enrich_feelings]

[github2:pull]
api-token = ghp_XXX
raw_index = github2-pull_raw
enriched_index = github2-pull_enriched
sleep-for-rate = true
category = pull_request
no-archive = true
studies = [enrich_geolocation:user, enrich_geolocation:assignee, enrich_extra_data:git, enrich_feelings]

## studies based on enriched indexes

[enrich_demography:git]

[enrich_areas_of_code:git]
in_index = git_raw
out_index = git_aoc_enriched

[enrich_onion:git]
in_index = git_raw
out_index = git_onion_enriched
contribs_field = hash

[enrich_git_branches:git]
run_month_days = [1, 23]

[enrich_extra_data:git]
json_url = https://gist.githubusercontent.com/zhquan/bb48654bed8a835ab2ba9a149230b11a/raw/5eef38de508e0a99fa9772db8aef114042e82e47/bitergia-example.txt

[enrich_forecast_activity]
out_index = git_study_forecast


[enrich_onion:github]
in_index_iss = github_issues_onion_src
in_index_prs = github_prs_onion_src
out_index_iss = github_issues_onion_enriched
out_index_prs = github_prs_onion_enriched

[enrich_geolocation:user]
location_field = user_location
geolocation_field = user_geolocation

[enrich_geolocation:assignee]
location_field = assignee_location
geolocation_field = assignee_geolocation

[enrich_extra_data:github]
json_url = https://gist.githubusercontent.com/zhquan/bb48654bed8a835ab2ba9a149230b11a/raw/5eef38de508e0a99fa9772db8aef114042e82e47/bitergia-example.txt

#Added as part of github2
[enrich_extra_data:github2]
json_url = https://gist.githubusercontent.com/zhquan/bb48654bed8a835ab2ba9a149230b11a/raw/5eef38de508e0a99fa9772db8aef114042e82e47/bitergia-example.txt

[enrich_feelings]
attributes = [title, body]
nlp_rest_url = http://localhost:2901

#End Here

[enrich_backlog_analysis]
out_index = github_enrich_backlog
interval_days = 7
reduced_labels = [bug,enhancement]
map_label = [others, bugs, enhancements]

[enrich_demography:github]

[enrich_duration_analysis:kanban]
start_event_type = MovedColumnsInProjectEvent
fltr_attr = board_name
target_attr = board_column
fltr_event_types = [MovedColumnsInProjectEvent, AddedToProjectEvent]

[enrich_duration_analysis:label]
start_event_type = UnlabeledEvent
target_attr = label
fltr_attr = label
fltr_event_types = [LabeledEvent]

[enrich_reference_analysis]

  • After making the required changes, I restarted GrimoireLab and could see that the indexes are available in the Kibana dashboard.

image

  • Downloaded the JSON files below:
wget https://raw.githubusercontent.com/chaoss/grimoirelab-sigils/master/panels/json/github2_pull_requests-index-pattern.json
wget https://raw.githubusercontent.com/chaoss/grimoirelab-sigils/master/panels/json/github2_pull_requests_comments_and_collaboration.json
  • Then imported those JSON files using the kidash tool:
kidash -g -e http://localhost:9200/ --import github2_pull_requests-index-pattern.json
kidash -g -e http://localhost:9200/ --import github2_pull_requests_comments_and_collaboration.json
  • Finally, I restarted GrimoireLab again. However, no data is available on the Kibana dashboard, as shown in the attached image.

image

  • Attaching an image of the current data status:

image

I'm not sure what I missed here. Could you please help to debug this?

Hi @maltif

In your indexes screenshot I only see raw indexes (github2-issues_raw and github2-pull_raw); let Mordred create the enriched ones. Once the enriched indexes are created, check whether the aliases are correct (GET _cat/aliases).

You also need to import github2_issues-index-pattern.json and github2_issues_comments_and_collaboration.json
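
The alias check suggested above (GET _cat/aliases) can also be scripted. The following is a minimal sketch, not part of Mordred: it assumes the response was requested with `?format=json`, in which case each row carries `alias` and `index` fields. The helper names and the sample rows are hypothetical and only illustrate the idea.

```python
import json


def aliases_by_index(rows):
    """Group alias names by the index they point to."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["index"], set()).add(row["alias"])
    return grouped


def missing_aliases(rows, index, expected):
    """Return the expected aliases that are not attached to the given index."""
    attached = aliases_by_index(rows).get(index, set())
    return sorted(set(expected) - attached)


# Hypothetical sample of what GET _cat/aliases?format=json might return.
sample = json.loads("""[
  {"alias": "github2_issues", "index": "github2-issues_enriched"},
  {"alias": "all_enriched", "index": "github2-issues_enriched"}
]""")

# Aliases listed for "github2:issue" in the aliases.json from this thread.
expected = ["github2_issues", "github2_pull_requests", "affiliations", "all_enriched"]
print(missing_aliases(sample, "github2-issues_enriched", expected))
```

An index that does not exist yet (for example, an enriched index Mordred has not created) simply reports every expected alias as missing.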

Thank you @zhquan for the reply. Sure, I'll wait for it.

Meanwhile, could you please verify the aliases.json file?

This is aliases.json which is configured in docker-compose.yml:

{
  "askbot": {
    "raw": [
      {
        "alias": "askbot-raw"
      }
    ],
    "enrich": [
      {
        "alias": "askbot"
      },
      {
        "alias": "affiliations"
      },
      {
        "alias": "all_enriched"
      }
    ]
  },
  "gerrit": {
    "raw": [
      {
        "alias": "gerrit-raw"
      }
    ],
    "enrich": [
      {
        "alias": "affiliations"
      },
      {
        "alias": "all_enriched"
      }
    ]
  },
  "git": {
    "raw": [
      {
        "alias": "git-raw"
      }
    ],
    "enrich": [
      {
        "alias": "git"
      },
      {
        "alias": "git_author"
      },
      {
        "alias": "git_enrich"
      },
      {
        "alias": "affiliations"
      },
      {
        "alias": "all_enriched"
      }
    ]
  },
  "github:repo": {
    "raw": [
      {
        "alias": "github_repositories-raw"
      }
    ],
    "enrich": [
      {
        "alias": "github_repositories"
      }
    ]
  },
  "github2:issue": {
    "raw": [
      {
        "alias": "github2_issues-raw"
      }
    ],
    "enrich": [
      {
        "alias": "github2_issues"
      },
      {
        "alias": "github2_pull_requests",
        "filter": {
          "terms": {
            "issue_pull_request" : [
              "true"
            ]
          }
        }
      },
      {
        "alias": "affiliations"
      },
      {
        "alias": "all_enriched"
      }
    ]
  },
  "github2:pull": {
    "raw": [
      {
        "alias": "github2_pull_requests-raw"
      }
    ],
    "enrich": [
      {
        "alias": "github2_pull_requests"
      },
      {
        "alias": "affiliations"
      },
      {
        "alias": "all_enriched"
      }
    ]
  },
  "github": {
    "raw": [
      {
        "alias": "github-raw"
      }
    ],
    "enrich": [
      {
        "alias": "github_issues"
      },
      {
        "alias": "github_issues_enrich"
      },
      {
        "alias": "issues_closed"
      },
      {
        "alias": "issues_created"
      },
      {
        "alias": "issues_updated"
      },
      {
        "alias": "affiliations"
      },
      {
        "alias": "all_enriched"
      },
      {
        "alias": "all_enriched_tickets",
        "filter": {
          "terms": {
            "pull_request": [
              "false"
            ]
          }
        }
      },
      {
        "alias": "github_issues_onion-src",
        "filter": {
          "terms": {
            "pull_request": [
              "false"
            ]
          }
        }
      },
      {
        "alias": "github_prs_onion-src",
        "filter": {
          "terms": {
            "pull_request": [
              "true"
            ]
          }
        }
      }
    ]
  },
  "pipermail": {
    "raw": [
      {
        "alias": "pipermail-raw"
      }
    ],
    "enrich": [
      {
        "alias": "mbox"
      },
      {
        "alias": "kafka"
      },
      {
        "alias": "affiliations"
      },
      {
        "alias": "all_enriched"
      }
    ]
  }

}

Use this aliases.json instead. There is no studies_aliases section in your aliases.json.

I've configured the aliases.json that you shared, and I've also imported github2_issues-index-pattern.json and github2_issues_comments_and_collaboration.json.

I'll wait for enrichment process to be completed and confirm.

Thanks again for your kind support 🙂

@zhquan

I'm not sure if there is a problem or not, but it's been over 8 hours and the GitHub PRs and Issues Comments data is still not appearing on the Kibana dashboard. I have checked Mordred's all.log, but I couldn't find any errors or issues.

  • I also haven't seen any initialization of the enrich task. Please refer to the screenshot for reference.

image (6)

  • This is the current aliases status:

image (7)

  • This is the current 'indices' status:

image (8)

  • Though, I noticed one thing in the setup.cfg for the Github2 section. As you can see, I am using a hyphen(-) instead of an underscore(_). I'm unsure if this makes a difference, so I thought I would reach out to you for clarification. Please refer to the screenshot for reference.

image (9)

Let's try running only the enrichment phase, because I don't see any log entries related to the enrichment or study phases:

[phases]
collection = false
identities = false
enrichment = true
panels = false
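
To confirm which phases a given setup.cfg would actually run, the [phases] section can be read back. This is a minimal sketch using Python's standard `configparser`, under the assumption that the file is plain INI as shown in this thread; the function name `enabled_phases` is hypothetical and is not part of Mordred.

```python
import configparser


def enabled_phases(cfg_text):
    """Return the names of the phases set to true in a setup.cfg string."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    phases = parser["phases"]
    return [name for name in phases if phases.getboolean(name)]


sample_cfg = """
[phases]
collection = false
identities = false
enrichment = true
panels = false
"""

print(enabled_phases(sample_cfg))
```

For a file on disk, the same function can be fed `open("setup.cfg").read()`.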

Keep in mind that if you use the same GitHub token in the github and github2 sections, the collection phase will take a very long time because of the token rate limit. You can add more tokens (from different accounts):

[github]
api-token = [ghp_XXX, ghp_YYY, ghp_ZZZ]

Though, I noticed one thing in the setup.cfg for the Github2 section. As you can see, I am using a hyphen(-) instead of an underscore(_). I'm unsure if this makes a difference, so I thought I would reach out to you for clarification. Please refer to the screenshot for reference.

You can use the hyphen without any problem

@zhquan thank you so much for responding.

I can now view data in the GitHub Comments Issues Dashboard after enabling enrichment in the phases section as you advised. The GitHub Issues Comments and Collaboration dashboard appears to be working well.

image (10)

image (11)

image (12)

However, the Top 10 Repository visualization in the GitHub Pull Requests Comments and Collaboration Dashboard displays "Pull Requests Count 0" even though the repos have multiple PRs. Is this because the collection process(github2:pull) has not finished for all the repos(Total 1679 Repos)?

image

[root@grimoire tmp]# grep '\[github2:issue\]' all.log | grep collection | awk '{print $NF}' | sort | uniq | grep http | wc -l
1372
[root@grimoire tmp]# grep '\[github2:pull\]' all.log | grep collection | awk '{print $NF}' | sort | uniq | grep http | wc -l
235
[root@grimoire tmp]# grep '\[github\]' all.log | grep collection | awk '{print $NF}' | sort | uniq | grep http | wc -l
1679
[root@grimoire tmp]# grep '\[github:pull\]' all.log | grep collection | awk '{print $NF}' | sort | uniq | grep http | wc -l
1679
[root@grimoire tmp]# grep '\[git\]' all.log | grep collection | awk '{print $NF}' | sort | uniq | grep http | wc -l
1679
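
The same count can be reproduced outside the shell. The sketch below mirrors the grep/awk pipeline above: it keeps lines that contain the backend tag and the word "collection", takes the last whitespace-separated field, and counts the distinct values that contain "http". The function name and the sample lines are made up for illustration and do not reproduce Mordred's real log format.

```python
def count_collected_repos(lines, backend):
    """Count distinct URLs found in the last field of collection log lines."""
    tag = f"[{backend}]"
    urls = set()
    for line in lines:
        if tag not in line or "collection" not in line:
            continue
        fields = line.split()
        # Same role as awk's $NF followed by `grep http`.
        if fields and "http" in fields[-1]:
            urls.add(fields[-1])
    return len(urls)


# Invented log lines, only to exercise the function.
sample_log = [
    "INFO [github2:pull] collection starts for https://github.com/example/repo-a",
    "INFO [github2:pull] collection finished for https://github.com/example/repo-a",
    "INFO [github2:issue] collection starts for https://github.com/example/repo-b",
    "INFO [github2:pull] enrichment starts for https://github.com/example/repo-c",
]

print(count_collected_repos(sample_log, "github2:pull"))
```

Because the closing bracket is part of the tag, `github` does not also match `github:pull` lines, just as in the escaped grep patterns.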

However, the Top 10 Repository visualization in the GitHub Pull Requests Comments and Collaboration Dashboard displays "Pull Requests Count 0" even though the repos have multiple PRs. Is this because the collection process(github2:pull) has not finished for all the repos(Total 1679 Repos)?

Probably

Some checks:

  • The visualization is using the correct field (is_github_pull_request)
  • Go to Discover and check the github2_pull_requests index pattern

I checked the following:

  • visualization is using the correct field: is_github_pull_request

image

  • The github2_pull_requests index pattern in Discover:

image

The GitHub Pull Requests Comments and Collaboration Dashboard is still showing a "Pull Requests Count" of 0. It appears that the collection process for github2:pull has not yet finished for all repositories. Can the configuration be enabled specifically for github2:pull in the setup.cfg file? I'd like to see github2:pull data.

Almost all items (99.6%) come from the github2-issues_enriched index, so it is normal for the Pull Requests Count to be 0.

If you want to run only github2:pull, remove the rest of the backends and also remove the studies:

[github2:pull]
api-token = ghp_XXX
raw_index = github2-pull_raw
enriched_index = github2-pull_enriched
sleep-for-rate = true
category = pull_request
no-archive = true

Of course, I'll give it a go tomorrow.

I noticed that Grimoire is set to retrieve GitHub data for the past 10 years by default. Can I change the duration to only retrieve data for the past 1 year?

I'd like to gather GitHub metrics data for the past 1 year.

Thank you so much for your assistance and support, I truly appreciate your time.

I noticed that Grimoire is set to retrieve GitHub data for the past 10 years by default. Can I change the duration to only retrieve data for the past 1 year?

Of course, you can set from-date = 2022-01-01:

[github2:pull]
api-token = ghp_XXX
raw_index = github2-pull_raw
enriched_index = github2-pull_enriched
sleep-for-rate = true
category = pull_request
no-archive = true
from-date = 2022-01-01
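
If the cutoff should follow "one year ago" rather than a fixed date, the value can be computed before editing setup.cfg. This is a small sketch that assumes the YYYY-MM-DD format used in the example above; the helper name `one_year_before` is hypothetical.

```python
from datetime import date


def one_year_before(day):
    """Return the same calendar day one year earlier (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:
        # Only 29 February can fail here, when the previous year is not a leap year.
        return day.replace(year=day.year - 1, day=28)


# Fixed input date for a reproducible example; use date.today() in practice.
print(f"from-date = {one_year_before(date(2023, 1, 1)).isoformat()}")
```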

Thank you so much for your assistance and support, I truly appreciate your time.

It's a pleasure :)

@zhquan Firstly, thank you for sharing the from-date parameter. I've configured it under both the github2:pull and github2:issue backends.

Secondly, I apologize for the repeated inquiries regarding the issue I'm experiencing. It has been a couple of days, and the Top Repository visualization in the GitHub Pull Requests Comments and Collaboration Dashboard continues to mostly display "Pull Requests Count 0." However, I can see Reviews and Comments data.

I'm attaching my current setup.cfg file configuration again, just to double-check in case I missed anything.

[general]
short_name = TivoInc
update = true
min_update_delay = 60
debug = false
logs_dir = /home/bitergia/logs
aliases_file = /home/bitergia/conf/aliases.json

[projects]
projects_file = /home/bitergia/conf/projects.json

[es_collection]
url = http://elasticsearch:9200

[es_enrichment]
url = http://elasticsearch:9200
autorefresh = true

[sortinghat]
host = mariadb
user = root
password =
database = demo_sh
load_orgs = true
orgs_file = /home/bitergia/conf/organizations.json
autoprofile = [github, pipermail, git]
matching = [email]
sleep_for = 100
unaffiliated_group = Unknown
affiliate = true
strict_mapping = false
reset_on_load = false
identities_file = [/home/bitergia/conf/identities.yml]
identities_format = grimoirelab

[panels]
kibiter_time_from = now-1y
kibiter_default_index = git
kibiter_url = http://kibiter:5601
kibiter_version = 6.1.4-1
#gitlab-issues = true
#gitlab-merges = true
github-comments = true

[phases]
collection = true
identities = true
enrichment = true
panels = true


[git]
raw_index = git_raw
enriched_index = git_enriched
latest-items = true
studies = [enrich_demography:git, enrich_git_branches:git, enrich_areas_of_code:git, enrich_onion:git]

[github]
raw_index = github_raw
enriched_index = github_enriched
api-token = ghp_XXX
category = issue
sleep-for-rate = true
no-archive = true
studies = [enrich_onion:github, enrich_geolocation:user, enrich_geolocation:assignee, enrich_extra_data:github, enrich_backlog_analysis, enrich_demography:github]

[github:pull]
raw_index = github_pull_raw
enriched_index = github_pull_enriched
api-token = ghp_XXX
category = pull_request
sleep-for-rate = true
no-archive = true
studies = [enrich_geolocation:user, enrich_geolocation:assignee, enrich_extra_data:github, enrich_demography:github]


[github2:issue]
api-token = [ghp_XXX, ghp_YYY, ghp_ZZZ]
raw_index = github2-issues_raw
enriched_index = github2-issues_enriched
sleep-for-rate = true
category = issue
no-archive = true
studies = [enrich_geolocation:user, enrich_geolocation:assignee, enrich_extra_data:github2, enrich_feelings]
from-date = 2022-01-01

[github2:pull]
api-token = [ghp_XXX, ghp_YYY, ghp_ZZZ]
raw_index = github2-pull_raw
enriched_index = github2-pull_enriched
sleep-for-rate = true
category = pull_request
no-archive = true
studies = [enrich_geolocation:user, enrich_geolocation:assignee, enrich_extra_data:github2, enrich_feelings]
from-date = 2022-01-01

## studies based on enriched indexes

[enrich_demography:git]

[enrich_areas_of_code:git]
in_index = git_raw
out_index = git_aoc_enriched

[enrich_onion:git]
in_index = git_raw
out_index = git_onion_enriched
contribs_field = hash

[enrich_git_branches:git]
run_month_days = [1, 23]

[enrich_extra_data:git]
json_url = https://gist.githubusercontent.com/zhquan/bb48654bed8a835ab2ba9a149230b11a/raw/5eef38de508e0a99fa9772db8aef114042e82e47/bitergia-example.txt

[enrich_forecast_activity]
out_index = git_study_forecast


[enrich_onion:github]
in_index_iss = github_issues_onion_src
in_index_prs = github_prs_onion_src
out_index_iss = github_issues_onion_enriched
out_index_prs = github_prs_onion_enriched

[enrich_geolocation:user]
location_field = user_location
geolocation_field = user_geolocation

[enrich_geolocation:assignee]
location_field = assignee_location
geolocation_field = assignee_geolocation

[enrich_extra_data:github]
json_url = https://gist.githubusercontent.com/zhquan/bb48654bed8a835ab2ba9a149230b11a/raw/5eef38de508e0a99fa9772db8aef114042e82e47/bitergia-example.txt

#Added as part of github2
[enrich_extra_data:github2]
json_url = https://gist.githubusercontent.com/zhquan/bb48654bed8a835ab2ba9a149230b11a/raw/5eef38de508e0a99fa9772db8aef114042e82e47/bitergia-example.txt

[enrich_feelings]
attributes = [title, body]
nlp_rest_url = http://localhost:2901

#End Here

[enrich_backlog_analysis]
out_index = github_enrich_backlog
interval_days = 7
reduced_labels = [bug,enhancement]
map_label = [others, bugs, enhancements]

[enrich_demography:github]

[enrich_duration_analysis:kanban]
start_event_type = MovedColumnsInProjectEvent
fltr_attr = board_name
target_attr = board_column
fltr_event_types = [MovedColumnsInProjectEvent, AddedToProjectEvent]

[enrich_duration_analysis:label]
start_event_type = UnlabeledEvent
target_attr = label
fltr_attr = label
fltr_event_types = [LabeledEvent]

[enrich_reference_analysis]

I'm attaching a few screenshots for your reference, just to double-check. I can see that after 8 February there is no data available for the github2_pull_requests and github2_issues index patterns. I don't see any errors in the all.log file.

image

image

image

image

image

image

I have observed that the mordred container is being terminated automatically after a few hours. I have included the container logs below for reference.
Currently, as a workaround, I have a shell script in place that starts any container that is not running.

image

[root@grimoire docker-compose]# docker ps
CONTAINER ID   IMAGE                                                     COMMAND                  CREATED        STATUS                 PORTS                                                 NAMES
1f5710dcbe7d   bitergia/kibiter:community-v6.8.6-3                       "/docker_entrypoint.…"   6 months ago   Up 33 hours            0.0.0.0:5601->5601/tcp, :::5601->5601/tcp             docker-compose_kibiter_1
4b2abb019db6   bitergia/mordred:latest                                   "/bin/sh -c ${DEPLOY…"   6 months ago   Up 2 hours (healthy)                                                         docker-compose_mordred_1
14a6926bfd06   grimoirelab/hatstall:latest                               "/bin/sh -c ${DEPLOY…"   6 months ago   Up 33 hours            0.0.0.0:8000->80/tcp, :::8000->80/tcp                 docker-compose_hatstall_1
a295a748c38d   docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.6   "/usr/local/bin/dock…"   6 months ago   Up 33 hours            0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp   docker-compose_elasticsearch_1
37abe7721333   mariadb:10.0                                              "docker-entrypoint.s…"   6 months ago   Up 33 hours            3306/tcp                                              docker-compose_mariadb_1

Could you please review the provided configurations, screenshots, and logs and guide me in the right direction to troubleshoot this issue?
I understand that you may be occupied with other pressing tasks, and I genuinely appreciate your assistance in resolving this matter. With your support, I have made significant progress, and only a few issues remain.

Thanks,
Altif

  • setup.cfg LGTM. Since you already have data in the github2 indexes, you can remove from-date because Mordred will fetch new items incrementally.
  • dashboards: Try to increase the time picker like Last 1 year instead of Last 7 days
  • indexes: LGTM
  • Mordred container: It seems that your Mordred container cannot connect to github.com. Start the Mordred Docker container, enter the container (docker exec -it <mordred> bash), and try to run perceval git https://github.com/....../manager-lambda.git

Thank you @zhquan for replying.

  • setup.cfg LGTM. Since you already have data in the github2 indexes, you can remove from-date because Mordred will fetch new items incrementally.

I have removed the from-date.

  • dashboards: Try to increase the time picker like Last 1 year instead of Last 7 days

I attempted to set the time range for the past 30 days, but I discovered that the data is unavailable in the github2_issues and github2_pull_requests index patterns after February 8th. I am including reference screenshots.

It's worth mentioning that we have repositories that are frequently accessed and modified, with pull requests created and merged regularly. This leads me to wonder why there is no data available in either index pattern after 8 February.

image (7)

image (8)

  • Mordred container: It seems that your Mordred container cannot connect to github.com. Start the Mordred Docker container, enter the container (docker exec -it <mordred> bash), and try to run perceval git https://github.com/....../manager-lambda.git

I observed that Mordred sometimes encounters timeout errors while communicating with GitHub, which are likely caused by temporary or brief technical issues on GitHub's end. However, I'd like to understand whether a timeout error could cause the Mordred container to terminate.

@zhquan Thanks a lot for your time and support; I truly appreciate it.

It took a few days to collect the pull request data from GitHub for over 500 repositories. I am happy to report that the data has been successfully retrieved and is now accessible through the GitHub Comments PRs dashboard.

Therefore, I am closing this issue as it has been resolved. Thank you once again for your assistance.

image (9)