Action fails when too many jobs try to track different repos in the same data repo
ChameleonTartu opened this issue · 10 comments
This project looks amazing!
My idea was to track all of my public repos and analyze them once in a while. It looks like the action fails when I have too many jobs running -- for instance, when one job pushes before another one has finished. My GitHub repo.
Also, there is another issue with amazon-mws-subscriptions-maven:
210411-19:09:08.177 INFO:MainThread: union-merge views and clones
Traceback (most recent call last):
File "/fetch.py", line 314, in <module>
main()
File "/fetch.py", line 73, in main
) = fetch_all_traffic_api_endpoints(repo)
File "/fetch.py", line 122, in fetch_all_traffic_api_endpoints
df_views_clones = pd.concat([df_clones, df_views], axis=1, join="outer")
File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 285, in concat
op = _Concatenator(
File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 467, in __init__
self.new_axes = self._get_new_axes()
File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 537, in _get_new_axes
return [
File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 538, in <listcomp>
self._get_concat_axis() if i == self.bm_axis else self._get_comb_axis(i)
File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 544, in _get_comb_axis
return get_objs_combined_axis(
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 92, in get_objs_combined_axis
return _get_combined_index(obs_idxes, intersect=intersect, sort=sort, copy=copy)
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 145, in _get_combined_index
index = union_indexes(indexes, sort=sort)
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 214, in union_indexes
return result.union_many(indexes[1:])
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 395, in union_many
this, other = this._maybe_utc_convert(other)
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 413, in _maybe_utc_convert
raise TypeError("Cannot join tz-naive with tz-aware DatetimeIndex")
TypeError: Cannot join tz-naive with tz-aware DatetimeIndex
Another data frame issue:
210411-19:09:18.943 INFO: parsed timestamp from path: 2021-04-11 19:09:15+00:00
Traceback (most recent call last):
File "/analyze.py", line 1398, in <module>
main()
File "/analyze.py", line 82, in main
analyse_view_clones_ts_fragments()
File "/analyze.py", line 691, in analyse_view_clones_ts_fragments
if df.index.max() > snapshot_time:
TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'
+ ANALYZE_ECODE=1
error: analyze.py returned with code 1 -- exit.
Git clone issue:
GHRS entrypoint.sh: pwd: /github/workspace
+ git clone 'https://ghactions:${' secrets.ACCESS_GITHUB_API_TOKEN '}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git' .
length of API TOKEN: 36
fatal: Too many arguments.
All other issues are the same as those mentioned.
@jgehrcke Let me know if I can help with more than just reporting this. It would be great to fix all of this so I can use this tool more extensively, as I am planning to grow the number of repos beyond the current 34 over time. It is the most valuable tool I could find for tracking repo development over time. Thank you again!
Traceback (most recent call last):
File "/fetch.py", line 314, in <module>
main()
File "/fetch.py", line 73, in main
) = fetch_all_traffic_api_endpoints(repo)
File "/fetch.py", line 122, in fetch_all_traffic_api_endpoints
df_views_clones = pd.concat([df_clones, df_views], axis=1, join="outer")
[...]
TypeError: Cannot join tz-naive with tz-aware DatetimeIndex
I could not quite make sense of this one. Both df_clones and df_views are created by the same code path. I thought that maybe when one of them is empty this might be the fallout, with a misleading error message -- but no:
± python
Python 3.8.6 (default, Nov 22 2020, 17:14:35)
[GCC 10.2.1 20201016 (Red Hat 10.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> tz_naive = pd.date_range('2018-03-01 09:00', periods=3)
>>> tz_aware = tz_naive.tz_localize(tz='US/Eastern')
>>> df_aware = pd.DataFrame(data={'lol': [1, 2, 3]}, index=tz_aware)
>>> df_aware
lol
2018-03-01 09:00:00-05:00 1
2018-03-02 09:00:00-05:00 2
2018-03-03 09:00:00-05:00 3
>>> df_empty = pd.DataFrame(data={}, index=[])
>>> pd.concat([df_aware, df_empty], axis=1, join="outer")
lol
2018-03-01 09:00:00-05:00 1
2018-03-02 09:00:00-05:00 2
2018-03-03 09:00:00-05:00 3
I am adding a patch that changes the way the DatetimeIndex is translated to a tz-aware object, which hopefully addresses this problem. It's a little disappointing to not understand it precisely.
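For reference, here is a minimal sketch of the kind of normalization such a patch could do (names hypothetical, not the actual fetch.py code): coerce both indexes to tz-aware UTC before concatenating, so a mixed-awareness pair can never reach pd.concat.

```python
import pandas as pd

def ensure_utc(df: pd.DataFrame) -> pd.DataFrame:
    # Localize a tz-naive DatetimeIndex to UTC, or convert a tz-aware one.
    if df.index.tz is None:
        df.index = df.index.tz_localize("UTC")
    else:
        df.index = df.index.tz_convert("UTC")
    return df

# One frame tz-naive, one tz-aware -- the problematic combination.
idx_naive = pd.date_range("2018-03-01 09:00", periods=3)
df_views = pd.DataFrame({"views": [1, 2, 3]}, index=idx_naive)
df_clones = pd.DataFrame({"clones": [4, 5, 6]}, index=idx_naive.tz_localize("UTC"))

# After normalization both indexes are tz-aware UTC and align cleanly.
merged = pd.concat([ensure_utc(df_views), ensure_utc(df_clones)], axis=1, join="outer")
```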
TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'
That somewhat suggests that df_clones and df_views were structurally rather different from what is expected.
Update: empty index explains that error msg:
>>> from datetime import datetime
>>> df_empty.index.max() > datetime(year=2012, month=3, day=1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'
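That is, max() on an empty index returns NaN (a plain float), and NaN cannot be ordered against a datetime. A small sketch of a guard against this case (hypothetical, not the actual analyze.py code):

```python
from datetime import datetime

import pandas as pd

df_empty = pd.DataFrame(data={}, index=[])

# The max of an empty index is NaN, i.e. a plain float ...
empty_max = df_empty.index.max()

# ... so check for emptiness before comparing against a datetime.
snapshot_time = datetime(year=2012, month=3, day=1)
newer_than_snapshot = len(df_empty.index) > 0 and df_empty.index.max() > snapshot_time
```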
GHRS entrypoint.sh: pwd: /github/workspace
+ git clone 'https://ghactions:${' secrets.ACCESS_GITHUB_API_TOKEN '}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git' .
length of API TOKEN: 36
fatal: Too many arguments.
Could it be that this token was actually truncated, and/or that this is related to one of your code changes?
I notice secrets.ACCESS_GITHUB_API_TOKEN in the log, but with the current code this should actually look quite different:
git clone https://ghactions:${GHRS_GITHUB_API_TOKEN}@github.com/${DATA_REPOSPEC}.git
When things work as expected, that should be the log pattern:
GHRS entrypoint.sh: pwd: /github/workspace
+ git clone ***github.com/jgehrcke/ghrs-test.git .
length of API TOKEN: 40
Cloning into '.'...
It's likely that the error message fatal: Too many arguments. was a consequence of the misconstructed git clone ... command.
@ChameleonTartu would you mind retrying things with the current head of main? I think I've addressed all issues reported to date (maybe have a look at the changelog). Happy to cut a release, but ideally only after getting your confirmation that things indeed work.
@jgehrcke I made a run: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/actions/runs/748508227
The only use-case that doesn't work is:
GHRS entrypoint.sh: pwd: /github/workspace
+ git clone 'https://ghactions:${' secrets.ACCESS_GITHUB_API_TOKEN '}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git' .
length of API TOKEN: 36
fatal: Too many arguments.
And all jobs failed with the same message: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/runs/2343584927?check_suite_focus=true
I suspect that the repos may have been created a long time ago, so they might have different API token formats -- could that be the cause? Any idea?
The only use-case that doesn't work is:
OK, your workflow file is bad in a subtle way! A mean trap: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/blob/b6d089f2bc01462e05fe8100ce1f27cfd3a24909/.github/workflows/stats.yml#L138
@ChameleonTartu you have ghtoken: ${ secrets.ACCESS_GITHUB_API_TOKEN }, but these curly braces need to come in pairs: ${{ ... }} -- in most jobs, you have that.
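The failure mode can be illustrated outside of Actions: with single braces the expression is not expanded, the literal text reaches entrypoint.sh, and shell word-splitting breaks the URL into several arguments. A sketch using Python's shlex to approximate POSIX splitting (the token value is a placeholder):

```python
import shlex

# Single braces: GitHub Actions passes the text through literally, and
# the spaces inside "${ ... }" cause the shell to word-split the URL.
broken = ("git clone https://ghactions:${ secrets.ACCESS_GITHUB_API_TOKEN "
          "}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git .")
print(shlex.split(broken))  # the URL falls apart into multiple arguments

# Double braces (${{ ... }}): the expression is expanded before the shell
# runs, so git clone gets exactly two arguments: the URL and ".".
fixed = "git clone https://ghactions:TOKEN@github.com/ChameleonTartu/buymeacoffee-repo-stats.git ."
print(shlex.split(fixed))
```

The extra arguments after "clone" in the broken case are exactly what git complains about with "Too many arguments".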
@jgehrcke Thank you! I didn't notice these nuances.
I auto-generated some of the jobs, so it looks like I got some of them wrong. Cool-cool-cool!
@ChameleonTartu ok : ) Please leave feedback again when the current head of main worked for all your jobs : )
@jgehrcke Everything works smoothly: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/actions/runs/748788117
Amazing!