chaoss/grimoirelab-graal

Add scancli option to CoLic Backend

inishchith opened this issue · 15 comments

Add support for scancli, a faster version of scancode, to the CoLic backend.

@valeriocos Please let me know if I can work on this.
Thanks

Sure @inishchith , thanks !
You can find useful info at the following urls:

A possible implementation could add a boolean param cli here: https://github.com/chaoss/grimoirelab-graal/blob/master/graal/backends/core/analyzers/scancode.py#L41. The method analyze could then be modified to call one of two private methods, analyze_scancode or analyze_scancode_cli, depending on the value of cli. The former would contain the code of the current analyze method and the latter some code similar to this one.

The code of the colic backend probably shouldn't need to change much (just adding new categories and the related code).
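The dispatch described above could be sketched roughly as follows. This is only an illustration, not the actual graal code: the class name ScanCodeSketch, the runner parameter and the list-based analyze signature are hypothetical, and the scancode flags ('--json-pp', '-', '--license') are an assumption about how the current analyzer invokes scancode.

```python
import subprocess


class ScanCodeSketch:
    """Hypothetical analyzer that switches between scancode and scancli."""

    def __init__(self, exec_path, cli=False, runner=subprocess.check_output):
        self.exec_path = exec_path
        self.cli = cli
        # Injectable runner (an assumption of this sketch) so the class
        # can be exercised without scancode being installed.
        self._run = runner

    def analyze(self, file_paths):
        # Dispatch on the boolean `cli` param, as proposed in the comment.
        if self.cli:
            return self._analyze_scancode_cli(file_paths)
        return self._analyze_scancode(file_paths)

    def _analyze_scancode(self, file_paths):
        # One scancode process per file; the flags are assumed, not confirmed.
        outputs = []
        for path in file_paths:
            cmd = [self.exec_path, '--json-pp', '-', '--license', path]
            outputs.append(self._run(cmd).decode('utf-8'))
        return outputs

    def _analyze_scancode_cli(self, file_paths):
        # A single scancli process that receives every file path at once.
        cmd = ['python3', self.exec_path] + list(file_paths)
        return [self._run(cmd).decode('utf-8')]
```

With cli=False the sketch spawns one process per file; with cli=True it spawns a single process for the whole list, so the colic backend would only need to pick the flag based on the category.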

What do you think ?

@valeriocos Thanks for the supporting links and insights on how to go about the task.
I'll start working on the task and open a PR once done, then we can have further discussion over there.

Thanks :)

@valeriocos can you share which scancode release or setup you used to run scancli successfully?
I read the discussion on nexB/scancode-toolkit#1400 but couldn't reproduce the results as I ran into multiple errors, so thought of asking before moving forward.

Thanks

Sorry for the late reply @inishchith

In the virtual env used by graal, I installed simplejson and execnet as reported here: nexB/scancode-toolkit@8afa686#diff-f826f8c8f6f35f368b2a692610f05d62R18

Then I used the following branch: https://github.com/valeriocos/grimoirelab-graal/tree/test-scancli/graal, and launched the backend in the following way:

colic
https://github.com/chaoss/grimoirelab-toolkit
--git-path
/tmp/xyzw
--exec-path
/home/scancode-toolkit/scancode (v3.0.0 downloaded from here: https://github.com/nexB/scancode-toolkit/releases)
--category
code_license_scancode
--json

Note that you have to modify the metadata method to include the filtered_classified param.

Tomorrow I can push a better version of the code of my branch.

Hope it helps :)

@valeriocos Thanks for sharing the information.

  • These changes were introduced on 5th March 2019, but Scancode-toolkit v3.0.0 was released on 15th Feb 2019 (i.e. before the changes were made), so scancli.py doesn't exist in that release.

Please correct me if I'm wrong or have missed something. Thanks

Sorry @inishchith I made a mistake. It wasn't version 3.0.0, but the checkout at nexB/scancode-toolkit@8afa686 (as reported here: nexB/scancode-toolkit#1400 (comment)). The code was then merged in the develop branch (as reported here: nexB/scancode-toolkit#1400 (comment)).

If you clone the repo and use the current develop branch, the backend should work (https://github.com/nexB/scancode-toolkit/tree/develop).

Let me know if you have any problem, thanks :)

@valeriocos Sorry for the delayed response.

I tried to reproduce the results using your setup information and the test-scancli branch of your fork, but I couldn't. I suspect the implementation has changed since then. I've shared the error log below. Please let me know if you've encountered it before or if I've missed something. Thanks :)

  • Error log
[2019-05-13 16:50:03,438] - Starting the quest for the Graal.
[2019-05-13 16:50:10,816] - Git worktree /tmp/worktrees/tmp2 created!
[2019-05-13 16:50:10,817] - Fetching commits: 'https://github.com/chaoss/grimoirelab-toolkit' git repository from 1970-01-01 00:00:00+00:00 to 2100-01-01 00:00:00+00:00; all branches
[2019-05-13 16:50:12,460] - Git repository tmp2 checked out!
Traceback (most recent call last):
  File "/Users/Nishchith/scancode-toolkit/etc/scripts/scancli.py", line 72, in <module>
    for s in scan(args):
  File "/Users/Nishchith/scancode-toolkit/etc/scripts/scancli.py", line 63, in scan
    results = channel.receive()
  File "/usr/local/lib/python3.6/site-packages/execnet/gateway_base.py", line 728, in receive
    raise self._getremoteerror() or EOFError()
execnet.gateway_base.RemoteError: Traceback (most recent call last):
  File "<string>", line 1063, in executetask
  File "<string>", line 1, in do_exec
  File "<remote exec>", line 53, in <module>
  File "<remote exec>", line 44, in run_scan
  File "/Users/Nishchith/scancode-toolkit/src/scancode/cli.py", line 864, in run_scan
    quiet=quiet, verbose=verbose, kwargs=kwargs, echo_func=echo_func,
  File "/Users/Nishchith/scancode-toolkit/src/scancode/cli.py", line 1054, in run_scanners
    with_timing=timing, progress_manager=progress_manager)
  File "/Users/Nishchith/scancode-toolkit/src/scancode/cli.py", line 1145, in scan_codebase
    location, rid, scan_errors, scan_time, scan_result, scan_timings = scans.next()
AttributeError: 'list' object has no attribute 'next'

[2019-05-13 16:52:39,704] - Analysis failed at 9dc821962567715e5358b1192e1b15d8868d2b6c
Traceback (most recent call last):
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/analyzers/scancode.py", line 62, in analyze
    msg = subprocess.check_output(cmd_scancli).decode("utf-8")
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['python3', '/Users/Nishchith/scancode-toolkit/etc/scripts/scancli.py', '/tmp/worktrees/tmp2/.gitignore', '/tmp/worktrees/tmp2/AUTHORS', '/tmp/worktrees/tmp2/LICENSE', '/tmp/worktrees/tmp2/grimoirelab/__init__.py', '/tmp/worktrees/tmp2/grimoirelab/toolkit/__init__.py', '/tmp/worktrees/tmp2/grimoirelab/toolkit/_version.py', '/tmp/worktrees/tmp2/setup.cfg']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 472, in run
    for item in items:
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 589, in fetch
    raise e
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 583, in fetch
    for item in items:
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 162, in fetch
    for item in self.fetch_items(category, **kwargs):
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/graal.py", line 183, in fetch_items
    raise e
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/graal.py", line 176, in fetch_items
    commit['analysis'] = self._analyze(commit)
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/colic.py", line 161, in _analyze
    analysis = self.analyzer.analyze(local_paths)
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/colic.py", line 204, in analyze
    analysis = self.analyzer.analyze(**kwargs)
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/analyzers/scancode.py", line 65, in analyze
    e.output.decode("utf-8")))
graal.graal.GraalError: Scancode failed at /tmp/worktrees/tmp2/.gitignore /tmp/worktrees/tmp2/AUTHORS /tmp/worktrees/tmp2/LICENSE /tmp/worktrees/tmp2/grimoirelab/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/_version.py /tmp/worktrees/tmp2/setup.cfg, 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/graal", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/Nishchith/GitHub/grimoirelab-graal/bin/graal", line 125, in <module>
    main()
  File "/Users/Nishchith/GitHub/grimoirelab-graal/bin/graal", line 71, in main
    cmd.run()
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 482, in run
    raise RuntimeError(str(e))
RuntimeError: Scancode failed at /tmp/worktrees/tmp2/.gitignore /tmp/worktrees/tmp2/AUTHORS /tmp/worktrees/tmp2/LICENSE /tmp/worktrees/tmp2/grimoirelab/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/_version.py /tmp/worktrees/tmp2/setup.cfg, 

No worries @inishchith :)

I have uploaded a branch with some improvements in the code. I can confirm what you reported: the errors you posted appear when using the develop or master branches of the original repo. However, if you perform the following steps and run the same code, no errors pop up:

git clone https://github.com/nexB/scancode-toolkit
cd scancode-toolkit
git checkout -b xxx 8afa686fb71b9540029234e5a40c0572c4457c28
colic
https://github.com/chaoss/grimoirelab-toolkit
--git-path
/tmp/cdefgh
--exec-path
/home/graal-libs/scancode-toolkit/etc/scripts/scancli.py <-- the repo just downloaded
--category
code_license_scancode_cli
--json

I'll keep investigating and let you know about any progress.

@valeriocos Thanks for checking the issue out. After checking out that commit, I could reproduce the results 👍

Also I checked out your implementation of scancode_cli here.
I noticed that you're passing all the files at once as arguments instead of passing them individually, as the existing scancode analyzer does. Does passing them all at once improve performance?
I haven't had time to test both approaches thoroughly, hence the question :)

Great @inishchith !

Also I checked out your implementation of ....

Yes, this is one of the features of scancli (check the comment here: nexB/scancode-toolkit#1400 (comment), and the following one).

If you test scancode and scancli against https://github.com/chaoss/grimoirelab-toolkit you should see the difference.
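A rough way to compare the two approaches is to build both sets of commands and time them. The helpers below are a hypothetical sketch: the function names are made up for illustration, the scancode flags are an assumption, and the scancli invocation simply mirrors the 'python3 scancli.py file1 file2 ...' command visible in the error log earlier in this thread.

```python
import subprocess
import time


def per_file_commands(scancode_path, file_paths):
    # One scancode invocation per file (flags are assumed, not confirmed).
    return [[scancode_path, '--json-pp', '-', '--license', p]
            for p in file_paths]


def batched_command(scancli_path, file_paths):
    # A single scancli invocation that receives every path as an argument.
    return [['python3', scancli_path, *file_paths]]


def time_commands(commands, runner=subprocess.check_output):
    # Run each command in turn; return the process count and elapsed seconds.
    start = time.perf_counter()
    for cmd in commands:
        runner(cmd)
    return len(commands), time.perf_counter() - start
```

Calling time_commands on per_file_commands(...) and then on batched_command(...) for the files of a checked-out commit should show how much of the cost comes from starting one process per file.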

@valeriocos thanks for answering.
As my university exams are under way, I'll work on this when time permits.
I'll probably test scancode and scancli tomorrow to check the difference, and then continue the work I currently have staged.

Sorry for the delayed response.

No worries @inishchith , I have just opened a PR (#28) with some code to use scancli.

Feel free to work on that PR or create a new one.

@valeriocos Sure.
I checked out #28. The work that I've done so far seems similar.
Still, I'll open a PR soon so that we can work on adding tests for it too.

Thanks

@valeriocos I think we can close this. What do you think?

Sure @inishchith , feel free to close it.