priv-kweihmann/multimetric

How to use this tool as an API in Python code?

Hi @zhimin-z,

currently that's not a supported use case, but it should be doable quite easily.
What needs to be done is:

  • move most of the processing code of __main__.py:main into a separate function, taking all of the arguments currently extracted from the _args = ArgParser() call
  • the new main should just consist of the argparse call, the call to the newly created function, and the printing of the results

Once that is done you could just add from multimetric.__main__ import <newly added function> in your code and run as if you would run the same from CLI.
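
The split described above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual multimetric code: the names run and extra_option are hypothetical, and the body of run is a placeholder for the processing that currently lives in main.

```python
import argparse
import json
import sys


def run(files, extra_option=None):
    """Hypothetical library entry point: plain arguments in, a dict out."""
    # Placeholder for the processing currently done inside main();
    # the real tool would compute its metrics here.
    return {"files": list(files), "extra_option": extra_option}


def main(argv=None):
    """CLI wrapper: parse the arguments, call run(), print the result."""
    parser = argparse.ArgumentParser(description="illustrative CLI")
    parser.add_argument("files", nargs="+")
    parser.add_argument("--extra-option", default=None)  # hypothetical flag
    args = parser.parse_args(argv)
    result = run(args.files, extra_option=args.extra_option)
    json.dump(result, sys.stdout, indent=2)
    return 0
```

With that shape, other Python code can import run and call it directly, while the CLI keeps going through main.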

Maybe @aylusltd has the time to do that, but also feel free to provide the necessary patches - PRs highly welcome

In the meantime you could also wrap the invocation of this tool in subprocess and parse the output:

json.loads(subprocess.check_output(['multimetric', <args go here>, *files], universal_newlines=True))
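
A slightly more defensive version of that one-liner might look like the sketch below. The helper name run_multimetric and its executable parameter are made up for illustration; the only assumption about the tool itself is what this thread states, namely that it prints JSON to stdout.

```python
import json
import subprocess


def run_multimetric(files, extra_args=(), executable="multimetric"):
    """Run the CLI on the given files and return its parsed JSON output."""
    command = [executable, *extra_args, *files]
    # check=True raises CalledProcessError on a non-zero exit code
    # instead of silently handing back partial output.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Keeping the executable as a parameter makes it easy to point the helper at a specific virtualenv or to substitute a stub when testing.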

If I have a dataframe consisting of code & text, how could I call multimetric to parse it?
For example, my dataframe looks like this:
[image: screenshot of the dataframe]
I do not want to delete the created time & closed time, but I want to feed the entire dataframe to multimetric.

@zhimin-z the example doesn't look like code to me, but if it were, and assuming the data in that table comes from some sort of structure, the following pseudo code could work

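import json
import subprocess
import tempfile

# datastructure: placeholder for any iterable of dict-like rows
for item in datastructure:
    # NamedTemporaryFile has a real path in .name; text mode ('w') accepts str
    with tempfile.NamedTemporaryFile('w') as i:
        i.write(item['Challenge_body'])
        i.flush()  # make sure the content is on disk before the tool reads it
        try:
            item['multimetric'] = json.loads(subprocess.check_output(['multimetric', i.name], universal_newlines=True))
        except (subprocess.CalledProcessError, ValueError):
            # skip rows the tool fails on or whose output is not valid JSON
            pass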

The matching result for each row would then be accessible as 'multimetric' for any kind of further processing.
One caveat here: as the input file doesn't have any extension, you might need to pass the language to be used for the lexer manually (see the README for more details).
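
One way around the missing extension, sketched here under the assumption that the tool picks its lexer from the file name (which the caveat above suggests but this thread does not confirm), is to give the temporary file a suffix. The helper write_snippet is hypothetical, and ".py" is only an example.

```python
import os
import tempfile


def write_snippet(code, suffix=".py"):
    """Write a code string to a temp file whose extension hints at the language."""
    # delete=False keeps the file after the with-block so another process can
    # open it by name; the caller is responsible for removing it afterwards.
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, delete=False, encoding="utf-8"
    ) as handle:
        handle.write(code)
        return handle.name


path = write_snippet("x = 1\n")
try:
    print(path)  # this path could be passed to the CLI in place of i.name
finally:
    os.unlink(path)  # clean up the temporary file
```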

Thanks, @priv-kweihmann! I wonder whether multimetric can process code embedded in text. I saw there is a metric called "code_comment_ratio"; does it work for code within text, as shown above?

@zhimin-z out of the box, no, the tool can't do that right now.
But in theory you could write your own lexer for that (see https://pygments.org/docs/lexerdevelopment/); a lexer is needed for doing all the computation of the statistics.
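
As a rough idea of what such a lexer could look like, here is a minimal Pygments sketch. It rests on several assumptions: that code in the text is wrapped in triple-backtick fences, that the code is Python, and that treating the surrounding prose as comments is acceptable. The class name ProseWithPythonLexer is invented, and how the lexer would be registered with or selected by multimetric is not covered here.

```python
import re

from pygments.lexer import RegexLexer, bygroups, using
from pygments.lexers import PythonLexer
from pygments.token import Comment


class ProseWithPythonLexer(RegexLexer):
    """Hypothetical lexer: prose becomes comments, fenced blocks are lexed as Python."""

    name = "ProseWithPython"
    aliases = ["prose-python"]
    flags = re.MULTILINE | re.DOTALL

    tokens = {
        "root": [
            # A fenced block: opening fence line, body, closing fence line.
            (
                r"(```[^\n]*\n)(.*?)(^```[^\n]*\n?)",
                bygroups(Comment.Special, using(PythonLexer), Comment.Special),
            ),
            # Everything else is prose; emit it line by line as a comment.
            (r"[^\n]*\n", Comment.Single),
            (r"[^\n]+", Comment.Single),
        ]
    }


def tokenize(text):
    """Return the (token type, value) pairs produced for the given text."""
    return list(ProseWithPythonLexer().get_tokens(text))
```

Calling tokenize on a string that mixes sentences and a fenced snippet yields comment tokens for the sentences and regular Python tokens for the snippet, which is the kind of token stream a comment-ratio style metric would need.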