
Fork of the awesome function_parser library from GitHub's CodeSearchNet Challenge repository: https://github.com/github/CodeSearchNet/tree/master/function_parser

Primary language: Python. License: MIT.

function_parser

This library contains various utilities for parsing GitHub repositories into pairs of function definitions and docstrings. It uses tree-sitter to parse code into ASTs and applies heuristics to extract more detailed metadata. It currently supports six languages: Python, Java, Go, PHP, Ruby, and JavaScript. For Python, it also parses function calls and links them to their definitions.
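To make "function definition and docstring pair" concrete, here is a minimal sketch that extracts similar fields from Python source. It deliberately uses a swapped-in technique, the standard library ast module, rather than tree-sitter, so it only illustrates the idea and is not how function_parser works internally. The helper name extract_pairs is hypothetical; its keys simply mirror some of the column names the library produces (identifier, parameters, docstring, docstring_summary, function).

```python
import ast
from typing import Any, Dict, List


def extract_pairs(source: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: return one record per documented function or
    method in `source`, using Python's ast module (not tree-sitter)."""
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        docstring = ast.get_docstring(node)  # cleaned (dedented) docstring
        if not docstring:
            continue  # keep only functions that have a docstring
        pairs.append(
            {
                "identifier": node.name,
                # Names from node.args.args only (no *args, **kwargs,
                # keyword-only or positional-only parameters).
                "parameters": [arg.arg for arg in node.args.args],
                "docstring": docstring,
                # First paragraph of the docstring as a rough summary.
                "docstring_summary": docstring.split("\n\n")[0].strip(),
                # Exact source text of the whole definition (Python 3.8+).
                "function": ast.get_source_segment(source, node),
            }
        )
    return pairs


if __name__ == "__main__":
    sample = '''
def add(a, b):
    """Add two numbers.

    Returns their sum.
    """
    return a + b
'''
    for pair in extract_pairs(sample):
        print(pair["identifier"], pair["parameters"], "->", pair["docstring_summary"])
```

Undocumented functions are skipped, which matches the goal of collecting code and documentation pairs.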

Install

pip install function-parser

How to use

To use the library, you must first download and build the tree-sitter language grammars it uses to parse source code. The library includes a handy CLI tool for setting this up.

To download and build grammars: build_grammars

This command downloads and builds the grammars into the same directory where pip installed this Python library.
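Before creating a DataProcessor, it can help to confirm that build_grammars actually produced the compiled grammar bundle. This sketch assumes the bundle is the tree-sitter-languages.so file inside the installed package directory, which is the path the usage example loads. The helper find_grammar_bundle is hypothetical and not part of function_parser.

```python
import importlib.util
import os
from typing import Optional


def find_grammar_bundle(package_dir: Optional[str] = None) -> str:
    """Hypothetical helper: return the path to the compiled grammar bundle,
    raising FileNotFoundError if build_grammars has not been run yet."""
    if package_dir is None:
        # Locate the installed function_parser package without importing it.
        spec = importlib.util.find_spec("function_parser")
        if spec is None or not spec.submodule_search_locations:
            raise ModuleNotFoundError("function_parser is not installed")
        package_dir = list(spec.submodule_search_locations)[0]
    bundle = os.path.join(package_dir, "tree-sitter-languages.so")
    if not os.path.isfile(bundle):
        raise FileNotFoundError(f"{bundle} not found; run build_grammars first")
    return bundle
```

When called with no argument, the helper finds the installed package through importlib. Its return value can then replace the os.path.join(...) expression in the example.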

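import os

import function_parser
import pandas as pd
from function_parser.language_data import LANGUAGE_METADATA
from function_parser.process import DataProcessor
from tree_sitter import Language

# Point the shared tree-sitter parser at the grammar bundle that
# build_grammars compiled into the installed package directory.
# (Language(path, name) is the older py-tree-sitter loading API.)
language = "python"
grammar_path = os.path.join(function_parser.__path__[0], "tree-sitter-languages.so")
DataProcessor.PARSER.set_language(Language(grammar_path, language))

# Create a processor that applies the Python-specific parsing heuristics.
processor = DataProcessor(
    language=language, language_parser=LANGUAGE_METADATA[language]["language_parser"]
)

# Process the keras-team/keras repository: extract function definitions,
# docstrings and metadata from its files with the Python extension.
dependee = "keras-team/keras"
definitions = processor.process_dee(dependee, ext=LANGUAGE_METADATA[language]["ext"])
pd.DataFrame(definitions).head()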
 | nwo | sha | path | language | identifier | parameters | argument_list | return_statement | docstring | docstring_summary | docstring_tokens | function | function_tokens | url
0 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | backend | () |  | return 'tensorflow' | Publicly accessible method for determining the... | Publicly accessible method for determining the... | [Publicly, accessible, method, for, determinin... | def backend():\n """Publicly accessible metho... | [def, backend, (, ), :, return, 'tensorflow'] | https://github.com/keras-team/keras/blob/e43af...
1 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | cast_to_floatx | (x) |  | return np.asarray(x, dtype=floatx()) | Cast a Numpy array to the default Keras float ... | Cast a Numpy array to the default Keras float ... | [Cast, a, Numpy, array, to, the, default, Kera... | def cast_to_floatx(x):\n """Cast a Numpy arra... | [def, cast_to_floatx, (, x, ), :, if, isinstan... | https://github.com/keras-team/keras/blob/e43af...
2 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | get_uid | (prefix='') |  | return layer_name_uids[prefix] | Associates a string prefix with an integer cou... | Associates a string prefix with an integer cou... | [Associates, a, string, prefix, with, an, inte... | def get_uid(prefix=''):\n """Associates a str... | [def, get_uid, (, prefix, =, '', ), :, graph, ... | https://github.com/keras-team/keras/blob/e43af...
3 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | reset_uids | () |  |  | Resets graph identifiers. | Resets graph identifiers. | [Resets, graph, identifiers, .] | def reset_uids():\n """Resets graph identifie... | [def, reset_uids, (, ), :, PER_GRAPH_OBJECT_NA... | https://github.com/keras-team/keras/blob/e43af...
4 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | clear_session | () |  |  | Resets all state generated by Keras.\n\n Kera... | Resets all state generated by Keras. | [Resets, all, state, generated, by, Keras, .] | def clear_session():\n """Resets all state ge... | [def, clear_session, (, ), :, global, _SESSION... | https://github.com/keras-team/keras/blob/e43af...
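A common next step is to keep only documented functions and save them as code and summary pairs. This sketch assumes definitions is a list of dicts keyed by the column names in the table above. The pd.DataFrame(definitions) call suggests this, but it is an assumption. The helper write_pairs_jsonl and its output fields (identifier, code, summary, url) are hypothetical choices, not a format defined by function_parser.

```python
import json
from typing import Any, Dict, Iterable, TextIO


def write_pairs_jsonl(definitions: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Hypothetical helper: write one JSON object per documented function
    to `out` (JSON Lines) and return how many records were written."""
    written = 0
    for row in definitions:
        summary = (row.get("docstring_summary") or "").strip()
        if not summary:
            continue  # drop functions that have no docstring
        record = {
            "identifier": row["identifier"],
            "code": row["function"],
            "summary": summary,
            "url": row.get("url", ""),
        }
        out.write(json.dumps(record) + "\n")
        written += 1
    return written


# Example (after running the process_dee snippet above):
# with open("keras_pairs.jsonl", "w", encoding="utf-8") as f:
#     print(write_pairs_jsonl(definitions, f), "pairs written")
```

Filtering on docstring_summary rather than docstring keeps rows whose docstring is only whitespace out of the output.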