Feature request: parse TensorFlow documentation
SamuelMarks opened this issue · 3 comments
TensorFlow is a popular open-source ML framework/ecosystem from Google.
Unfortunately, your parser doesn't handle its docstrings well. Here's a link to one such docstring:
https://github.com/tensorflow/tensorflow/blob/9df9d06/tensorflow/python/keras/optimizer_v2/adam.py#L35-L103
Snippet:
r"""Optimizer that implements the Adam algorithm.
…
[Kingma et al., 2014](http://arxiv.org/abs/1412.6980),
the method is "*computationally
efficient, has little memory requirement, invariant to diagonal rescaling of
gradients, and is well suited for problems that are large in terms of
data/parameters*".
Args:
learning_rate: A `Tensor`, floating point value, or a schedule that is a
`tf.keras.optimizers.schedules.LearningRateSchedule`, or a callable
that takes no arguments and returns the actual value to use, The
learning rate. Defaults to 0.001.
…
9.9
Reference:
- [Kingma et al., 2014](http://arxiv.org/abs/1412.6980)
- [Reddi et al., 2018](
https://openreview.net/pdf?id=ryQu7f-RZ) for `amsgrad`.
…
$ ipython3
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import tensorflow as tf
In [2]: import inspect
In [3]: import docstring_parser
In [4]: docstring = docstring_parser.parse(inspect.getdoc(tf.keras.optimizers.Adam), style=docstring_parser.Style.google)
In [5]: tuple(map(lambda param: param.arg_name, docstring.params))
Out[5]:
('learning_rate',
'beta_1',
'beta_2',
'epsilon',
'amsgrad',
'name',
'**kwargs',
'- [Kingma et al., 2014](http',
'- [Reddi et al., 2018](\n https')
One workaround, if you don't want to fix your parser, is to post-process the result and drop any parameter name that fails an isidentifier
check (Python 2 implementation).
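A minimal sketch of that post-processing step, assuming the parameter names come back exactly as in the output above. `is_param_name` is a hypothetical helper, not part of docstring_parser. Note that '**kwargs' fails a plain `str.isidentifier` check, so the sketch strips leading asterisks first:

```python
def is_param_name(name: str) -> bool:
    """Return True if `name` looks like a real parameter name.

    Hypothetical helper: leading asterisks are stripped so that the
    `*args` / `**kwargs` spelling used in Google-style docstrings passes.
    """
    return name.lstrip("*").isidentifier()


# Names as reported by the parser in the session above.
parsed_names = (
    "learning_rate",
    "beta_1",
    "beta_2",
    "epsilon",
    "amsgrad",
    "name",
    "**kwargs",
    "- [Kingma et al., 2014](http",
    "- [Reddi et al., 2018](\n https",
)

# Keep only the entries that are plausible parameter names.
valid_names = tuple(name for name in parsed_names if is_param_name(name))
print(valid_names)
```

This only hides the symptom: the "Reference:" section is still being parsed as if it were part of "Args:", so the descriptions attached to the last real parameter may still be wrong.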
It also raises an error on 'tf.keras.layers.Layer'.
The error happens at this line:
$ ipython3
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import tensorflow as tf
In [2]: import docstring_parser
In [3]: docstring = docstring_parser.parse(inspect.getdoc(tf.keras.layers.Layer), style=docstring_parser.Style.google)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-27091b0195cd> in <module>
----> 1 docstring = docstring_parser.parse(inspect.getdoc(tf.keras.layers.Layer), style=docstring_parser.Style.google)
~/.pyenv/versions/3.7.2/envs/kerod/lib/python3.7/site-packages/docstring_parser/parser.py in parse(text, style)
14
15 if style != Style.auto:
---> 16 return STYLES[style](text)
17 rets = []
18 for parse_ in STYLES.values():
~/.pyenv/versions/3.7.2/envs/kerod/lib/python3.7/site-packages/docstring_parser/google.py in parse(text)
272 :returns: parsed docstring
273 """
--> 274 return GoogleParser().parse(text)
~/.pyenv/versions/3.7.2/envs/kerod/lib/python3.7/site-packages/docstring_parser/google.py in parse(self, text)
262 for j, (start, end) in enumerate(c_splits):
263 part = chunk[start:end].strip("\n")
--> 264 ret.meta.append(self._build_meta(part, title))
265
266 return ret
~/.pyenv/versions/3.7.2/envs/kerod/lib/python3.7/site-packages/docstring_parser/google.py in _build_meta(self, text, title)
104
105 # Split spec and description
--> 106 before, desc = text.split(":", 1)
107 if desc:
108 desc = desc[1:] if desc[0] == " " else desc
ValueError: not enough values to unpack (expected 2, got 1)
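The traceback shows `text.split(":", 1)` being unpacked into two names, which fails whenever a section entry contains no colon (the free-form prose after "Attributes:" in the `Layer` docstring is one such case). A hedged sketch of a more tolerant split follows, using `str.partition`. The function name and the choice to treat colon-less text as description-only are assumptions for illustration, not how docstring_parser handles it:

```python
from typing import Tuple


def split_spec_and_description(text: str) -> Tuple[str, str]:
    """Split "spec: description" without raising when the colon is missing.

    Hypothetical helper. When there is no colon, the whole text is
    returned as the description and the spec is left empty.
    """
    before, sep, desc = text.partition(":")
    if not sep:
        # No colon found: assume the entry is plain prose.
        return "", text
    # Mirror the original behaviour of dropping one leading space.
    if desc.startswith(" "):
        desc = desc[1:]
    return before, desc


print(split_spec_and_description("name: The name of the layer (string)."))
print(split_spec_and_description("We recommend that descendants of `Layer` implement"))
```

Whether such entries should be kept as prose, attached to the previous entry, or dropped is a design decision for the parser; the sketch only shows how to avoid the unpacking error.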
@EmGarr I built my own parsers, covering all 3 docstring formats as well as classes, functions, and argparse parsers.
$ ipython
Python 3.8.7 (default, Dec 30 2020, 22:35:32)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from doctrans import emit, parse
In [2]: from doctrans.source_transformer import to_code
In [3]: import tensorflow as tf
In [4]: parse.class_(tf.keras.layers.Layer)
Out[4]:
{'name': 'Layer',
'doc': "This is the class from which all layers inherit.\n\nA layer is a callable object that takes as input one or more tensors and\nthat outputs one or more tensors. It involves *computation*, defined\nin the `call()` method, and a *state* (weight variables), defined\neither in the constructor `__init__()` or in the `build()` method.\n\nUsers will just instantiate a layer and then treat it as a callable.\n\n\nAttributes:\n name: The name of the layer (string).\n dtype: The dtype of the layer's weights.\n variable_dtype: Alias of `dtype`.\n compute_dtype: The dtype of the layer's computations. Layers automatically\n cast inputs to this dtype which causes the computations and output to also\n be in this dtype. When mixed precision is used with a\n `tf.keras.mixed_precision.Policy`, this will be different than\n `variable_dtype`.\n dtype_policy: The layer's dtype policy. See the\n `tf.keras.mixed_precision.Policy` documentation for details.\n trainable_weights: List of variables to be included in backprop.\n non_trainable_weights: List of variables that should not be\n included in backprop.\n weights: The concatenation of the lists trainable_weights and\n non_trainable_weights (in this order).\n trainable: Whether the layer should be trained (boolean), i.e. whether\n its potentially-trainable weights should be returned as part of\n `layer.trainable_weights`.\n input_spec: Optional (list of) `InputSpec` object(s) specifying the\n constraints on inputs that can be accepted by the layer.\n\nWe recommend that descendants of `Layer` implement the following methods:\n\n* `__init__()`: Defines custom layer attributes, and creates layer state\n variables that do not depend on input shapes, using `add_weight()`.\n* `build(self, input_shape)`: This method can be used to create weights that\n depend on the shape(s) of the input(s), using `add_weight()`. 
`__call__()`\n will automatically build the layer (if it has not been built yet) by\n calling `build()`.\n* `call(self, *args, **kwargs)`: Called in `__call__` after making sure\n `build()` has been called. `call()` performs the logic of applying the\n layer to the input tensors (which should be passed in as argument).\n Two reserved keyword arguments you can optionally use in `call()` are:\n - `training` (boolean, whether the call is in\n inference mode or training mode)\n - `mask` (boolean tensor encoding masked timesteps in the input, used\n in RNN layers)\n* `get_config(self)`: Returns a dictionary containing the configuration used\n to initialize this layer. If the keys differ from the arguments\n in `__init__`, then override `from_config(self)` as well.\n This method is used when saving\n the layer or a model that contains this layer.\n\nExamples:\n\nHere's a basic example: a layer with two variables, `w` and `b`,\nthat returns `y = w . x + b`.\nIt shows how to implement `build()` and `call()`.\nVariables set as attributes of a layer are tracked as weights\nof the layers (in `layer.weights`).\n\n```python\nclass SimpleDense(Layer):\n\n def __init__(self, units=32):\n super(SimpleDense, self).__init__()\n self.units = units\n\n def build(self, input_shape): # Create the state of the layer (weights)\n w_init = tf.random_normal_initializer()\n self.w = tf.Variable(\n initial_value=w_init(shape=(input_shape[-1], self.units),\n dtype='float32'),\n trainable=True)\n b_init = tf.zeros_initializer()\n self.b = tf.Variable(\n initial_value=b_init(shape=(self.units,), dtype='float32'),\n trainable=True)\n\n def call(self, inputs): # Defines the computation from inputs to outputs\n return tf.matmul(inputs, self.w) + self.b\n\n# Instantiates the layer.\nlinear_layer = SimpleDense(4)\n\n# This will also call `build(input_shape)` and create the weights.\ny = linear_layer(tf.ones((2, 2)))\nassert len(linear_layer.weights) == 2\n\n# These weights are trainable, so they're 
listed in `trainable_weights`:\nassert len(linear_layer.trainable_weights) == 2\n```\n\nNote that the method `add_weight()` offers a shortcut to create weights:\n\n```python\nclass SimpleDense(Layer):\n\n def __init__(self, units=32):\n super(SimpleDense, self).__init__()\n self.units = units\n\n def build(self, input_shape):\n self.w = self.add_weight(shape=(input_shape[-1], self.units),\n initializer='random_normal',\n trainable=True)\n self.b = self.add_weight(shape=(self.units,),\n initializer='random_normal',\n trainable=True)\n\n def call(self, inputs):\n return tf.matmul(inputs, self.w) + self.b\n```\n\nBesides trainable weights, updated via backpropagation during training,\nlayers can also have non-trainable weights. These weights are meant to\nbe updated manually during `call()`. Here's a example layer that computes\nthe running sum of its inputs:\n\n```python\nclass ComputeSum(Layer):\n\n def __init__(self, input_dim):\n super(ComputeSum, self).__init__()\n # Create a non-trainable weight.\n self.total = tf.Variable(initial_value=tf.zeros((input_dim,)),\n trainable=False)\n\n def call(self, inputs):\n self.total.assign_add(tf.reduce_sum(inputs, axis=0))\n return self.total\n\nmy_sum = ComputeSum(2)\nx = tf.ones((2, 2))\n\ny = my_sum(x)\nprint(y.numpy()) # [2. 2.]\n\ny = my_sum(x)\nprint(y.numpy()) # [4. 4.]\n\nassert my_sum.weights == [my_sum.total]\nassert my_sum.non_trainable_weights == [my_sum.total]\nassert my_sum.trainable_weights == []\n```\n\nFor more information about creating layers, see the guide\n[Writing custom layers and models with Keras](\n https://www.tensorflow.org/guide/keras/custom_layers_and_models)",
'params': OrderedDict([('trainable',
{'doc': "Boolean, whether the layer's variables should be trainable."}),
('name', {'doc': 'String name of the layer.'}),
('dtype',
{'doc': "The dtype of the layer's computations and weights. Can also be a `tf.keras.mixed_precision.Policy`, which allows the computation and weight dtype to differ. Default of `None` means to use `tf.keras.mixed_precision.global_policy()`, which is a float32 policy unless set to different value."}),
('dynamic',
{'doc': 'Set this to `True` if your layer should only be run eagerly, and should not be used to generate a static computation graph. This would be the case for a Tree-RNN or a recursive network, for example, or generally for any layer that manipulates tensors using Python control flow. If `False`, we assume that the layer can safely be used to generate a static computation graph.'}),
('_TF_MODULE_IGNORED_PROPERTIES',
{'default': "```frozenset(itertools.chain(('_obj_reference_counts_dict',), module.Module.\n _TF_MODULE_IGNORED_PROPERTIES))```"}),
('_must_restore_from_config',
{'default': False, 'typ': 'bool'})]),
'returns': None,
'_internal': {'body': ['<omitted for brevity>']},
'from_name': 'Layer',
'from_type': 'cls'}
This is probably fixed in 0.8. In any case, since the OP created their own project to parse these docstrings, I don't think this issue is still relevant.