simonw/symbex

Filters: `--async` and `--function` and `--class` and various typing ones

simonw opened this issue · 16 comments

The --signatures option turns out to be a pretty great way to start navigating a new codebase.

It might be useful to be able to filter by types of content. Potentially the following:

  • --async - return async function (and method) definitions
  • --function - just functions
  • --class - just classes
  • --method - just class methods

These would be additive, so the following:

symbex --signatures --method --function

Would return all methods and all functions.

But... what would this do?

symbex -s --async

Would it return all async functions AND async methods? If so, would combining it with --function or --method not make a difference?

Or should there be a --async-method filter that's different from --async (which only gets async functions)?

I'll prototype it and play with it and see how it feels.

This is a pretty fun prototype:

% symbex --async -d ../datasette -s | head -n 20 
# File: /Users/simon/Dropbox/Development/datasette/test-plugins/register_output_renderer.py Line: 6
async def can_render(datasette, columns, rows, sql, query_name, database, table, request, view_name)

# File: /Users/simon/Dropbox/Development/datasette/test-plugins/register_output_renderer.py Line: 26
async def render_test_all_parameters(datasette, columns, rows, sql, query_name, database, table, request, view_name, data)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/register_output_renderer.py Line: 6
async def can_render(datasette, columns, rows, sql, query_name, database, table, request, view_name)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/register_output_renderer.py Line: 26
async def render_test_all_parameters(datasette, columns, rows, sql, query_name, database, table, request, view_name, data)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/register_output_renderer.py Line: 60
async def render_response(request)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/memory_name.py Line: 6
async def startup(datasette)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/show_open_files.py Line: 6
async def show_open_files()

Needs a bit more thought, then tests and docs.

I didn't implement --method yet, not completely convinced it is necessary.

A --typed filter that just returns things that have type signatures - and a --untyped one that returns things without type signatures - might be neat too.

Maybe this:

  • --typed - any function with at least one of its arguments or return using type annotations
  • --untyped - not a single type annotation
  • --fully-typed - all arguments plus the return value have type annotations (a subset of --typed)
  • --partially-typed - some but not all of the arguments and return value have annotations

If you are working on a project and trying to add types to every single function, you can iterate on it using this to find the functions that still need some work:

symbex --untyped --partially-typed --signatures

Maybe --method isn't necessary because you can use a '*.*' selector instead?

Or maybe --method is a shortcut for that adds the '*.*' selector?

It would be slightly surprising that while most of these filter options add together, --method --async would filter methods to just the async ones.

For that reason I think using *.* to filter methods may be more consistent.

Confirmed, this works already to get all async methods:

symbex -d ../datasette -s --async '*.*'

I built a messy prototype of the typing ones and I like them a lot:

symbex --typed -s
# File: tests/example_symbols.py Line: 50
def func_type_annotations(a: int, b: str) -> bool

# File: tests/example_symbols.py Line: 94
def function_with_non_pep_0484_annotation(x: ?, xx: ?, yy: ?, y: ?, zz: float) -> ?

# File: tests/example_symbols.py Line: 104
def complex_annotations(code: str, symbols: Iterable[str]) -> List[Tuple[(AST, Optional[str])]]

# File: symbex/lib.py Line: 11
def find_symbol_nodes(code: str, filename: str, symbols: Iterable[str]) -> List[Tuple[(AST, Optional[str])]]

# File: symbex/lib.py Line: 35
def code_for_node(code: str, node: AST, class_name: str, signatures: bool) -> Tuple[(str, int)]

# File: symbex/lib.py Line: 66
def match(name: str, symbols: Iterable[str]) -> bool

# File: symbex/lib.py Line: 91
def function_definition(function_node: AST)

# File: symbex/lib.py Line: 193
def annotation_definition(annotation: AST) -> str

# File: symbex/lib.py Line: 241
def annotation_summary(node: AST) -> AnnotationSummary
symbex --untyped -s
# File: tests/example_symbols.py Line: 10
def func_positional_args(a, b, c)

# File: tests/example_symbols.py Line: 15
async def async_func(a, b, c)

# File: tests/example_symbols.py Line: 20
def func_default_args(a, b=2, c=3)

...

Am I right that these should always be OR and not AND?

I just tried this and it didn't give me what I expected:

symbex --async --typed -s

Turns out I expected just typed async functions, but I got back ALL async functions and then ALL typed functions.

I'm going to try these as AND rather than OR and see how that feels.

Yes, AND for filters feels better. Switching to that.

This method is failing the test for --fully-typed:

class ClassForTypedTests:
    def method_fully_typed(self, a: int, b: str) -> bool:
        pass

Because self has no type. So I need a special case check for class methods.

I'm going to refactor AnnotationSummary - it looks like this:

AnnotationSummary = namedtuple(
    "Summary", ["num_arguments", "num_typed", "return_is_typed"]
)

I'm going to change it to this:

class TypeSummary:
    fully: bool
    partially: bool
)

Because self has no type. So I need a special case check for class methods.

That caused another problem: now --typed is returning any method with a self argument because I wrote code that says "treat that as typed" - even though there are no type annotations on anything else.

Here's a fun trick:

 llm --system 'Suggest several ways to refactor this' '    if (
        not symbols
        and not signatures
        and not async_
        and not function
        and not class_
        and not typed
        and not untyped
        and not partially_typed
        and not fully_typed
    ):
        ctx = click.get_current_context()
        click.echo(ctx.get_help())
        ctx.exit()'

Output:

There are several ways to refactor this code:

  1. Consolidate all the boolean checks into a list and use all() to check if all items are False:
checks = [
    not symbols, not signatures, not async_,
    not function, not class_, not typed,
    not untyped, not partially_typed,
    not fully_typed
]

if all(checks):
    ctx = click.get_current_context()
    click.echo(ctx.get_help())
    ctx.exit()
  1. Use De Morgan's Law to simplify the boolean checks:
if (
    symbols or signatures or async_ or function or
    class_ or typed or untyped or partially_typed or
    fully_typed
):
    pass
else:
    ctx = click.get_current_context()
    click.echo(ctx.get_help())
    ctx.exit()
  1. Use any() with the negation of the boolean checks:
if not any([
    symbols, signatures, async_, function,
    class_, typed, untyped, partially_typed,
    fully_typed
]):
    ctx = click.get_current_context()
    click.echo(ctx.get_help())
    ctx.exit()

All three options achieve the same result, so it's up to personal preference on which one to choose.

One last thought: an __init__() method should not need a return type annotation to qualify as not breaing the fully typed rule.