apache/datafusion-sqlparser-rs

[EPIC] Complete Span (source location) information / feature

alamb opened this issue · 1 comments

alamb commented

This ticket tracks the work remaining to complete adding source location information into sqlparser

Background

  • @Nyrox and @iffyio introduced the foundations for storing source location information in the AST nodes in #1435. This information can be used to provide more specific error messages, and potentially syntax highlighting among other great things.
  • In order to 1. minimize the disruption to downstream projects that use sqlparser-rs and 2. avoid a single massive PR and 3. work together as a community, we are implementing this feature incrementally over several releases.

Let's use this ticket to organize needed / remaining work. If you find additional features are needed / issues, please leave a comment on this ticket

Source Span Contributing Guidelines

For contributing source spans improvement in addition to the general
contribution guidelines, please make sure to pay attention to the
following:

  • Ident always have correct source spans

  • We try to minimize downstream breaking changes

  • Consider using Span::union in favor of storing spans on all nodes

  • Any metadata added to compute spans must not change semantics (Eq, Ord, Hash, etc.). See AttachedToken for more information.

When adding support for source spans on a type, consider the impact to consumers of that type and whether your change would require a consumer to do non-trivial changes to their code.

Example of a trivial change

match node {  
   ast::Query { 
     field1,
     field2, 
     location: _,  // add a new line to ignored location
 }

If adding source spans to a type would require a significant change like wrapping the type, please open an issue to discuss. 

# AST Node Equality and Hashes

When adding tokens to AST nodes, make sure to store them using the [AttachedToken](https://docs.rs/sqlparser/latest/sqlparser/ast/helpers/struct.AttachedToken.html) (TODO UPDATE SOURCE REFERENCE)to ensure that semantically equivalent AST nodes compare as equal and hash to the same value. i.e. `select 5` and `SELECT 5` would compare as different `Select` nodes, if the select token was stored directly. f.e.

```rust
struct Select {
     select_token: AttachedToken, // only used for spans
     /// remaining fields
     field1,
     field2,
     ...
 }

Some high level work (list from #1435)

  • Store keyword TokenWithLocation for expressions that currently don't have them
  • Implement spans for the rest of the AST, namely Statements

Tasks

  • #1563
  • Look into reducing AST size with smaller offset sizes (e.g. u32 rather than usize)
  • Store spans for ast::value::Value
alamb commented

I think the initial thing we should do to kick this project is get a test pattern setup:

Then we should be good to start cranking out location information