demergent-labs/cdk_framework

CDK refactor

bdemann opened this issue · 15 comments

CDK Refactor:

Phase 2:

  • Parse in a single pass instead of making a tree
  • Set up ACT to support new parsing method
    • Reintroduce top level DataTypes
    • Get rid of Proclaim implementation and simplify to just a to_token_stream() function
  • Rework ToDeclaration trait
    • Rename Declaration to Proclamation and allow token streams that represent declarations to actually be called declarations.
    • Rename children to be inline declarations
    • Go through all implementations and make sure that only the inline declarations are being created.
  • Organize impls and functions in each file (see #27 (comment))
  • Address notes from initial review
  • Only use Member and HasMember for variants, records, and tuples
    • Make sure tuples are well documented as to how they are different. (see #27 (comment))
    • Create a new thing for TypeAlias, Array, and Opt
  • Consider getting rid of the Proclaim impl for Vec<T>
    • Also for Option<T>
  • use vec instead of hashmap for the inline types (see: #27 (comment))
  • use Declaration and TypeAnnotation type aliases wherever we can to increase clarity
  • get rid of the found_types for to_data_type. Now that we are parsing in a single pass we don't need it anymore.
  • initial lib.rs clean up
  • Get rid of boxed (if we aren't using it. I'm pretty sure we shouldn't be using it)
  • Consider having access to both strings and idents as necessary (see #27 (comment))
  • Move proclaim and declaration into node mod

Phase 1:

  • Remove Act from all of the datatypes and make sure that we use data_type::Option etc
  • Make sure all of the canister methods are options if we sometimes don't use them.
    • Init and post we will always need
  • Make the ACT nested to reflect the future folder structure
  • For data types:
    • Remove type ref as a concept
    • Add a type alias data type (it will be similar to typeref)
      • a type alias will have a name and a data type and that data type could be another type alias so it will be type alias all the way down.
    • Remove the LiteralOrTypeAlias
    • Remove hash and use parent node info to create a unique name.
    • Remove the global list of all types and just have the types be the leaf nodes of canister methods
  • Make sure all TODO are gone
  • Make sure all contexts that should be keyword_lists are name as such
  • Try to get rid of muts for maps
  • I am not sure that Has Members is the right fit for Func. Investigate why we have it and if we can get away with out it
    • Address #27 (comment)
    • I think once we get rid of the inline types thing and just get child declarations correctly we should be good to go
    • Also look at
      • opt
      • array
      • type alias
  • Imports
    • Make sure we are importing instead of qualifying
    • clean up imports
      • make sure things from super are from super
      • make sure things from crate are from crate
      • make sure they are organized according to the style guide
      • make sure all of the mods are public or have a good reason for private
  • Make sure that all of the prefixes are consistent.
    • prefixes should only have to go back as far as the first parent with an actual name
    • when you receive a prefix it's the name of your parent, when you pass a prefix it's your name, plus any positional modifiers that your children wouldn't know about.
    • If you have a name you don't need to pass on your parent's info to your children
  • Separate TokenStreams into all of the things that tokenstreams are supposed to represent
    • declarations
    • type annotations
    • just random groups of tokens. (If it's just a random group of tokens then it shouldn't be treated like it might be a type annotation, and a type annotation certainly shouldn't be treated like a declaration. Let's make it more clear when we are create which)
    • get rid of ToTokenStream trait. (It doesn't make sense to group all of them together anymore)
    • update the declaration to have completely flat children, that is a hashmap with tokenstreams instead of hashmap with declarations.
    • Make sure that both children and declaration get called when creating children
    • Address #27 (comment)
      • Same question but for External Canisters
  • Make sure that AbstractCanisterTree properly implements to definition.
    • Have a public facing function called to_token_stream() that takes nothing in and creates the declaration and children declarations for the act
  • Remove the type ref act data type
  • Get rid of the collect_inline_types function. We shouldn't need it anymore
  • Look at final Act prefix holdouts
    • Actable trait
    • ToAct trait
    • ToActDataTypeNode
    • ActNode (at the highest level do we need to have ActNode? or is it just Node?)
  • get rid of GetAllTypes probably
  • Simplify the act Declaration: see #27 (comment)
  • Make sure that if we are in a tree loop that we can get out of it. So if a child has a parent as one of it's children or other descendants we need to see that and stop going down that path.
    • address this: #27 (comment)
    • Only box things that need to be boxed
      • I think we can get rid of this and use the boxed data type instead.
  • Ask Jordan and Dan about the generate_function function in update and query that are the exact same but duplicated
  • rename function_signature to better describe what it actually is and rename the function that generates it to match
    • also bring it out to be called something that represnets both query and update. Maybe generate_public_entry_point_function_body. generate_public_canister_method_body. generate_requestable_canister_method_body.

We need some way to recreate type annotations consistently. If we declare a type one way and then try to use it in a type annotation another way then our rust won't compile. When I initially designed this, I thought that the only times we were declaring types and using type annotations where when we were creating all of the node declarations, and at that point we do have all of the info, but that is not true. We are also doing it when we are making the async result handler for example, or the manual reply things, because we need to know the type so that we can try_from_vm_value correctly.

So the questions are:

  1. Do we have enough context to recreate the type annotation for each data type when we are creating the async result handler match arms?
  2. If so how do we ensure that we create it consistently?

Expected Rust Output

#[derive(
    serde :: Deserialize,
    Debug,
    candid :: CandidType,
    Clone,
    CdkActTryIntoVmValue,
    CdkActTryFromVmValue,
)]
struct User {
    ...
}

type PersonWhoUses = User

#[derive(
    serde :: Deserialize,
    Debug,
    candid :: CandidType,
    Clone,
    CdkActTryIntoVmValue,
    CdkActTryFromVmValue,
)]
struct CreateInlineUserReturnType {
    ...
}

#[ic_cdk_macros::update]
#[candid::candid_method(update)]
async fn create_user() -> User {
    ...
}

Azle Code

type User = {
    ...
}

type PersonWhoUses = User

$update;
export function create_user(): User {
    ...
}

$update;
export function create_person_who_uses(): PersonWhoUses {
    ...
}

$update;
export function create_inline_user(): {...} {
    ...
}

Kybra Code

class User(Record):
    ...

PersonWhoUses: TypeAlias = User

class CreateInlineUserReturnType(Record):
    ...

@update
def create_user() -> User:
    ...

@update
def create_person_who_uses() -> PersonWhoUses:
    ...

@update
def create_inline_uwer() -> CreateInlineUserReturnType:
    ...

What does the create_user node look like?
It has a return type that is a Record?
Or it has a return type that is a TypeAlias that has

ACT -> CanisterMethods -> UpdateMethods -> create_user

Option 1:

create_user
{
    params: vec![],
    return_type: Record{name: User, members: [...]}
}

Option 2:

create_user
{
    params: vec![],
    return_type: TypeAlias {name: User, aliased_type: Record: {members: [...]}}
}

Option 1:

create_person_who_uses
{
    params: vec![],
    return_type: TypeAlias {name: PersonWhoUses, aliased_type: Record{name: User, members: [...]}}
}

Option 2:

create_person_who_uses
{
    params: vec![],
    return_type: TypeAlias {name: PersonWhoUses, aliased_type: TypeAlias {name: User, aliased_type: Record: {name: User, members: [...]}}}
}

Option 1:

create_inline_user
{
    params: vec![],
    return_type: Record{name: CreateInlineUserReturnType, members: [...]}
}

Option 2:
N/A

We decided to move forward with option 1. Even though Azle looks like a type alias when you define a record or variant we are going to define it as a record

This would be good info for the readme
/// When you create a child declaration what are we trying to accomplish?
/// We want to flatten it and deduplicate it.
/// 1) Get full declaration of child
/// 2) Get self declaration of child
/// 3) Get grandchildren full declarations
/// 4) Flatten 2 and 3 into one map
/// 5) Repeat for all of the children, flattening that into one map.
/// For the case of the query method, we are going to get the params and return type and do the thing. for each of those
///

I looked into working with maps immutably. I couldn't find a way to work with an immutable map besides just making the map at compile time. So instead I am making some helper functions so that we can isolate all of the muts there and from the outside it still look immutable.

For impl<C, T> ToDeclaration<C> for Vec<T> we are doing things a little differently. Usually when getting the children declarations we get the declaration for the children and all of the flattened grandchildren declarations and combine them all into a single map. For Vec the Vec doens't really have it's own definition so instead it's just listing out the children declarations. So when getting the children declarations we only need to get the grandchildren.

We have two options. Make it conform with all of the other declarations, but have an odd thing where the to_code for Vec gives an empty quote. Or we leave it as is and add a comment explaining how this is different.

So for example, when we are declaring all of the query methods, we have a list of query methods. So to_code for that list gives us all of the query methods, and create_child_declarations gives us all of the types used in the params and return type.

Option 1 would make it so that create_code gives us nothing, and create_child_declarations gives us the functions and the types together. Option 2 would leave it as is

Edit:
Right now I am moving forward with option 1. Because vec doesn't really have a declaration, only it's children do. So it would make more sense for vec to take advantage of create_identifier and create_code returning an option and just return none and let the child_declarations do all of the work. It makes a lot of sense when you think about it and it make it more consistent and it let's us use the act::add_declaration_to_map which hides the fact we are using muts.

Edit:
As we are tightening down the what a proclamation is and moved to just getting inline declarations instead of all child declarations, I have decided that it doesn't make sense to use generic for them at all, and we have gotten rid of that everywhere.

Funcs need a bit of work. I need to understand the func struct and impls to finish it. I thought that since it was going to be just a principal and string that we wouldn't have to worry about the params. But I don't think that is true, because we are using the params in the func struct and impl thing. So we need to make sure that the names we settle on there will match up with everything.

I think that the is_type and as_type (eg is_array or as_primitive) was only needed back when we were making a global list of all of the types. We might not need it anymore. Probably after the kybra and azle refactor is done we can assess that and take action at that point.

It would make more sense for the act to have a list of children that it needs to define and then go define them all. Lets explore that a little bit.

How to avoid the stack overflow:
on the kybra or azle side you are going to need to maintain a list of all of the types you have run into on a specific tree. If you run into one that you have done before on that branch, then you need to use the Boxed data type instead and then go no further on that branch lest you create a stack overflow.

Edit: We aren't doing this anymore because we aren't doing the tree approach to parsing

This todo was found in fn_param.rs though it could apply to anything that has a name. This is asking about if we want to store the name as a string or as an ident.
Dan recalled "I think at one point we were debating whether we should store probably the param name as an ident or a string. There's times when we need both. So someone probably added that note to say, maybe we should make it easy to get it one way or the other"

My current feeling is that having it as a string would help us accomplish the goal of divorcing the nodes as much as possible from the underlying rust output.

Edit: I misunderstood, the todo is not suggesting that we have an ident instead of a string in the struct, but rather have a convenience function to help. Like have param.get_name_as_ident() or param.get_name_as_string().

This should only matter within the cdk framework I believe. Since the cdk shouldn't know about idents. So we can look at all the places we might use it and see if it's worth it.

act/node/canister_method/public_cansiter_methods/mod.rs
let function_name = self.get_name().to_identifier();
data_type/record/mod.rs
let record_ident = self.get_name(parental_prefix.clone()).to_identifier();
data_type/tuple/mod.rs
let tuple_ident = self.get_name(parental_prefix.clone()).to_identifier();
data_type/tuple/mod.rs
let variant_ident = self.get_name(parental_prefix.clone()).to_identifier();
all of the to_type_aliases as well

I kind of like the to_identifier() on String. That seems to be working pretty well. And I think adding another function to each of these would be a lot of bloat for not a lot of benefit.

The functionality for HasEnclosed type works type alias, but the name is not very accurate since it's not really enclosing anything. Maybe Has Enclosed type is useless and we can't really justify using it at all? Or maybe it's fine and we just need a better name, or maybe it's fine and we just shouldn't use it for TypeAlias

HasInnerType?
HasSubType?

We should go through all of the parental prefix strings and rename to match what we are expecting at that level instead of just matching what the trait is. For example at the external_canister_method level we want the name of the external canister, so instead of parental_prefix lets put, "canister_name"

But also I think even renaming it to parent_name a lone would help a lot. A "parental_prefix" is pretty vague. And what we want is the parent's name so that we can use that in creating a prefix. The caller shouldn't have to figure out what we mean by "parental prefix" they should just see "name" and be like "oh yeah I know the name to put in here"

The one place where this gets a little weird is parameters, or members where there could be inline types that would have the same parent name and possibly the same resolved name, so when you join those two together to create the identifier you have two of the exact same name even though they are different things. The solution is to have position info, like "param1" and "param2" but that is no longer the name but more like a prefix. And if we are determining the name for a record, for example, until we are parsing an actual record we don't know if it's a param that needs a number or if it's return type that doesn't need that positional info. But on that note, the return type would need something to differentiate from the params.

So maybe parental_prefix is the only thing that actually makes sense on all levels.

I wonder if changing it to just prefix would help? At first I thought that it would be even more vague but maybe that's a good thing. And by that I mean, if you have to give something a prefix, you can more easily imagine what that is, but what on earth is a parental_prefix?

I am hoping that as we start working in kybra and azle this will become more apparent if it's a problem or not

All inline types should have unique names, we shouldn't have to deduplicate them. In fact if we have two that have the same name it would be best to see right off the bat that we have two things with the same name instead of wonder why one of them randomly isn't showing up. Furthermore vecs are much easier to work with so we're going to try switching out the hashmaps for vecs

#28 (comment)

Furthermore tuple members and record or variant members are not all the different when it comes to the usage of the HasMember trait. They all use all parts and they use them all correctly. To make that more clear I have renamed the tuple elem to member as well, (See https://doc.rust-lang.org/rust-by-example/primitives/tuples.html which refers to tuple elems as members). So each of these members have similar things which are all in the HasMember trait and they all have different things which are all in their specific modules. To further clarify this I will also pull the member stuff out into their own files so they are eaiser to compare and contrast.

here is the order every file should be

  1. Any structs
  2. Any traits
  3. Any direct impls
  4. ToTypeAnnotation
  5. Proclaim impl
    1. create_declaration
    2. create_identifier
    3. collect_inline
  6. HasParam impl
  7. HasReturn impl
  8. HasMember impl
  9. HasEnclosedType impl
  10. Any other impls in alphabetical order?

Should just all of this be in alphabetical order? Maybe, lets start with everything being consistent and see how we feel about it.