UpstageAI/dataverse

Change convention using ___ (three underscores)

41ow1ives opened this issue · 1 comments

All the registered functions use the ___ convention, and it seems a bit awkward.
It would be better to switch to a different convention.
Since all the functions can already be separated by their directory, this should not be too hard to do.

Thank you for addressing this issue. It was indeed one of my primary concerns, and I am looking forward to a more refined solution. Here's what I believe should be considered before we further discuss the potential revisions:

First, the function hierarchy in the system spans three levels: category, sub-category, and the actual function name, for example deduplication, common_crawl, and exact_line, respectively. In the current setup within Dataverse, functions are stored in a registry as a list, so each entry needs a unique name. This requirement led to my decision to adopt the ___ naming convention.

An attempt was made to eliminate the ___ and retain only the function name, yet the necessity to include category and sub-category information persisted. Introducing these as arguments to the register_etl function seemed to complicate matters unnecessarily, burdening users with the additional step of specifying categories and sub-categories with each registration. Attempts at automating the detection of function names were unsuccessful, given Dataverse's support for dynamic ETL, which allows users to create functions in any desired path. Consequently, I reverted to the original plan of employing the ___ naming convention.
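
As a rough sketch of why path-based detection breaks down with dynamic ETL, consider inferring the hierarchy from a file's location. The helper `infer_hierarchy`, the assumed layout (category as folder, sub-category as file name under an ETL root), and the example paths are all hypothetical and only meant to show the failure mode, not how Dataverse tried it.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def infer_hierarchy(file_path: str, etl_root: str) -> Optional[Tuple[str, str]]:
    """Guess (category, sub_category) from a path like <root>/<category>/<sub>.py.

    Returns None when the file is not laid out that way under etl_root.
    """
    path = PurePosixPath(file_path)
    root = PurePosixPath(etl_root)
    try:
        relative = path.relative_to(root)
    except ValueError:
        # The file lives outside the ETL root, e.g. a user-defined dynamic ETL.
        return None
    parts = relative.parts
    if len(parts) != 2:
        # Too shallow or too deep to map onto category/sub-category.
        return None
    category, file_name = parts
    return category, PurePosixPath(file_name).stem


# A function defined inside the package layout can be classified...
in_package = infer_hierarchy(
    "dataverse/etl/deduplication/common_crawl.py", "dataverse/etl"
)
# ...but one defined in an arbitrary user path cannot.
dynamic = infer_hierarchy("/home/user/notebooks/my_etl.py", "dataverse/etl")
```

Here `in_package` resolves to a category and sub-category, while `dynamic` is `None`, so the user would still have to supply the hierarchy by hand.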

Moreover, eliminating the hierarchical naming convention altogether could lead to confusion, as function names like exact_line might not adequately convey the function's purpose or context without additional category and sub-category information. Users would then be forced to refer back to the folder and file names to ascertain the function's exact role.
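
One way to keep the context while still exposing the short name is to split the registered name on the separator. The sketch below assumes every registered name has exactly three non-empty parts; `ETLName` and `parse_etl_name` are hypothetical helpers, not part of Dataverse.

```python
from typing import NamedTuple

SEPARATOR = "___"  # three underscores, as in the current convention


class ETLName(NamedTuple):
    category: str
    sub_category: str
    name: str


def parse_etl_name(full_name: str) -> ETLName:
    """Split 'category___sub_category___name' into its three parts."""
    parts = full_name.split(SEPARATOR)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected category{SEPARATOR}sub_category{SEPARATOR}name, got: {full_name!r}"
        )
    return ETLName(*parts)


parsed = parse_etl_name("deduplication___common_crawl___exact_line")
```

Single underscores inside a part, as in `common_crawl` or `exact_line`, are left untouched because only the triple underscore is treated as a separator.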

For reference, please consult the following link for more information on the current ETL process class naming convention within Dataverse: https://github.com/UpstageAI/dataverse/tree/main/dataverse/etl#-etl-process-class-naming-convention

Overall, the combination of ensuring unique identifiers within the function registry and maintaining clarity regarding each function's purpose underscores the necessity of our current naming approach. Further revisions will aim at simplifying this process without compromising on these essential aspects.