nfdi4plants/ARCTokenization

Feature Request - Enhanced Tokenization for Specific Folders and Files


Description:
I would like to request an enhancement to the tokenization tool so that it handles specific folders and files differently. The goal is more flexible, customizable handling of the top-level folders, the subfolders directly beneath them, and specific file types within those folders.

Features Requested:

  1. Folder Handling:

    • Top-level folders named "studies", "assays", "runs", and "workflows" should be treated differently during tokenization.
    • Subfolders directly beneath each top-level folder, named "study", "assay", "run", and "workflow" respectively, should also be handled differently.

  2. File Type Handling:

    • The tokenization tool should recognize and handle files with the following extensions differently:
      • CWL files (*.cwl)
      • IML files (*.iml)
      • ISA files related to the top-level folders mentioned above.
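
To make the folder and file distinctions above concrete, here is a minimal Python sketch of how a relative path might be classified. It is not taken from ARCTokenization: the function names and the returned dictionary are hypothetical, any directory directly beneath a top-level folder is assumed to count as its subfolder, and a file is assumed to be an ISA file when its name starts with `isa.`.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Top-level folder names listed in the request.
TOP_LEVEL_FOLDERS = {"studies", "assays", "runs", "workflows"}


def classify_file(name: str) -> str:
    """Return a coarse file kind for a file name (hypothetical helper)."""
    lower = name.lower()
    if lower.endswith(".cwl"):
        return "cwl"
    if lower.endswith(".iml"):
        return "iml"
    # Assumption: ISA files are recognised by an "isa." name prefix.
    if lower.startswith("isa."):
        return "isa"
    return "other"


def classify_path(relative_path: str) -> Dict[str, Optional[str]]:
    """Split a '/'-separated relative path into folder context and file kind."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        raise ValueError("relative_path must not be empty")
    # The first component only counts as a top-level folder if a file lies below it.
    top = parts[0] if len(parts) > 1 and parts[0] in TOP_LEVEL_FOLDERS else None
    # Assumption: the directory directly beneath a top-level folder is its subfolder.
    sub = parts[1] if top is not None and len(parts) > 2 else None
    return {"top_level": top, "subfolder": sub, "file_kind": classify_file(parts[-1])}
```

For example, `classify_path("workflows/wf1/workflow.cwl")` reports the top-level folder `workflows`, the subfolder `wf1`, and the file kind `cwl`; the path itself is made up for illustration.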

Expected Behavior:

  • Files within the "study", "assay", "run", and "workflow" subfolders should be treated differently based on their file types.
  • ISA files within the top-level folders should also be handled differently during tokenization.
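
One way to picture the expected behavior is a lookup from (top-level folder, file kind) to a handler. The Python sketch below is only an illustration under assumptions: the handler functions are placeholders that tag a path instead of tokenizing it, and the table entries are hypothetical and do not describe how ARCTokenization dispatches.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[str], Tuple[str, str]]


# Placeholder handlers: each only tags the path with the handler that was chosen.
def handle_cwl(path: str) -> Tuple[str, str]:
    return ("cwl", path)


def handle_iml(path: str) -> Tuple[str, str]:
    return ("iml", path)


def handle_isa(path: str) -> Tuple[str, str]:
    return ("isa", path)


def handle_default(path: str) -> Tuple[str, str]:
    return ("default", path)


# Hypothetical dispatch table. A key with None as folder applies in any folder.
HANDLERS: Dict[Tuple[Optional[str], str], Handler] = {
    ("studies", "isa"): handle_isa,
    ("assays", "isa"): handle_isa,
    ("runs", "isa"): handle_isa,
    ("workflows", "isa"): handle_isa,
    (None, "cwl"): handle_cwl,
    (None, "iml"): handle_iml,
}


def select_handler(top_level: Optional[str], file_kind: str) -> Handler:
    """Prefer a folder-specific handler, then a folder-independent one, then the default."""
    for key in ((top_level, file_kind), (None, file_kind)):
        if key in HANDLERS:
            return HANDLERS[key]
    return handle_default
```

With this table, an ISA file gets the ISA handler only inside one of the four top-level folders, while CWL and IML files are matched by extension alone. Folder-specific behavior for those types would be added as further keyed entries.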

Rationale:
This feature would let users working with structured data in this domain tokenize it more precisely, because the tokenizer could take the folder and file context into account.

Additional Notes:
Feel free to reach out if more information or clarification is needed. Thank you for considering this feature request.