tripal/t4d8

Tripal Custom Tables and handling multiple schemas

risharde opened this issue · 21 comments

I'm going to attempt to make Tripal Custom Tables compatible with different schema names. Hopefully this isn't going to be another deep rabbit hole. Basically, if I can get tripal_custom_tables to store the schema name with the table name, this would make it more compatible with different schema names. The default would be public.
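As a rough illustration of the naming convention just described (this helper is hypothetical, not actual Tripal API):

```php
<?php

// Hypothetical helper illustrating the proposed storage convention:
// the schema name is prepended to the table name, with 'public' as
// the default when no schema is given.
function tripal_qualified_table_name(string $table, ?string $schema = NULL): string {
  $schema = $schema ?: 'public';
  return $schema . '.' . $table;
}

// tripal_qualified_table_name('tripal_gff_temp', 'chado') returns 'chado.tripal_gff_temp'
// tripal_qualified_table_name('tripal_gff_temp') returns 'public.tripal_gff_temp'
```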

Initial code works in the chadoInstaller and successfully adds the schema name to the table name in tripal_custom_tables. Tested on the default 'chado' schema. The next test will be to create another schema (while also keeping the default chado schema) to ensure no clashes occur with the code.

(screenshot: tripal_custom_tables listing with schema-prefixed table names)
Seems successful. As you can see, the table names now have the prepended schema name, which helps differentiate where each table actually exists.

I'm going to run some tests on the Admin Tripal Custom Tables UI to ensure it remains compatible with this new implementation.

I have discovered 2 issues I am working on:

  1. While the tables are correctly listed in the UI, if I try to edit a table schema, it complains that there is no table attribute within the schema (this seems unrelated to the new code)
  2. The chado form for dropping / removing a schema does not cater for the removal of custom table records in public.tripal_custom_tables (this was a lack of foresight in the first version and unrelated to the new code, but will be fixed now as well)

Issue 2 has been resolved (dropping a schema now also removes that schema's custom table records from the public tripal_custom_tables table)

Part of issue 1 has been resolved: the table attribute is now added in the chadoInstaller. Working on the UI edit form to make it compatible with editing (it seems the recreate checkbox is not validating correctly)

Issue 1 has been resolved! Editing a table schema now works, and the checkbox for dropping and recreating works as well.

Confirming that the delete tripal custom table form works!

New textfield element added to specify the target schema name and tested (works like a charm!)

Reminder: we need to update the schema lookup function to cater for the new naming convention in tripal_custom_tables now that we are using the schemaname.table_name format

Doing a search in VS Code for custom tables, I came across the getTables($include_custom = FALSE) function within the ChadoSchema class. It seems this code was not properly retrieving table names (there was a TODO on the inner if block). This has been fixed to respect the $include_custom boolean variable. I also altered this code to extract only the table_name, without the prepended chado schema name, to remain compatible (will need to test)
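A minimal sketch of the kind of extraction described above (the helper name is hypothetical; the actual change lives in ChadoSchema::getTables()):

```php
<?php

// Hypothetical helper illustrating the extraction described above:
// tripal_custom_tables stores names as "schemaname.table_name", but
// callers such as getTables() only want the bare table name.
function tripal_split_qualified_name(string $stored_name, string $default_schema = 'public'): array {
  // Split on the first dot only, so the table name itself is untouched.
  $parts = explode('.', $stored_name, 2);
  if (count($parts) === 2) {
    return ['schema' => $parts[0], 'table' => $parts[1]];
  }
  // No schema prefix recorded: assume the default schema.
  return ['schema' => $default_schema, 'table' => $stored_name];
}

// tripal_split_qualified_name('chado.tripal_gff_temp')
// returns ['schema' => 'chado', 'table' => 'tripal_gff_temp']
```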

Ran the OBO loader for the Sequence Ontology overnight and today it looks like cvterms have been created. No errors displayed on the console during imports. So this initially looks like the modifications do not break the default OBO importer process.

OBO loader took 78.60 minutes on the T4 branch, even after altering the VM to use 4 threads (at some point it may be worth comparing against a WSU server). Again, this is unrelated to this particular task at hand.

GFF3 loader errored out during chado_select_record on the tripal_gff_temp table. It seems the schema for this table is not being returned from the tripal_custom_tables record. A new commit has been created to resolve this, and I am currently re-running the import of Citrus_sinensis-orange1.1g015632m.g.gff3

GFF3 loader ran successfully this time. Going to test the Fasta Importer as per the T3 generalized documentation (even though this is T4, the procedure is the same)

Experienced an issue where the uploaded Fasta file was erroring out (wrong file size). It was due to stale files in /var/www/html/drupal/sites/default/files/tripal/users/1/temp (user id 1, since I am the only user on this test environment). To solve it, I deleted the files within this temp folder and the .fasta file in the parent folder.

The Fasta import was successful, and it found 1 sequence as per the documentation

@laceysanderson Discussing the current implementation with @spficklin (hopefully I am wording it correctly), he said he would prefer my new implementation to have the schema name as a separate field. Currently (in my implementation) the tripal_custom_tables table_name column has the schema name prepended, for example: chado.tripal_gff_temp. He wanted you to confirm, based on your ideas of multi-Chado compatibility, whether having the schema name in a separate field would be better. He also mentioned that table_name should just be the table name in the literal sense. Please let me know? I'd like to close off this branch as soon as I can (if possible) to keep focus on other things.

Thanks much!

I agree that the table_name should be the literal table name in the database :-)

Regarding the schema_name, I agree with not appending it to the table name. I do wonder if we want it as a single column, though, as that would mean we would need to re-create the custom table for each schema we want it to appear in... which seems error-prone and not great if updates are needed. You would also end up with the same table listed over and over in the UI, each time with a different schema.

I think it would be ideal to be able to create a custom table and select all the pre-existing schemas you want it to be added to (so checkboxes). This could be accomplished with a JSON column in tripal_custom_tables or a tripal_custom_table_schemas table with a one-custom-table-to-many-schemas foreign key relationship.
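For what it's worth, a sketch of what the linking-table option could look like as a Drupal schema definition (the table and column names here are illustrative assumptions, not an agreed design):

```php
<?php

// Hypothetical Drupal schema definition sketching the linking-table
// option above: one tripal_custom_tables record related to many Chado
// schemas. Names are illustrative only.
$schema['tripal_custom_table_schemas'] = [
  'description' => 'Records which Chado schemas each custom table has been applied to.',
  'fields' => [
    'custom_table_id' => [
      'type' => 'int',
      'not null' => TRUE,
      'description' => 'Foreign key to the custom table record in tripal_custom_tables.',
    ],
    'schema_name' => [
      'type' => 'varchar',
      'length' => 64,
      'not null' => TRUE,
      'description' => 'PostgreSQL schema the custom table was created in.',
    ],
  ],
  // One row per (custom table, schema) pair.
  'primary key' => ['custom_table_id', 'schema_name'],
];
```

The JSON-column alternative would instead store an array of schema names on each tripal_custom_tables row.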

Disclaimer: I haven't played with the custom table UI on t4d8/9 yet, so read my point of view with that in mind. I may miss some points.

In my opinion, custom tables are part of a "single" Chado schema. They are defined to solve something in a given Chado data context (i.e. my custom tables may not be useful with your Chado data). Of course, since you often deal with the same type of data in the same database, you will want your custom table in each of your Chado instances... but still, you may have a Chado schema in your database that does not need a given custom table. That's why I think custom tables should be considered instance-specific (i.e. I don't like the schema checkbox approach, because you may have the same custom table name in 2 different Chado instances with a different definition). I think the custom table list should be schema-specific (with a "schema name" filter) so each custom table would appear only once (for a selected schema). If you want to apply/create the same custom table across several schemas, the UI should allow you to do so (an import/export feature? an "apply to" button with a list of schemas, with checkboxes this time ;) ?).

To go further, do custom tables need to be "recorded" in the Tripal system at all? Wouldn't it be better if such tables were automatically found by Tripal (and cached if needed)? Why not use functions in the BioDb layer that allow Tripal to find custom tables, defined as tables that are not part of the official Chado schema?

$c = \Drupal::service('tripal_chado.database');
$c->schema()->getTables(['custom']);

Currently, that code returns on my instance:

   [
     "materialized_view" => [
       "name" => "materialized_view",
       "type" => "table",
       "status" => "custom",
     ],
   ]

Then, if one needs to get a Drupal schema definition of that table for use somewhere in a Drupal module:
$c->schema()->getTableDef('materialized_view', ['source'=>'database', 'format'=>'drupal']);
Which returns:

[
     "fields" => [
       "materialized_view_id" => [
         "type" => "serial",
         "not null" => true,
         "pgsql_type" => "serial",
         "size" => "medium",
       ],
       "last_update" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "timestamp without time zone",
       ],
       "refresh_time" => [
         "type" => "int",
         "not null" => false,
         "pgsql_type" => "integer",
         "size" => "medium",
       ],
       "name" => [
         "type" => "varchar",
         "not null" => false,
         "pgsql_type" => "character varying",
         "length" => 64,
       ],
       "mv_schema" => [
         "type" => "varchar",
         "not null" => false,
         "pgsql_type" => "character varying",
         "length" => 64,
       ],
       "mv_table" => [
         "type" => "varchar",
         "not null" => false,
         "pgsql_type" => "character varying",
         "length" => 128,
       ],
       "mv_specs" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "text",
       ],
       "indexed" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "text",
       ],
       "query" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "text",
       ],
       "special_index" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "text",
       ],
     ],
     "unique keys" => [
       "materialized_view_name_key" => [
         "name",
       ],
     ],
   ]

So... why do we need to record custom tables in Tripal? What did I miss?

[EDIT] The code above works with the BioDb layer of the tripal-biodb branch https://github.com/tripal/t4d8/tree/tripal-biodb .

I think we can close this in favour of the BioDb implementation.