tripal/t4d8

Tripal Custom Tables and handling multiple schemas

risharde opened this issue · 21 comments

I'm going to attempt to make Tripal Custom Tables compatible with different schema names. Hopefully this isn't going to be another deep rabbit hole. Basically, if I can get tripal_custom_tables to store the schema name with the table name, this would make it more compatible with different schema names. The default would be public.
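As a rough illustration of the naming convention just described (this helper is hypothetical, not actual Tripal API):

```php
<?php

// Hypothetical helper illustrating the proposed storage convention:
// the schema name is prepended to the table name, with 'public' as
// the default when no schema is given.
function tripal_qualified_table_name(string $table, ?string $schema = NULL): string {
  $schema = $schema ?: 'public';
  return $schema . '.' . $table;
}

// tripal_qualified_table_name('tripal_gff_temp', 'chado') returns 'chado.tripal_gff_temp'
// tripal_qualified_table_name('tripal_gff_temp') returns 'public.tripal_gff_temp'
```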

Initial code works in the chadoInstaller and successfully adds the schema name to the table name in tripal_custom_tables. Tested on the default 'chado' schema. The next test will be to create another schema (while also keeping the default chado schema) to ensure no clashes occur with the code.

(screenshot: tripal_custom_tables listing with schema-prefixed table names)
Seems successful. As you can see, the table names now have the prepended schema name, which helps differentiate where each table actually exists.

I'm going to run some tests on the Admin Tripal Custom Tables UI to ensure it remains compatible with this new implementation.

I have discovered 2 issues I am working on:

  1. While the tables are correctly listed in the UI, if I try to edit a table schema, it complains that there is no table attribute within the schema (this seems unrelated to the new code)
  2. The chado form for dropping / removing a schema does not cater for the removal of custom table records in public.tripal_custom_tables (this was a lack of foresight in the first version and unrelated to the new code, but will be fixed now as well)

Issue 2 has been resolved (dropping a schema now also removes that schema's custom table records from the public tripal_custom_tables table)

Part of issue 1 has been resolved: the table attribute is now added in the chadoInstaller. Working on the UI edit form to make it compatible with editing (it seems the recreate checkbox is not validating correctly)

Issue 1 has been resolved! Editing a table schema now works, and the checkbox for dropping and recreating works as well.

Confirming that the delete tripal custom table form works!

New textfield element added to specify the target schema name and tested (works like a charm!)

Reminder: we need to update the schema lookup function to cater for the new naming convention in tripal_custom_tables now that we are using the schemaname.table_name format

Doing a search in VS Code for custom tables, I came across the getTables($include_custom = FALSE) function within the ChadoSchema class. It seems this code was not properly retrieving table names (there was a TODO on the inner if block). This has been fixed to respect the $include_custom boolean variable. I also altered this code to extract only the table_name, without the prepended chado schema name, to remain compatible (will need to test)
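A minimal sketch of the kind of extraction described above (the helper name is hypothetical; the actual change lives in ChadoSchema::getTables()):

```php
<?php

// Hypothetical helper illustrating the extraction described above:
// tripal_custom_tables stores names as "schemaname.table_name", but
// callers such as getTables() only want the bare table name.
function tripal_split_qualified_name(string $stored_name, string $default_schema = 'public'): array {
  // Split on the first dot only, so the table name itself is untouched.
  $parts = explode('.', $stored_name, 2);
  if (count($parts) === 2) {
    return ['schema' => $parts[0], 'table' => $parts[1]];
  }
  // No schema prefix recorded: assume the default schema.
  return ['schema' => $default_schema, 'table' => $stored_name];
}

// tripal_split_qualified_name('chado.tripal_gff_temp')
// returns ['schema' => 'chado', 'table' => 'tripal_gff_temp']
```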

Ran the OBO loader for the Sequence Ontology overnight and today it looks like cvterms have been created. No errors displayed on the console during imports. So this initially looks like the modifications do not break the default OBO importer process.

OBO loader took 78.60 minutes on the T4 branch, even after altering the VM to use 4 threads (at some point it may be worth comparing against a WSU server). Again, this is unrelated to this particular task at hand.

GFF3 loader errored out during chado_select_record on the tripal_gff_temp table. It seems the schema for this table is not being returned from the tripal_custom_tables record. A new commit has been created to resolve this, and I am currently re-running the import of Citrus_sinensis-orange1.1g015632m.g.gff3

GFF3 loader ran successfully this time. Going to test the Fasta Importer as per the T3 generalized documentation (even though this is T4, the procedure is the same)

Experienced an issue where the uploaded Fasta file was erroring out (wrong file size). It was due to stale files in /var/www/html/drupal/sites/default/files/tripal/users/1/temp (user id 1, since I am the only user on this test environment). To solve it, I deleted the files within this temp folder and the .fasta file in the parent folder.

The Fasta import was successful, and it found 1 sequence as per the documentation

@laceysanderson Discussing the current implementation with @spficklin (hopefully I am wording it correctly), he said he would prefer my new implementation to have the schema name as a separate field. Currently (in my implementation) the tripal_custom_tables table_name column has the schema name prepended, for example: chado.tripal_gff_temp. He wanted you to confirm, based on your ideas of multi-Chado compatibility, whether having the schema name in a separate field would be better. He also mentioned that table_name should just be the table name in the literal sense. Please let me know? I'd like to close off this branch as soon as I can (if possible) to keep focus on other things.

Thanks much!

I agree that the table_name should be the literal table name in the database :-)

Regarding the schema_name, I agree with not appending it to the table name. I do wonder if we want it as a single column, though, as that would mean we would need to re-create the custom table for each schema we want it to appear in... which seems error-prone and not great if updates are needed. You would also end up with the same table listed over and over in the UI, each time with a different schema.

I think it would be ideal to be able to create a custom table and select all the pre-existing schemas you want it to be added to (so checkboxes). This could be accomplished with a JSON column in tripal_custom_tables or a tripal_custom_table_schemas table with a one-custom-table-to-many-schemas foreign key relationship.
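For what it's worth, a sketch of what the linking-table option could look like as a Drupal schema definition (the table and column names here are illustrative assumptions, not an agreed design):

```php
<?php

// Hypothetical Drupal schema definition sketching the linking-table
// option above: one tripal_custom_tables record related to many Chado
// schemas. Names are illustrative only.
$schema['tripal_custom_table_schemas'] = [
  'description' => 'Records which Chado schemas each custom table has been applied to.',
  'fields' => [
    'custom_table_id' => [
      'type' => 'int',
      'not null' => TRUE,
      'description' => 'Foreign key to the custom table record in tripal_custom_tables.',
    ],
    'schema_name' => [
      'type' => 'varchar',
      'length' => 64,
      'not null' => TRUE,
      'description' => 'PostgreSQL schema the custom table was created in.',
    ],
  ],
  // One row per (custom table, schema) pair.
  'primary key' => ['custom_table_id', 'schema_name'],
];
```

The JSON-column alternative would instead store an array of schema names on each tripal_custom_tables row.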

Disclaimer: I haven't played with the custom table UI on t4d8/9 yet, so read my point of view with that in mind. I may miss some points.

In my opinion, custom tables are part of a "single" Chado schema. They are defined to solve something in a given Chado data context (i.e. my custom tables may not be useful with your Chado data). Of course, since you often deal with the same type of data in the same database, you will want your custom table in each of your Chado instances... but still, you may have a Chado schema in your database that does not need a given custom table. That's why I think custom tables should be considered instance-specific (i.e. I don't like the schema checkbox approach, because you may have the same custom table name in 2 different Chado instances with a different definition). I think the custom table list should be schema-specific (with a "schema name" filter) so each custom table would appear only once (for a selected schema). If you want to apply/create the same custom table across several schemas, the UI should allow you to do so (an import/export feature? an "apply to" button with a list of schemas, with checkboxes this time ;) ?).

To go further, do custom tables need to be "recorded" in the Tripal system at all? Wouldn't it be better if such tables were automatically found by Tripal (and cached if needed)? Why not use functions in the BioDb layer that allow Tripal to find custom tables, defined as tables that are not part of the official Chado schema?

$c = \Drupal::service('tripal_chado.database');
$c->schema()->getTables(['custom']);

Currently, that code returns on my instance:

   [
     "materialized_view" => [
       "name" => "materialized_view",
       "type" => "table",
       "status" => "custom",
     ],
   ]

Then, if one needs to get a Drupal schema definition of that table for use somewhere in a Drupal module:
$c->schema()->getTableDef('materialized_view', ['source'=>'database', 'format'=>'drupal']);
Which returns:

[
     "fields" => [
       "materialized_view_id" => [
         "type" => "serial",
         "not null" => true,
         "pgsql_type" => "serial",
         "size" => "medium",
       ],
       "last_update" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "timestamp without time zone",
       ],
       "refresh_time" => [
         "type" => "int",
         "not null" => false,
         "pgsql_type" => "integer",
         "size" => "medium",
       ],
       "name" => [
         "type" => "varchar",
         "not null" => false,
         "pgsql_type" => "character varying",
         "length" => 64,
       ],
       "mv_schema" => [
         "type" => "varchar",
         "not null" => false,
         "pgsql_type" => "character varying",
         "length" => 64,
       ],
       "mv_table" => [
         "type" => "varchar",
         "not null" => false,
         "pgsql_type" => "character varying",
         "length" => 128,
       ],
       "mv_specs" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "text",
       ],
       "indexed" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "text",
       ],
       "query" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "text",
       ],
       "special_index" => [
         "type" => "text",
         "not null" => false,
         "pgsql_type" => "text",
       ],
     ],
     "unique keys" => [
       "materialized_view_name_key" => [
         "name",
       ],
     ],
   ]

So... why do we need to record custom tables in Tripal? What did I miss?

[EDIT] The code above works with the BioDb layer of the tripal-biodb branch https://github.com/tripal/t4d8/tree/tripal-biodb .

I think we can close this in favour of the BioDb implementation.