pombase/website

ftp: README review

  1. @kimrutherford add file names of new structure to this document
    https://docs.google.com/document/d/1TfvWngsI2U9-wkw2czxhHOZNmmQ8nxRwYa-TctrhMs0/edit

  2. @PCarme write READMEs (a lot of the info will be on the downloads website, so decide how much detail is required and add referring URLs). If you don't know what something is, tag me in the doc

  3. @ValWood / @kimrutherford to review and fill in missing parts

  4. @PCarme to copy into place in Git

add file names of new structure to this document

I can do that but does it make sense to duplicate what is already in Git? Can't we edit the README text directly rather than copy into a Google doc and then back to the files in Git?

The READMEs are here:
https://github.com/pombase/pombase-scripts/tree/main/release_readme_files

There is one README file for each of the directories in the new structure:
https://www.pombase.org/public_releases/pombase-2024-06-01/

good point!

Okay, thanks Kim! I'll review the READMEs in there, and let you know when I'm done.

In https://github.com/pombase/pombase-scripts/blob/main/release_readme_files/exports_for_external_resources-README.txt, I have listed the files in the directory, but I don't really know what each of those corresponds to.

Thanks Pascal. I'll work on that one.

The "genome_sequence_and_features" directory contains several subdirectories. Should there be a README for each subdirectory, or a single README describing the content of all the subdirectories?

The contents are quite diverse so I think each directory needs a README

Also, this file https://www.pombase.org/public_releases/pombase-2024-06-01/protein_features/transmembrane_domain_coords_and_seqs.tsv displays the entire sequence of each protein, not just the transmembrane domain sequences. Is that intended?
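
To make the distinction concrete, here is a minimal sketch of how domain-only sequences could be cut out of a full protein sequence, given domain coordinates. The function name `domain_sequences`, the example sequence, and the assumption that coordinates are 1-based and inclusive are all hypothetical; the actual column layout and coordinate convention of the TSV file are not described in this thread.

```python
def domain_sequences(protein_seq, domains):
    """Return the subsequence for each (start, end) domain.

    Assumption: coordinates are 1-based and inclusive. This is a
    hypothetical convention, not confirmed for the PomBase file.
    """
    segments = []
    for start, end in domains:
        if not 1 <= start <= end <= len(protein_seq):
            raise ValueError(f"domain {start}-{end} is outside the sequence")
        # Convert the 1-based inclusive range to a Python slice.
        segments.append(protein_seq[start - 1:end])
    return segments


# Made-up sequence and coordinates, for illustration only.
print(domain_sequences("MKTLLVAGG", [(2, 4), (6, 9)]))
```

With per-domain sequences like these, length distributions or residue counts could be computed without carrying the full protein sequence in the same file.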

It says coordinates and sequences, but it seems strange to put them together...

Maybe this wasn't a file for the public?

@kimrutherford ?

This is all I can find about it:

I dug into my old email. This is from Snezhka. The thread is from April 2019, with the subject "transmembrane domains":


Hope everything is well - writing now to bug you with a question, sorry... Wonder if there is a way to, say, 'automatically' collect all transmembrane domains from all proteins. What I want to do is to compare the transmembrane domains (e.g. length distribution, unusual amino acids) in S. pombe to those in S. japonicus. Ideally so that I could do it separately for single spanners vs multispanners.


The file was created for Snezhka but it's updated nightly. Perhaps we don't need it in the new release directories?

agree, it's a bit random

OK, I've removed that file from the script that creates the new release directory structure.

The contents are quite diverse so I think each directory needs a README

I've added empty READMEs and checked that the script can process README files for sub-directories correctly.
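
A check along these lines could confirm that every directory in a release has a README. This is only a sketch, not the script that builds the release directories: the function name `dirs_missing_readme` is hypothetical, and it assumes a README is any file whose name contains "README" (which matches names such as PomBase_exports_for_external_resources_README.txt).

```python
from pathlib import Path


def dirs_missing_readme(release_root):
    """List directories (relative to release_root) that contain no README file.

    Assumption: a README is any regular file with "README" in its name.
    """
    root = Path(release_root)
    # Check the root itself plus every sub-directory below it.
    directories = [root, *sorted(p for p in root.rglob("*") if p.is_dir())]
    missing = []
    for directory in directories:
        has_readme = any(
            child.is_file() and "README" in child.name
            for child in directory.iterdir()
        )
        if not has_readme:
            missing.append(directory.relative_to(root).as_posix())
    return missing
```

Running it on a local copy of a release directory would return an empty list when every directory, including sub-directories, has its README in place.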

There is one file with introns in the CDS only (it is more important that we have these annotated) and one with CDS+UTRs (we started adding UTR introns later, and we definitely don't have them all).

Oh right! I hadn't thought about the UTR introns, it makes sense then. Thanks!

Also, this file doesn't load properly: https://www.pombase.org/public_releases/pombase-2024-06-01/genome_sequence_and_features/gff_format/Schizosaccharomyces_pombe_all_chromosomes_unstranded.gff3

I think that's OK. The file is empty because we don't have any unstranded features. Maybe we did have some years ago. I think it's best to remove it to prevent confusion.
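
Files like this could be spotted before a release with a small check that counts feature lines. The sketch below is an assumption about how one might do it, not part of the existing release script; `count_gff3_features` is a hypothetical helper. It relies only on the GFF3 conventions that lines starting with `#` are comments or directives and that `##FASTA` marks the start of any embedded sequence section.

```python
def count_gff3_features(path):
    """Count feature lines in a GFF3 file.

    Blank lines and lines starting with '#' (comments and directives) are
    skipped, and counting stops at the ##FASTA directive.
    """
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped.startswith("##FASTA"):
                break
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

A file for which this returns 0 has a header at most and no features, so it is a candidate for leaving out of the release.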

I'm done writing the READMEs by the way.

Excellent. Thanks!

I haven't completed exports_for_external_resources-README.txt yet. Once I have, I'll make an example releases directory for 2024-09-01 so we can see if there is anything else needed.

Here's how the structure looks with the new READMEs and the latest release:
https://www.pombase.org/public_releases/pombase-2024-09-01/

We currently have the GPI/GPAD files for GO in this directory:
https://www.pombase.org/public_releases/pombase-2024-09-01/exports_for_external_resources/

Maybe they should be in the gene_ontology directory? It could be a sub-directory.

I've moved the allele_summaries.json file from exports_for_external_resources to the training_data_for_ML_and_AI directory since that's what it was created for (I think). There's nothing stopping us from having files in more than one place, so we could keep a copy in exports_for_external_resources if it makes sense.

I haven't completed exports_for_external_resources-README.txt yet.

I've done that now:
https://www.pombase.org/public_releases/pombase-2024-09-01/exports_for_external_resources/PomBase_exports_for_external_resources_README.txt

As an experiment, the format is a bit different from the other READMEs. Let me know if you think it's better or worse.

Once I have, I'll make an example releases directory for 2024-09-01 so we can see if there is anything else needed.

I've done that too. Perhaps we can have a chat about it once we're all back from holiday.

https://www.pombase.org/public_releases/pombase-2024-09-01

I agree it makes sense to have the official GO release in the GO directory

I've moved the GPI/GPAD files into the gene_ontology directory.
https://www.pombase.org/public_releases/pombase-2024-09-01/