lmu-bioinformatics/xmlpipedb

Where are the gdbs?

Closed this issue · 15 comments

I don't see the released gdbs on GitHub, unless I'm looking in the wrong place. What is our plan for migrating them from SourceForge and releasing new versions on GitHub? I also want to start storing the source UniProt XML, GO OBO-XML, and GOA files on GitHub so that we can reduce confusion about which versions of the source files were used to create the gdbs. We'll need a system for that as well.

dondi commented

I haven't put those up yet; I wasn't sure (or didn't remember) whether we had decided on an organizational scheme for those files, e.g., whether they should go in a separate repository or just be released as binaries. We can talk this through when we meet.

OK. We'll leave the decisions until Wednesday, but for preliminary discussion, I think I want to keep them within the XMLPipeDB repository unless there is a compelling reason to do otherwise. Could we have a directory called "Gene Databases" or "gene databases" with subfolders for each species?

dondi commented

That is certainly a viable choice. We can talk about its consequences compared to other options. No immediate dealbreakers come to mind.

dondi commented

One consequence of putting the files under version control is that we won't need to embed version or date information in the names of the committed files. Keeping the names stable means a cohesive version history can be maintained. Instead, we would include renamed copies of these files as the downloadables for a given release.

@dondi is going to start this with Vibrio when he can get into his office to do the upload.

We will have a directory for each species with two kinds of subdirectories: "current" and one named for the date of each released gdb. For example, the "current" directory will contain a readme.md (in Markdown), a zipped file with the gdb and its readme.md, and a source.zip with the UniProt XML, GO OBO-XML, and GOA files plus a readme.md for the source files. Once a gdb is vetted and released, the "current" folder will be renamed to the gdb's version date, and a new "current" folder will be created for the next version.
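For illustration, the release step described above (rename "current" to the version date, then start a fresh "current") could be scripted roughly as below. This is a minimal sketch, not part of XMLPipeDB: `release_current` is a hypothetical helper, and the unhyphenated `YYYYMMDD` folder name is an assumption. In the repository you would still commit the rename with Git afterwards.

```python
from datetime import date
from pathlib import Path


def release_current(species_dir: Path, version_date: date, current_name: str = "current") -> Path:
    """Hypothetical helper: turn the species' "current" folder into a dated release folder.

    Assumes dated folders are named YYYYMMDD (an assumption, not a project rule).
    """
    current = species_dir / current_name
    if not current.is_dir():
        raise FileNotFoundError(f"no {current_name!r} folder in {species_dir}")

    released = species_dir / version_date.strftime("%Y%m%d")
    if released.exists():
        raise FileExistsError(f"{released} already exists")

    current.rename(released)  # the vetted gdb folder becomes the dated release
    current.mkdir()           # a new, empty "current" folder for the next version
    return released
```

Usage would look like `release_current(Path("V.cholerae"), date(2015, 8, 10))`, which leaves a `20150810` folder holding the released files next to an empty `current` folder.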

We will close this issue once @dondi has committed Vibrio to the repository and @kdahlquist has confirmed.

dondi commented

OK, a pilot GenMAPP Gene Databases folder has been committed to the branch gene-db-pilot. I was able to put up MRSA and V. cholerae. Take a look at that branch and let me know how it looks to you. If it looks good, I can merge this into master.

Looks good; a couple of tweaks: we should follow the naming convention we use for the source-file folders. That is, instead of calling it Vibrio_source_files_2015-08-10.zip, we should probably call it "V.cholerae_source_files_2015-08-10.zip". Also, should we remove the hyphens from the date, since we don't hyphenate the folder names? Or should we hyphenate the folder names, too? Finally, is there any reason why the "current" folder is not at the top of the list? It comes before "V" in the alphabet.

dondi commented

I'm OK with removing the hyphens for the reason you state, so V.cholerae_source_files_20150810.zip, then? Well, it's easy to change names anyway, so I went ahead and renamed the source .zip; see how it looks now.
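As a small illustration of the naming convention settled on here (species folder name, then `_source_files_`, then the unhyphenated date), a name could be generated like this. `source_zip_name` is a hypothetical helper, not something in the repository.

```python
from datetime import date


def source_zip_name(species: str, version_date: date) -> str:
    """Hypothetical helper: build e.g. V.cholerae_source_files_20150810.zip.

    The date is formatted as YYYYMMDD with no hyphens, per this thread's convention.
    """
    return f"{species}_source_files_{version_date:%Y%m%d}.zip"


print(source_zip_name("V.cholerae", date(2015, 8, 10)))
# V.cholerae_source_files_20150810.zip
```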

As for the "current" folder, maybe lowercase letters get sorted to the bottom? I haven't verified that, but that's the immediate thought.

I should note that I am not going to retroactively convert all the readmes to Markdown, but I will use it going forward for newly exported gdbs.

Sorry, I didn't see your previous comment before I wrote my last one. I like the rename. Can we try changing "current" to "Current" to see if that's the case? In my mind, current should be at the top. But of course it won't be for Arabidopsis, for example. Maybe "_current" or something similar to force it to the top?

dondi commented

Yes, "Current" would not be at the top for something like "Arabidopsis." Sort-wise, the underscore falls between uppercase and lowercase letters. Symbols that always precede capital "A" include "#", "%", and "@". Let's try "@", because it reinforces that we are "at" current.
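To check the reasoning above, here is a quick look at plain code-point (ASCII) ordering, which is what Python's `sorted()` uses for strings. This is a sketch under the assumption that the file listing sorts byte-wise; GitHub's actual ordering rules may differ, so treat it as a sanity check of the character order, not of GitHub itself.

```python
# Plain code-point ordering: '#' (35) < '@' (64) < 'A'..'Z' (65-90) < '_' (95) < 'a'..'z' (97-122).
# Assumption: the listing sorts this way; GitHub's real rules may differ.
names = ["Arabidopsis", "current", "Current", "_current", "@current", "#current", "V.cholerae"]

for name in sorted(names):
    # Show the code point of the first character next to each name.
    print(f"{ord(name[0]):3d}  {name}")
```

Under this ordering the output is `#current`, `@current`, `Arabidopsis`, `Current`, `V.cholerae`, `_current`, `current`. That is consistent with lowercase "current" landing below "V" and with "#" or "@" forcing the folder above "Arabidopsis".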

OK, just committed and pushed. See how it looks now.

OK, not to get really picky, but maybe #current would be better, since on GitHub the @ symbol typically precedes usernames, and "current" is more of a tag than a user.

dondi commented

Sure, good point. Try it now.

I'm good with it now. Close this after you merge the branches?

dondi commented

Yes, done :)