petermr/CEVOpen

Elision of species names in composition table

We have a minor gremlin in the way that the composition tables are put together. The extraction routine seems to be running species names into the surrounding text, dropping the spaces on either side:

Table 1Chemical composition, concentrations (%) and calculated retention indices, ofT. boveiessential oil as characterized by GC/MS analysis
This is now becoming an issue because I have made a start on processing oil compositions using KNIME by mining the composition*.html files. I'd like to tag the records with species, plant parts etc. This is stopping me from doing it.

When you look at the accompanying summary.html, the names aren't elided at all:
image

I think I have fixed it. The problem arises in part from text like:
this is<italic>E. coli</italic>a bacterium.
I have added extra spaces around the tags.
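For anyone who wants to do the stripping downstream instead, here is a minimal sketch of the idea in Python. It assumes the names are wrapped in inline tags such as `<italic>`, and the function name `strip_tags_keep_spacing` is made up for illustration; it is not the routine used in the repo.

```python
import re

# Matches any inline tag, e.g. <italic> or </italic> (assumed markup).
TAG = re.compile(r"<[^>]+>")

def strip_tags_keep_spacing(text: str) -> str:
    """Remove tags without fusing the words on either side of them."""
    # Swap each tag for a space so neighbouring words stay separated.
    spaced = TAG.sub(" ", text)
    # Collapse the whitespace runs this creates.
    collapsed = re.sub(r"\s+", " ", spaced).strip()
    # Remove the space that would otherwise end up before punctuation.
    return re.sub(r"\s+([,.;:)])", r"\1", collapsed)

print(strip_tags_keep_spacing("this is<italic>E. coli</italic>a bacterium."))
```

The same approach could be applied to the table captions if the tags were left in the composition files.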

Please check
https://github.com/petermr/CEVOpen/tree/master/searches/oil186

P.

I've just done a git pull and I think it's still an issue.
image
I was thinking that if perhaps you leave the tags in, I can get KNIME to strip them out instead.

@deadlyvices what folder are you pulling it to?

I have just done a
git clone https://github.com/petermr/CEVOpen.git to C:\Temp\oils
on my Azure VM (Server 2016) with no issues.

Remember Windows has path length restrictions of 256 characters in certain conditions.

NP, I just thought a similar issue had popped up again.
I did a "dir /s /b > abc.txt" to get just the file names and their paths.
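A quick and dirty program to scan them shows the maximum path length is (222 - 13) = 209 once the 13 characters of my clone folder prefix are removed. With a 256 character limit, the path of the folder your repo is cloned to can therefore be at most 256 - 209 = 47 characters.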
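The arithmetic can be reproduced with a short Python sketch. The function names here are hypothetical, and the 256 limit is the figure quoted in this thread (Windows documents `MAX_PATH` as 260, so pass `limit=260` if you prefer that number). The prefix length is assumed to include the trailing backslash.

```python
from pathlib import Path

def longest_relative_path(root: str) -> int:
    """Length of the longest path under root, measured relative to root."""
    root_path = Path(root)
    return max(
        (len(str(p.relative_to(root_path))) for p in root_path.rglob("*")),
        default=0,
    )

def clone_dir_budget(longest_listed: int, prefix_len: int, limit: int = 256) -> int:
    """Characters left for the clone folder path (trailing separator included)."""
    longest_relative = longest_listed - prefix_len  # 222 - 13 = 209 in this thread
    return limit - longest_relative

# Figures from the listing above: longest line 222, prefix of 13 characters.
print(clone_dir_budget(222, 13))  # 47
```

`longest_relative_path` gives the 209 directly if you point it at the clone folder, which avoids having to subtract the prefix by hand.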

I did find 743 lines (files) containing characters that were NOT 7-bit ASCII.
Does anyone think this might be an issue?
I can dump the list here if required.
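If anyone wants to repeat the check, a rough Python sketch would be something like the following. The helper name `non_ascii_lines` is made up, and the encoding of `abc.txt` is an assumption, since `dir` output depends on the console code page.

```python
from typing import Iterable, List

def non_ascii_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain any character outside 7-bit ASCII."""
    return [line for line in lines if not line.isascii()]

# Example with two made-up file names; only the second is flagged.
print(non_ascii_lines(["composition1.html", "caf\u00e9.html"]))

# Against the real listing (encoding is a guess, adjust as needed):
# with open("abc.txt", encoding="cp1252", errors="replace") as fh:
#     flagged = non_ascii_lines(line.rstrip("\n") for line in fh)
#     print(len(flagged))
```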