Escape brackets/periods/backslashes/quotes in input rank IDs and sample metadata fields, and in any "inputs" to plot JSONs

Question

Escape brackets/periods/backslashes/quotes in input rank IDs and sample metadata fields, and in any "inputs" to plot JSONs

fedarko opened this issue 6 years ago · 6 comments

Apparently vega treats these specially. See this page for context.

This is causing a problem with the rank names in the Byrd data example -- trying to switch to a rank that isn't "Intercept" brings up an error.

I guess we have to apply this not only for each column but for every possible string that's passed to Vega: so every feature/sample ID, augmented feature ID, sample metadata, and probably more. sheesh.

ideally we should have tests that verify that our measures to protect against Vega interpreting things wrongly work (#2).

Note: you can escape these either with a ton of backslashes or by enclosing the field names in square brackets. The latter sounds easier.

Note: related to vega/vega-lite#4965

Note: should also ensure that field names (when passed into the plot JSONs, e.g. for things like setting an encoding field of the sample plot's color or setting the encoding field of the rank plot's y-axis) are escaped in JS via something like vega.stringValue().

Answer 1 · 2019-03-07T03:43:38.000Z

So I think that due to our use of json.dump(), we shouldn't have to worry about most of these aside from the Vega-Lite-specific ones (periods and brackets). But again, it's still a good idea to be sure.

Answer 2 · 2019-03-07T04:18:13.000Z

If we want to be 100% safe, we'll need to escape all of the following:

Rank IDs
Feature IDs
Sample IDs
Sample Metadata IDs
Feature Metadata IDs

In practice, I'm not sure that this is necessary for feature metadata IDs, feature IDs, or sample IDs (since I've used .s in these IDs before without issue). I think json.dump takes care of those -- the main issue seems with fields that end up being set as an axis/encoding/etc in Vega/Vega-Lite (e.g. ranks).

still worth adding lots of test cases that verify that this all works as intended.

Answer 3 · 2019-03-07T04:45:12.000Z

ahsdfiusdoifjsdfioj

so it looks like even if you escape a rank ID properly for the axis stuff, you still need to use the non-escaped ID in the underlying dataset???? bluhg

Answer 4 · 2019-03-07T04:57:26.000Z

@mortonjt small question: is preserving the patsy formulas in rank IDs (e.g. C(Timepoint, Treatment('F'))[T.B] in the Byrd data) helpful when looking at the ranks? It looks like periods, brackets, and quotes all cause problems when you pass them into Vega-Lite as field IDs.

I've implemented a basic solution that converts periods to colons and square brackets to parentheses (along with filtering out quotes and backslashes). This takes care of the problem for now, but if you think it's worth it I can come back to this later (probably after exams are over) and add back in support for some of these weird characters.

Answer 5 · 2019-03-07T07:10:15.000Z

note to self: if we go with the solution of filtering out/converting certain special characters in IDs, ensure that they're still unique afterwards.

Answer 6 · 2019-03-07T14:04:33.000Z

The special characters don't matter, but the covariates (I. E. Timepoint), treatment (I. E. F) and control (I. E. B) are all important.

…

On Wed, Mar 6, 2019, 11:57 PM Marcus Fedarko ***@***.***> wrote: @mortonjt <https://github.com/mortonjt> small question: is preserving the patsy formulas in rank IDs (e.g. C(Timepoint, Treatment('F'))[T.B] in the Byrd data) helpful when looking at the ranks? It looks like periods, brackets, and quotes all cause problems when you pass them into Vega-Lite as field Is. I've implemented a basic solution that converts periods to colons and square brackets to parentheses (along with filtering out quotes and backslashes). This takes care of the problem for now, but if you think it's worth it I can come back to this later (probably after exams are over) and add back in support for some of these weird characters. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#66 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AD_a3QJSySl-eYD10nc7cWe7jmrgicejks5vUJw2gaJpZM4bYQGx> .