pabra/json-literal-typer

Does not handle escape sequences well

Closed this issue · 4 comments

I've been trying to make this work with escape sequences, and have pretty much used up the time I had for the task. Consider supplying a RegExp-related value in JSON, like so:
{"timeStampRegex": "<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*"}

Currently, analyze() removes the escape characters. I figured that passing analyze a raw string could be a good workaround: analyze would then accept either an object (when escape sequences and/or special characters are not needed) or a raw string, in which case exact preservation of the value is assumed to be desired.
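To separate the JSON layer from the library, here is a small check that uses only built-ins (no json-literal-typer calls): it parses the JSON text above and compares the result with the regex source. It suggests that JSON.parse keeps one backslash per escaped pair, which is ordinary JSON behavior, so any loss of backslashes would have to happen later in the pipeline.

```typescript
// The JSON text exactly as it would appear in a file (String.raw keeps both backslashes).
const jsonText = String.raw`{"timeStampRegex": "<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*"}`;

// JSON.parse removes one level of escaping: each JSON "\\" becomes a single "\".
const parsed: { timeStampRegex: string } = JSON.parse(jsonText);

// The parsed value is exactly the regex source, with single backslashes.
const pattern = new RegExp(parsed.timeStampRegex);

console.log(parsed.timeStampRegex); // prints <\d+>(\w+ \d{2} \d{2}:\d{2}:\d{2}).*
console.log(parsed.timeStampRegex === /<\d+>(\w+ \d{2} \d{2}:\d{2}:\d{2}).*/.source); // prints true
console.log(pattern.test("<34>Oct 11 22:14:15 host message")); // prints true
```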

I thought I might get somewhere with String.raw`<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*`, and I still think this is an important element for the tests. But I suspect the values need to be stringified twice, i.e., input JSON string -> JSON.stringify -> JSON.parse, so that the value eventually written out preserves the exact string that was supplied.
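One way to read the "double stringify" idea, sketched under assumptions: when generating the TypeScript, each string value could be rendered with JSON.stringify, which re-adds the quotes and escapes the backslashes, so the emitted literal type denotes the original value. toTsStringLiteral is a hypothetical helper name, not part of json-literal-typer, and the sketch relies on a JSON string literal also being a valid JS/TS string literal with the same value (true since ES2019).

```typescript
// Hypothetical helper (not part of json-literal-typer): render a string value
// as TypeScript string literal syntax. JSON.stringify adds the surrounding
// quotes and escapes backslashes, double quotes and control characters.
function toTsStringLiteral(value: string): string {
  return JSON.stringify(value);
}

// The value as held in memory after JSON.parse (single backslashes).
const value = String.raw`<\d+>(\w+ \d{2} \d{2}:\d{2}:\d{2}).*`;

const literal = toTsStringLiteral(value);
console.log(literal); // prints "<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*"

// Parsing the emitted literal back yields the original value unchanged.
console.log(JSON.parse(literal) === value); // prints true

// An emitter could then interpolate the literal into the generated interface.
const generated = `interface Root {\n  timeStampRegex: ${literal};\n}`;
console.log(generated);
```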

Would love to hear if you have any ideas about how to approach this problem.

If you'd like to look at a branch with some failing tests and an attempted fix, you can find one here: https://github.com/mscottnelson/json-literal-typer/tree/fork

pabra commented

One thing I noticed: in your test you passed rawString (a string) to analyze, but I assume you really wanted to pass in an object.
I made your test more verbose to hopefully make it easier to follow and to ensure we do not talk past each other 😄

This way, the test succeeds:

describe('Handle JSON strings containing escape sequences', () => {
  // let's have a regex
  const timeStampRegex = /<\d+>(\w+ \d{2} \d{2}:\d{2}:\d{2}).*/;
  // let's store it as an object in a JSON file
  const jsonFileContent = JSON.stringify({
    timeStampRegex: timeStampRegex.source,
  });

  // the raw JSON file content is now: {"timeStampRegex":"<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*"}
  // to represent this as a JavaScript String, backslashes need to be escaped
  // so the json file content represented as JS string is: '{"timeStampRegex":"<\\\\d+>(\\\\w+ \\\\d{2} \\\\d{2}:\\\\d{2}:\\\\d{2}).*"}'
  // String.raw allows us to write "raw" strings without the need to escape backslashes
  // so the json file content represented as JS raw string is: String.raw`{"timeStampRegex":"<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*"}`
  //
  // prove it:
  String.raw`{"timeStampRegex":"<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*"}` === jsonFileContent; // -> true

  // let's test
  const rawJsonString = String.raw`{"timeStampRegex":"<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*"}`;
  const parsedObj = JSON.parse(rawJsonString);
  const analyzed = analyze(parsedObj);
  const jsonified = jsonify(analyzed);
  const typified = typify(analyzed);

  it('should get expected json output', () => {
    expect(jsonified).toEqual({
      type: 'object',
      path: '$',
      keys: {
        timeStampRegex: {
          values: [
            {
              type: 'string',
              path: "$['timeStampRegex']{string}",
              values: [
                // can use JS string
                // '"<\\\\d+>(\\\\w+ \\\\d{2} \\\\d{2}:\\\\d{2}:\\\\d{2}).*"',
                // or raw string
                String.raw`"<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*"`,
              ],
            },
          ],
        },
      },
    });
  });

  it('should get expected typescript', () => {
    expect(typified).toEqual(
      'interface Root {\n  timeStampRegex: "<\\d+>(\\w+ \\d{2} \\d{2}:\\d{2}:\\d{2}).*";\n}',
    );
  });

  it('should produce compilable typescript', async () => {
    const compiled = await compile(
      typified,
      'json-string_with_escape_sequences',
      tsOptions,
    );
    expect(compiled).toBeTruthy();
  });
});

Passing a string was intentional: I was giving myself a way to distinguish between a "raw JSON string" and a "JS object" on input, in an attempt to create a special pathway for when "exact literals" are desired. But thinking it over, I don't think that was a good idea anyway.

Anyway, if your input JSON file "sample.json" is the following (note that the goal is to preserve the string exactly as a string literal type):

{"timeStampRegex": "^<\\d+>([A-Z][a-z]+\\s+\\d+\\s+\\d+:\\d+:\\d+)"}

then:

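import { readFileSync } from 'fs';
import { inspect } from 'util';
// analyze and typify come from json-literal-typer

const jsonString = readFileSync('../src/sample.json').toString();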
const obj = JSON.parse(jsonString);
const typed = typify(analyze(obj));
console.log(inspect(jsonString));
// 👍 '{\n  "timeStampRegex": "^<\\\\d+>([A-Z][a-z]+\\\\s+\\\\d+\\\\s+\\\\d+:\\\\d+:\\\\d+)"\n}'
console.log(inspect(obj));
// 👍 { timeStampRegex: '^<\\d+>([A-Z][a-z]+\\s+\\d+\\s+\\d+:\\d+:\\d+)' }
console.log(inspect(typed));
// 👎 'interface Root {\n  timeStampRegex: "^<\\d+>([A-Z][a-z]+\\s+\\d+\\s+\\d+:\\d+:\\d+)";\n}'

console.log(typed);
/* Not the actual string value from the original JSON: TypeScript reads \d and \s inside this string literal as plain d and s, so the backslashes are lost:
interface Root {
  timeStampRegex: "^<\d+>([A-Z][a-z]+\s+\d+\s+\d+:\d+:\d+)";
}
*/
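To see what the compiler makes of the emitted literal, the sketch below lets the JavaScript parser evaluate the literal text through new Function, which applies the same string-escape rules TypeScript uses for string literal types. This is an illustration only: new Function is used here just to parse a string literal, and fixed shows a JSON.stringify-based variant for comparison (an assumption about a possible fix, not the library's current behavior).

```typescript
// The literal exactly as it appears in the generated TypeScript source text
// (single backslashes, as printed by console.log(typed)).
const emitted = String.raw`"^<\d+>([A-Z][a-z]+\s+\d+\s+\d+:\d+:\d+)"`;

// The value the JSON file actually contained.
const original = String.raw`^<\d+>([A-Z][a-z]+\s+\d+\s+\d+:\d+:\d+)`;

// Let the JavaScript parser interpret the emitted text as a string literal:
// "\d" and "\s" are not special escapes, so they collapse to "d" and "s".
const asParsed: string = new Function(`return ${emitted};`)();
console.log(asParsed); // prints ^<d+>([A-Z][a-z]+s+d+s+d+:d+:d+)
console.log(asParsed === original); // prints false

// Emitting JSON.stringify(original) instead keeps the backslashes.
const fixed = JSON.stringify(original);
console.log(new Function(`return ${fixed};`)() === original); // prints true
```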

...using the same sample.json and the JS object produced from it in my previous comment:

// copy of interface Root, above
interface TimeStampRegExp { timeStampRegex: "^<\d+>([A-Z][a-z]+\s+\d+\s+\d+:\d+:\d+)"; }

let myOther: TimeStampRegExp = { timeStampRegex: '^<d+>([A-Z][a-z]+s+d+s+d+:d+:d+)' }
// No error 😢

//...but if we use an exact copy of the JS object that we originally passed to typify...

const obj: TimeStampRegExp = { timeStampRegex: '^<\\d+>([A-Z][a-z]+\\s+\\d+\\s+\\d+:\\d+:\\d+)' }
// Error:  Type '"^<\\d+>([A-Z][a-z]+\\s+\\d+\\s+\\d+:\\d+:\\d+)"' is not assignable to type '"^<d+>([A-Z][a-z]+s+d+s+d+:d+:d+)"'.ts(2322)
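For comparison, a sketch of the declaration that was presumably expected: with the backslashes escaped inside the literal type (as JSON.stringify would emit them), the exact object from JSON.parse type-checks and the stripped value from myOther would be rejected. The interface is written by hand here; it is not output from typify.

```typescript
// Hand-written declaration: "\\" inside the literal type denotes one backslash,
// so the type matches the exact value stored in sample.json.
interface TimeStampRegExp {
  timeStampRegex: "^<\\d+>([A-Z][a-z]+\\s+\\d+\\s+\\d+:\\d+:\\d+)";
}

// The exact JS object produced by JSON.parse now type-checks...
const obj: TimeStampRegExp = {
  timeStampRegex: "^<\\d+>([A-Z][a-z]+\\s+\\d+\\s+\\d+:\\d+:\\d+)",
};

// ...while the backslash-stripped value would be rejected with TS2322:
// const myOther: TimeStampRegExp = { timeStampRegex: "^<d+>([A-Z][a-z]+s+d+s+d+:d+:d+)" };

// The value still works as a regular expression source.
const timeStamp = new RegExp(obj.timeStampRegex);
console.log(timeStamp.test("<13>Jan 12 10:11:12 host message")); // prints true
```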