InseeFr/Trevas

substring

Opened this issue · 4 comments

@noahboerger you reported substr is not supported.

I thought it was, for scalars, components and datasets.

Could you report a failed example please?

Substring syntax (In Trevas param2 is the start_index, and param3 the length. The user test seems to assume, that param2 is the start_index and param3 the end_index (inclusive))

@noahboerger could you confirm substr doesn't cause exception throw but a wrong result because of the previous remark?

The first note on substr was just that it is currently not supported in the shape DS_r := substr( DS_1, 1, 2).

The different syntax also seems to be the correct one according to the reference manual (p. 76 f.).
So i would propose to close the issue as it is only an issue for our used testcases.

This syntax has to work, and is implemented in Trevas I think.
The condition on the handled dataset is: dataset with only string measures.

@noahboerger, can you provide a failed example with the stack trace please?

@NicoLaval you are right, with a dataset with only string measures it is working also on my side. The dataset i have tested it with had the wrong format

Id_1 Id_2 num_1 Text_1
IDENTIFIER IDENTIFIER MEASURE MEASURE
STRING STRING DOUBLE STRING

The concrete test case i have executed is the one from BdI "11. conditional/if-then-else_2", which is trying to execute it with a dataset that is also containing a measure with an additional double type. This testcase has failed with the following error:

Occured error

Exception

fr.insee.vtl.engine.exceptions.FunctionNotFoundException: function 'substr(Double, Long, Long)' not found
  at fr.insee.vtl.engine.visitors.expression.functions.GenericFunctionsVisitor.invokeFunction(GenericFunctionsVisitor.java:146)
  at fr.insee.vtl.engine.visitors.expression.functions.StringFunctionsVisitor.visitSubstrAtom(StringFunctionsVisitor.java:183)
  at fr.insee.vtl.engine.visitors.expression.functions.StringFunctionsVisitor.visitSubstrAtom(StringFunctionsVisitor.java:20)
  at fr.insee.vtl.parser.VtlParser$SubstrAtomContext.accept(VtlParser.java:3074)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitStringFunctions(ExpressionVisitor.java:254)
  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitStringFunctions(ExpressionVisitor.java:41)
  at fr.insee.vtl.parser.VtlParser$StringFunctionsContext.accept(VtlParser.java:1098)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46)
  at fr.insee.vtl.parser.VtlBaseVisitor.visitFunctionsExpression(VtlBaseVisitor.java:90)
  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitFunctionsExpression(ExpressionVisitor.java:491)
  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitFunctionsExpression(ExpressionVisitor.java:41)
  at fr.insee.vtl.parser.VtlParser$FunctionsExpressionContext.accept(VtlParser.java:629)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitAssignment(AssignmentVisitor.java:51)
  at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitTemporaryAssignment(AssignmentVisitor.java:59)
  at fr.insee.vtl.parser.VtlParser$TemporaryAssignmentContext.accept(VtlParser.java:372)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at fr.insee.vtl.engine.VtlScriptEngine.evalStream(VtlScriptEngine.java:263)
  at fr.insee.vtl.engine.VtlScriptEngine.eval(VtlScriptEngine.java:282)
  at java.scripting/javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:262)
  at fr.insee.trevas.jupyter.VtlKernel.eval(VtlKernel.java:305)
  at io.github.spencerpark.jupyter.kernel.BaseKernel.handleExecuteRequest(BaseKernel.java:334)
  at io.github.spencerpark.jupyter.channels.ShellChannel.lambda$bind$0(ShellChannel.java:64)
  at io.github.spencerpark.jupyter.channels.Loop.lambda$new$0(Loop.java:21)
  at io.github.spencerpark.jupyter.channels.Loop.run(Loop.java:78)

The test from BdI assumes that the additional non string measure is kept without being manipulated. But I think your implementation is right, as the table of VTL-ML-Operators in the reference manual (p. 17), requires the input to be in the following form:

 op ::
dataset { measure<string> _+ }
| component<string>
| string
pattern1, pattern2 ::
component<string>
| string

so i think it the correct behaviour to fail here.

I did not dive so deep into this test case when i executed it the first time, but then it is a wrong test case. I am not sure if the trevas error message could be a bit more clear, as the engine still tries to execute the replace-operator even with a mismatching input dataset.