microsoft/RTVS

API: GetDataFrameAsync returns integer values instead of corresponding character values

dskmh opened this issue · 3 comments

dskmh commented

The following snippet returns the integer representation of x$v2 instead of the character values:

await session.ExecuteAsync("v1 <- c('10','20','20','30')");
await session.ExecuteAsync("v2 <- c('a','b','b','c')");
await session.ExecuteAsync("x <- data.frame(v1, v2)");
var df1 = await session.GetDataFrameAsync("x");

Adding "stringsAsFactors = FALSE" is a workaround:

await session.ExecuteAsync("x <- data.frame(v1, v2, stringsAsFactors = FALSE)");

I expected it to return the factor's level labels rather than its underlying integer codes. Is this the expected behavior?

Hong-Revo commented

Without knowing exactly what GetDataFrameAsync does, I would guess that yes, this is expected behaviour. By default, data.frame converts character columns to factors, and a factor is an integer vector that carries a levels attribute and has a print method that displays those levels. Setting stringsAsFactors = FALSE disables this conversion.

When you retrieve the data frame with chars converted to factors, is all the factor info preserved? If so, then I'd say there's not a problem.
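The description of a factor as integer codes plus a levels attribute can be checked in plain R. The following is a minimal sketch, independent of RTVS; it passes stringsAsFactors = TRUE explicitly because R 4.0.0 and later no longer convert character columns to factors by default.

```r
# Build the data frame with the character column converted to a factor.
v2 <- c("a", "b", "b", "c")
x <- data.frame(v2, stringsAsFactors = TRUE)

class(x$v2)         # "factor"
typeof(x$v2)        # "integer": the storage is integer codes
as.integer(x$v2)    # 1 2 2 3: the codes, without labels
levels(x$v2)        # "a" "b" "c": the labels the codes index into
as.character(x$v2)  # "a" "b" "b" "c": codes mapped back to labels
```

If only the result of as.integer() reaches the caller and levels() does not, the original strings cannot be reconstructed on the .NET side.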

dskmh commented

@Hong-Revo: The factor information is not preserved when retrieving the data frame. The integer values are returned but not the levels.
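Since the levels are reported as lost in transit, one possible R-side workaround for a data frame that already contains factors is to turn those columns back into character vectors before retrieving it. This is a sketch only: factors_to_character is a hypothetical helper, not part of RTVS, and it assumes the label strings are all that is needed on the .NET side.

```r
# Hypothetical helper (not part of RTVS): replace each factor column
# with its labels and leave every other column untouched.
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  df
}

# stringsAsFactors = TRUE reproduces the pre-4.0.0 default behaviour.
x <- data.frame(v1 = c(10, 20, 20, 30),
                v2 = c("a", "b", "b", "c"),
                stringsAsFactors = TRUE)
x <- factors_to_character(x)
str(x)
```

The helper could be defined and applied through ExecuteAsync before calling GetDataFrameAsync("x"), in the same way as the other R statements in this issue.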

dskmh commented

GetDataFrameAsync returns v2 as a character column if I create the data frame with CreateDataFrameAsync instead of creating it directly in R as above.

var rowNames = new string[] { "1", "2", "3", "4" };
var colNames = new string[] { "v1", "v2" };
// One inner array per column: v1 holds numbers, v2 holds strings.
var data = new object[] { new object[] { 10, 20, 20, 30 }, new object[] { "a", "b", "b", "c" } };
var list = new List<IReadOnlyCollection<object>>();
foreach (object o in data)
{
    list.Add(o as object[]);
}
var original = new DataFrame(rowNames, colNames, list.AsReadOnly());
// Push the data frame into the R session as "x", then read it back.
await session.CreateDataFrameAsync("x", original);
var rdf = await session.GetDataFrameAsync("x");

Using CreateDataFrameAsync to create the data frame from .NET

str(x)
'data.frame': 4 obs. of 2 variables:
$ v1: num 10 20 20 30
$ v2: chr "a" "b" "b" "c"

Using ExecuteAsync to create the data frame directly in R

str(x)
'data.frame': 4 obs. of 2 variables:
$ v1: num 10 20 20 30
$ v2: Factor w/ 3 levels "a","b","c": 1 2 2 3