API: GetDataFrameAsync returns integer values instead of corresponding character values
dskmh opened this issue · 3 comments
The following snippet returns the integer representation of x$v2 instead of the character values:
await session.ExecuteAsync("v1 <- c('10','20','20','30')");
await session.ExecuteAsync("v2 <- c('a','b','b','c')");
await session.ExecuteAsync("x <- data.frame(v1, v2)");
var df1 = await session.GetDataFrameAsync("x");
Adding "stringsAsFactors = FALSE" is a workaround:
await session.ExecuteAsync("x <- data.frame(v1, v2, stringsAsFactors = FALSE)");
I expected it to return the factor's levels rather than the underlying integer codes. Is this the expected behavior?
Without knowing exactly what GetDataFrameAsync does, I would guess that yes, this is expected behaviour. By default, data.frame converts character columns to factors, and a factor is an integer variable that happens to have a levels attribute and a print method that displays those levels. By setting stringsAsFactors=FALSE, you disable this conversion.
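To make the point about factors concrete, here is a small R sketch of what sits underneath a factor column. It passes stringsAsFactors = TRUE explicitly so that the result does not depend on the default of the R version in use (the default changed to FALSE in R 4.0.0). It only illustrates R's side of things and says nothing about how GetDataFrameAsync reads the column.

```r
# Build the data frame with factor conversion requested explicitly.
v1 <- c("10", "20", "20", "30")
v2 <- c("a", "b", "b", "c")
x <- data.frame(v1, v2, stringsAsFactors = TRUE)

# A factor is stored as integer codes plus a "levels" attribute.
codes <- as.integer(x$v2)   # the underlying integer codes
lvls  <- levels(x$v2)       # the distinct labels, sorted

# Indexing the levels by the codes recovers the original strings,
# which is what as.character() does for a factor.
labels <- lvls[codes]

print(typeof(x$v2))
print(codes)
print(lvls)
print(identical(labels, as.character(x$v2)))
```

If a client reads only the integer storage and ignores the levels attribute, it would see the codes and lose the labels, which matches the symptom reported above.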
When you retrieve the data frame with chars converted to factors, is all the factor info preserved? If so, then I'd say there's not a problem.
@Hong-Revo: The factor information is not preserved when retrieving the data frame. The integer values are returned but not the levels.
GetDataFrameAsync returns v2 as character if I create the data frame using CreateDataFrameAsync instead of creating it directly in R as above.
var rowNames = new string[] { "1", "2", "3", "4" };
var colNames = new string[] { "v1", "v2"};
var data = new object[] { new object[] { 10, 20, 20, 30 }, new object[] { "a", "b", "b", "c" } };
var list = new List<IReadOnlyCollection<object>>();
foreach (object o in data)
{
list.Add(o as object[]);
}
var original = new DataFrame(rowNames, colNames, list.AsReadOnly());
await session.CreateDataFrameAsync("x", original);
var rdf = await session.GetDataFrameAsync("x");
Using CreateDataFrameAsync to create the data frame in .NET
str(x)
'data.frame': 4 obs. of 2 variables:
$ v1: num 10 20 20 30
$ v2: chr "a" "b" "b" "c"
Using ExecuteAsync to create the data frame directly in R
str(x)
'data.frame': 4 obs. of 2 variables:
$ v1: num 10 20 20 30
$ v2: Factor w/ 3 levels "a","b","c": 1 2 2 3
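If the data frame has to be built in R and already contains factor columns, one possible workaround is to convert those columns back to character on the R side before retrieving it. The sketch below is plain R and could presumably be sent through ExecuteAsync like the statements above. Whether GetDataFrameAsync then returns strings is an assumption based on the chr case shown earlier, not something verified here.

```r
# Example data frame with factor columns (conversion requested explicitly).
v1 <- c("10", "20", "20", "30")
v2 <- c("a", "b", "b", "c")
x <- data.frame(v1, v2, stringsAsFactors = TRUE)

# Find the factor columns and replace each one with its character labels.
is_factor <- vapply(x, is.factor, logical(1))
x[is_factor] <- lapply(x[is_factor], as.character)

# Both columns are now plain character vectors.
str(x)
```

This keeps the labels and drops the integer codes. Any level ordering is lost, so it only suits columns where the text values are what matters.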