qri-io/starlib

dataframe committed to body does not retain column names

Closed · 1 comment

With the code below, the committed dataset's structure lists its columns as field1, field2, etc., even though the source dataset and the dataframe both had defined column names.

load("http.star", "http")
load("encoding/csv.star", "csv")
load("dataframe.star", "dataframe")
ds = dataset.latest()
---
# get unique female names that start with V

# download CSV from the web, parse it and assign to a dataframe
csvDownloadUrl = "https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv?accessType=DOWNLOAD"
res = http.get(csvDownloadUrl).body()
foo = dataframe.parse_csv(res)

---


# filter for first names that start with 'V'
foo = foo[[x.startswith('V') for x in foo["Child's First Name"]]]

# filter for female
foo = foo[[x == 'FEMALE' for x in foo["Gender"]]]

# get unique
# namesOnly = foo["Child's First Name"]

# print(namesOnly.unique())

# foo = foo[foo[0]]

ds.body = foo


# print(ds.body)
dataset.commit(ds)
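
To make the reported symptom concrete, here is a minimal plain-Python sketch. It is not qri's implementation and does not use the qri dataframe API. The helper names (`generic_names`, `commit_rows_only`, `commit_with_columns`), the dict layout, and the two sample rows are all hypothetical; only the column names and the field1, field2 pattern come from the report above. It assumes, for illustration only, that the names are lost because only the row data reaches the committed body.

```python
# Hypothetical illustration only: models the symptom, not qri's internals.

def generic_names(width):
    """Placeholder names of the kind seen in the report: field1, field2, ..."""
    return ["field%d" % (i + 1) for i in range(width)]

def commit_rows_only(columns, rows):
    """Keep only the row data; the column names are dropped, so
    placeholder names are generated from the row width."""
    width = len(rows[0]) if rows else len(columns)
    return {"schema": generic_names(width), "body": rows}

def commit_with_columns(columns, rows):
    """Carry the column names alongside the rows so they survive the commit."""
    return {"schema": list(columns), "body": rows}

# Column names taken from the script above; the rows are made-up samples.
columns = ["Child's First Name", "Gender"]
rows = [["VALENTINA", "FEMALE"], ["VICTORIA", "FEMALE"]]

lossy = commit_rows_only(columns, rows)
kept = commit_with_columns(columns, rows)

print(lossy["schema"])  # placeholder names, as in the reported output
print(kept["schema"])   # original names, the expected behavior
```

The body rows are identical in both cases; only the column names recorded in the structure differ, which matches what the report describes.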

This should be fixed by qri-io/qri#1923. However, the new behavior has some known bugs; for example, column descriptions are not retained.