nomic-ai/deepscatter

Improve passing colormaps.

bmschmidt opened this issue · 2 comments

Currently the handling of colorscales for categorical variables sucks. @AndriyMulyar was asking about this on Slack more delicately than I'll put it here, but:

If you want to colorize by a categorical field called 'classification,' you have to do this:

encoding: {
   color: {
      field: "classification",
      range: "category10",
      domain: [-2047, 2047]
   }
}

There are a few problems here:

1. Oh my god, domain: [-2047, 2047] is terrible. This reflects the underlying encoding of Arrow dictionary codes. But there's no reason any user should care about this, because the actual dictionary codes are going to be [0, 4095] and I only cast it down for reasons involving half-precision floating point performance that should be completely hidden. It's probably best to let this be fully implicit; failing that, a string shortcut or something would at least be better. Also, it's not even right: there are only 4095 integers in that range for 4096 dictionary codes, so something around the 2048th will be duplicated.
2. There's no defined way to access the scale client-side. Wanting to get at the scale for legends, etc. is a super-common use case.
3. The order of dictionary fields cannot be defined, because quadfeather may shuffle them around. So if you have six values called ['worst', 'bad', 'fine', 'good', 'better', 'best'] it's not possible to guarantee colors for them in that order.
4. If someone wants to create their own colormap, it's very hard to do so. I think you can pass [r, g, b] values in, but it's actually fairly hard.

So a better solution would be to allow either of the following syntaxes.

encoding: {
   color: {
      field: "classification",
      range: "category10"
   }
}

Where the fact that 'classification' is categorical implicitly triggers a domain over the range [-2047, 2046] in a way that never mentions those numbers.

encoding: {
   color: {
      field: "classification",
      range: ['#FFEEBB', '#226600', ...],
      domain: ['comedy', 'sci-fi', ....]
   }
}
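
As a rough sketch of what the explicit domain/range form buys you: colors get tied to category names rather than to whatever order the dictionary codes end up in, and the same pairing can be reused for a legend. `buildCategoricalColors` is a hypothetical helper, not a deepscatter function, and the hex values are only illustrative. Recycling colors when the domain is longer than the range is an assumption that mirrors how d3 ordinal scales behave.

```javascript
// Hypothetical helper (not part of deepscatter): pair category names with colors.
function buildCategoricalColors(domain, range) {
  if (range.length === 0) {
    throw new Error("range must contain at least one color");
  }
  const colors = new Map();
  domain.forEach((category, i) => {
    // Assumption: recycle colors when the domain is longer than the range.
    colors.set(category, range[i % range.length]);
  });
  return colors;
}

// Illustrative values only; the order here is the order the user asked for.
const ratingColors = buildCategoricalColors(
  ["worst", "bad", "fine", "good", "better", "best"],
  ["#d73027", "#fc8d59", "#fee08b", "#d9ef8b", "#91cf60", "#1a9850"]
);

// The same Map can drive a legend without touching dictionary codes.
const legendEntries = [...ratingColors].map(([label, color]) => ({ label, color }));
```

Because the lookup is keyed by name, it no longer matters how quadfeather orders the dictionary internally.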

@AndriyMulyar suggested passing a map or a d3-scale as the argument, but I'm inclined to only do it this way, which is more grammar-of-graphics standard; it's pretty easy to cast either of those to a separate domain and range.
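
To give a sense of how small that cast is, here is a minimal sketch on the caller's side. `toDomainAndRange` is a hypothetical name, not something deepscatter exposes. It assumes the input is either a JavaScript `Map` from category to color, or an object with `domain()` and `range()` getters, which is how d3 ordinal scales expose them; the `scaleLike` object below is a stand-in so the snippet runs without d3.

```javascript
// Hypothetical helper: normalize a Map or a scale-like object into { domain, range }.
function toDomainAndRange(mapping) {
  if (mapping instanceof Map) {
    // Map iteration order is insertion order, so keys and values stay aligned.
    return { domain: [...mapping.keys()], range: [...mapping.values()] };
  }
  if (typeof mapping.domain === "function" && typeof mapping.range === "function") {
    // Called with no arguments, these act as getters on a d3-style scale.
    return { domain: mapping.domain(), range: mapping.range() };
  }
  throw new TypeError("expected a Map or an object with domain() and range()");
}

const fromMap = toDomainAndRange(
  new Map([["comedy", "#FFEEBB"], ["sci-fi", "#226600"]])
);

// Stand-in for a d3 ordinal scale, used here only to keep the example self-contained.
const scaleLike = {
  domain: () => ["comedy", "sci-fi"],
  range: () => ["#FFEEBB", "#226600"],
};
const fromScale = toDomainAndRange(scaleLike);
```

Either result can then be spread into the color encoding as separate domain and range arrays.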

A couple other considerations.

  1. Currently it's possible to pass a range of colors as rgb triplets like [[255, 0, 0], [0, 0, 255]]. I'm going to stop supporting this, as it's confusing and weird; passing red as '#FF0000' is completely clear, and there's no reason to support another method except maybe allowing HTML colors ('red', 'aliceblue', etc.) as strings at some future point.
  2. I considered supporting passing instantiated d3-scale objects directly, but this creates some hiccups for linear scales, since d3 allows creating multiple breakpoints (e.g., colorscale.domain([0, 10, 20]).range(['red', 'white', 'blue'])). Since that's very hard to support for arbitrary-length scales in WebGL, I think it's best not to support it at all right now.
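
For anyone currently passing rgb triplets as in point 1, converting them to hex strings before handing them over is a one-liner per color. This is a sketch of a caller-side conversion, not library code; `rgbTripletToHex` is a hypothetical name, and it assumes integer channels in 0 to 255.

```javascript
// Hypothetical caller-side helper: [r, g, b] with integer channels 0-255 -> "#rrggbb".
function rgbTripletToHex(triplet) {
  if (!Array.isArray(triplet) || triplet.length !== 3) {
    throw new TypeError("expected an [r, g, b] array");
  }
  const hex = triplet.map((channel) => {
    if (!Number.isInteger(channel) || channel < 0 || channel > 255) {
      throw new RangeError("channel out of range: " + channel);
    }
    // Two hex digits per channel, zero-padded.
    return channel.toString(16).padStart(2, "0");
  });
  return "#" + hex.join("");
}

// The triplets from point 1, converted for use as a range.
const hexRange = [[255, 0, 0], [0, 0, 255]].map(rgbTripletToHex);
// hexRange is ["#ff0000", "#0000ff"]
```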

complete in v2.7.0