cmaclell/concept_formation

Convert the underlying av_counts dictionary into numpy vectors


The general idea is to store three things in the root: a set of attributes, a mapping from attribute-value pairs to vector indices (for nominal counts), and a mapping from attributes to vector indices (for numeric counts).
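A minimal sketch of how the root could build those mappings from a batch of instance dicts. It assumes that ints and floats are numeric and that everything else is nominal. It also assumes an attribute keeps the same type across instances. `build_index` is a hypothetical helper, not part of concept_formation.

```python
def build_index(instances):
    """Hypothetical helper: assign a vector index to every nominal
    (attribute, value) pair and to every numeric attribute."""
    attributes = set()
    nominal_index = {}  # (attr, value) -> position in the nominal count vector
    numeric_index = {}  # attr -> position in the numeric vectors
    for inst in instances:
        for attr, val in inst.items():
            attributes.add(attr)
            # Assumption: int/float (but not bool) values are numeric.
            if isinstance(val, (int, float)) and not isinstance(val, bool):
                numeric_index.setdefault(attr, len(numeric_index))
            else:
                nominal_index.setdefault((attr, val), len(nominal_index))
    return attributes, nominal_index, numeric_index
```

Indices are handed out in first-seen order. New attribute values arriving later would therefore grow the vectors, which is one cost of this representation.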

Then each instance is converted into a new object with the following:

  • nominal counts (a vector of zeros and ones)
  • numeric counts (a vector of numeric values, with NaN for missing values)
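Here is a sketch of the conversion, assuming index mappings like the ones described for the root. `vectorize` is a hypothetical name. For simplicity it silently ignores attribute values that have no index yet.

```python
import numpy as np

def vectorize(instance, nominal_index, numeric_index):
    """Hypothetical conversion of an instance dict into a 0/1 nominal
    vector and a numeric vector that uses NaN for missing values."""
    nominal = np.zeros(len(nominal_index))
    numeric = np.full(len(numeric_index), np.nan)  # NaN marks "missing"
    for attr, val in instance.items():
        if attr in numeric_index:
            numeric[numeric_index[attr]] = val
        elif (attr, val) in nominal_index:
            nominal[nominal_index[(attr, val)]] = 1.0
        # Unindexed attribute values are skipped in this sketch.
    return nominal, numeric
```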

The concepts will store:

  • nominal counts (a vector of counts for each attr-val)
  • three numeric vectors used for computing incremental mean and std
  • a vector of counts for each numeric attribute
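One possible layout for a concept node, sketched as a hypothetical `VectorConcept` class. The issue does not say what the three numeric vectors are. This sketch assumes Welford's method, which needs only a running mean and a running sum of squared deviations (`m2`) alongside the per-attribute counts. A total instance count is also added, for computing probabilities.

```python
import numpy as np

class VectorConcept:
    """Hypothetical concept node that stores all statistics as numpy vectors."""

    def __init__(self, n_nominal, n_numeric):
        self.count = 0                               # instances incorporated
        self.nominal_counts = np.zeros(n_nominal)    # count per (attr, value)
        self.numeric_counts = np.zeros(n_numeric)    # non-missing count per numeric attr
        self.mean = np.zeros(n_numeric)              # running mean per numeric attr
        self.m2 = np.zeros(n_numeric)                # running sum of squared deviations

    def std(self):
        # Population std; attributes never observed come out as NaN.
        with np.errstate(invalid='ignore', divide='ignore'):
            return np.sqrt(self.m2 / self.numeric_counts)
```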

Then incorporating an instance into a concept becomes a simple vector addition (for nominals) plus three operations that incrementally update the numeric vectors, skipping any missing values.
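A sketch of that update, assuming the concept is a dict of numpy arrays (`count`, `nominal_counts`, `numeric_counts`, `mean`, `m2`) and that missing numeric values are NaN. The three numeric operations are Welford's online update of count, mean, and sum of squared deviations. The function name and the dict layout are assumptions made for illustration.

```python
import numpy as np

def incorporate(concept, nominal, numeric):
    """Hypothetical in-place update of a concept with one vectorized instance."""
    concept['count'] += 1
    concept['nominal_counts'] += nominal        # nominals: one vector addition

    present = ~np.isnan(numeric)                # mask out missing numeric values
    n, mean, m2 = concept['numeric_counts'], concept['mean'], concept['m2']
    x = numeric[present]

    n[present] += 1                             # 1) bump counts
    delta = x - mean[present]
    mean[present] += delta / n[present]         # 2) update running mean
    m2[present] += delta * (x - mean[present])  # 3) update squared deviations
```

Incorporating `[1.0, nan]` and then `[3.0, 4.0]` leaves counts `[2, 1]`, means `[2, 4]`, and `m2` of `[2, 0]`. That gives a population variance of 1 for the first attribute.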

Merging concepts will also be a vector addition (for nominals) and something like 5 operations for incrementally merging the numeric vectors.
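For the merge, one standard option is the pairwise combination of means and sums of squared deviations usually credited to Chan, Golub, and LeVeque. The sketch below assumes the same dict-of-arrays concept layout as above (count, mean, `m2`), and `merge` is a hypothetical name. It guards the division for attributes that neither concept has observed.

```python
import numpy as np

def merge(a, b):
    """Hypothetical merge of two concepts (dicts of numpy arrays) into a new one."""
    na, nb = a['numeric_counts'], b['numeric_counts']
    n = na + nb
    safe_n = np.where(n > 0, n, 1)   # avoid 0/0 where neither side has data
    delta = b['mean'] - a['mean']
    return {
        'count': a['count'] + b['count'],
        'nominal_counts': a['nominal_counts'] + b['nominal_counts'],
        'numeric_counts': n,
        'mean': a['mean'] + delta * nb / safe_n,
        'm2': a['m2'] + b['m2'] + delta ** 2 * na * nb / safe_n,
    }
```

Merging a concept built from `[1, 3]` with one built from `[5]` gives the same mean (3) and `m2` (8) as incorporating all three values one at a time.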

To make this work, we might want to create a special instance class/object, and maybe a function in the tree that takes an instance dict and returns an instance object that can be incorporated into the tree.

Then things like computing the expected correct guesses can be done with a simple vector dot product. If we keep everything in numpy arrays, I expect we should see a huge performance gain.
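For the nominal attributes, Cobweb's expected correct guesses is the sum over attribute values of P(attr=val | C)². With a count vector, this collapses to a single dot product. The sketch below assumes each probability is estimated as the value's count divided by the concept's instance count. `expected_correct_guesses` is a hypothetical name. Cobweb/3's numeric term, which depends on each attribute's standard deviation, could be vectorized the same way but is not shown.

```python
import numpy as np

def expected_correct_guesses(nominal_counts, count):
    """Hypothetical nominal part of expected correct guesses:
    sum over (attr, val) of P(attr=val | C)**2, as one dot product."""
    if count == 0:
        return 0.0
    p = nominal_counts / count   # assumed estimate of P(attr=val | C)
    return float(p @ p)
```

For example, a concept with 3 instances, color counts red=2 and blue=1, and shape count square=3 gives (2/3)² + (1/3)² + 1² = 14/9.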

An alternative idea is to add a new kind of feature that supports something like numpy arrays directly. Now that I'm thinking about it, this might be the best way to do it.

For example, an instance might look like the following:

{'X': np.array([1,2,3,4]), '_y': 1}

Then, internally, we could do the cobweb3 thing and maintain a mean and std for each element of X. This would also give users the flexibility to take advantage of numpy arrays if they know their data has a fixed dimension.
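A sketch of what such an array-valued feature might track. It applies a Cobweb/3-style running mean and std to every element of a fixed-length array at once, and rejects arrays of the wrong shape. The `ArrayValue` class and its methods are hypothetical and do not exist in concept_formation. Missing elements are not handled here.

```python
import numpy as np

class ArrayValue:
    """Hypothetical attribute value for fixed-length numpy arrays.
    Keeps an independent running mean and std for each element."""

    def __init__(self, dim):
        self.num = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)   # running sum of squared deviations per element

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if x.shape != self.mean.shape:
            raise ValueError(f"expected shape {self.mean.shape}, got {x.shape}")
        self.num += 1
        delta = x - self.mean
        self.mean += delta / self.num
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Population std per element; zeros before any update.
        return np.sqrt(self.m2 / self.num) if self.num else np.zeros_like(self.mean)
```

Usage with the instance format above: after `v = ArrayValue(4)`, calling `v.update(np.array([1, 2, 3, 4]))` and `v.update(np.array([3, 2, 1, 0]))` gives a per-element mean of `[2, 2, 2, 2]` and std of `[1, 0, 1, 2]`.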