json is not a drop-in replacement for simplejson
SaganBolliger opened this issue · 8 comments
In classification_network.py and several other places json
is imported if simplejson
fails to import. However, json
returns a dict with keys/values that are unicode strings, whereas simplejson
returns a dict with standard str key/value pairs. The nupic bindings fail when given unicode strings.
Either json
should not be included as a fallback for simplejson
, or additional configuration/conversion must be included so that the behavior matches.
Here is the stracktrace when I create a ClassificationModelHTM
using json
.
In [3]: cm = ClassificationModelHTM(json.load(open("projects/nlp/data/network_configs/tp_knn.json")),"projects/nlp/data/sample_reviews/sample_reviews.csv")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-9e5ff979105a> in <module>()
----> 1 cm = ClassificationModelHTM(json.load(open("projects/nlp/data/network_configs/tp_knn.json")),"projects/nlp/data/sample_reviews/sample_reviews.csv")
/Users/sbolliger/nupic.research/htmresearch/frameworks/nlp/classify_htm.pyc in __init__(self, networkConfig, inputFilePath, retinaScaling, retina, apiKey, verbosity, numLabels, modelDir, prepData, stripCats)
75 self.networkDataPath = inputFilePath
76
---> 77 self.network = self.initModel()
78 self.learningRegions = self._getLearningRegions()
79
/Users/sbolliger/nupic.research/htmresearch/frameworks/nlp/classify_htm.pyc in initModel(self)
114
115 # This encoder specifies the LanguageSensor output width.
--> 116 return configureNetwork(recordStream, self.networkConfig, encoder)
117
118
/Users/sbolliger/nupic.research/htmresearch/frameworks/classification/classification_network.py in configureNetwork(dataSource, networkParams, encoder)
245 _setScalarEncoderMinMax(networkParams, dataSource)
246
--> 247 network = createNetwork(dataSource, networkParams, encoder)
248
249 # Need to init the network before it can run.
/Users/sbolliger/nupic.research/htmresearch/frameworks/classification/classification_network.py in createNetwork(dataSource, networkConfig, encoder)
272 sensorRegionConfig,
273 dataSource,
--> 274 encoder)
275
276 # Keep track of the previous region name and width to validate and link the
/Users/sbolliger/nupic.research/htmresearch/frameworks/classification/classification_network.py in _createSensorRegion(network, regionConfig, dataSource, encoder, moduleName)
115 encoders = encoder
116
--> 117 _addRegisteredRegion(network, regionConfig, moduleName)
118
119 # getSelf() returns the actual region, instead of a region wrapper
/Users/sbolliger/nupic.research/htmresearch/frameworks/classification/classification_network.py in _addRegisteredRegion(network, regionConfig, moduleName)
149 # regionName = str(regionName)
150
--> 151 return network.addRegion(regionName, regionType, json.dumps(regionParams))
152
153
/Users/sbolliger/nupic/src/nupic/engine/__init__.pyc in addRegion(self, name, nodeType, nodeParams)
650 @doc:place_holder(Network.addRegion)
651 """
--> 652 engine.Network.addRegion(self, name, nodeType, nodeParams)
653 return self._getRegions()[name]
654
/Users/sbolliger/local/lib/python2.7/site-packages/nupic.bindings-0.2.2-py2.7.egg/nupic/bindings/engine_internal.pyc in addRegion(self, *args, **kwargs)
1084 def addRegion(self, *args, **kwargs):
1085 """addRegion(self, name, nodeType, nodeParams) -> Region"""
-> 1086 return _engine_internal.Network_addRegion(self, *args, **kwargs)
1087
1088 def addRegionFromBundle(self, *args, **kwargs):
TypeError: in method 'Network_addRegion', argument 2 of type 'std::string const &'
Can we just stick with one or the other? (I don't really care which one.) I don't want the complexity of having both.
I'm inclined to choose simplejson. Would this mean adding it to the nupic.research requirements.txt?
I'm inclined to choose simplejson. Would this mean adding it to the nupic.research requirements.txt?
Yes, that's fine. Not a big deal - you can just install it manually too (which is what I usually prefer).
Okay @SaganBolliger will you please change the imports to simplejson, add simplejson to the requirements, and test?
Yes, sure.
It really seems like a bug in bindings. It should be able to work with either type of strings. simplejson vs. json is too much of an implementation-dependent subtlety.
What do you think, @scottpurdy ?
@vitaly-krugl @scottpurdy Relevant information from the SWIG documentation:
At this time, SWIG provides limited support for Unicode and wide-character strings (the C wchar_t type). Some languages provide typemaps for wchar_t, but bear in mind these might not be portable across different operating systems. This is a delicate topic that is poorly understood by many programmers and not implemented in a consistent manner across languages. For those scripting languages that provide Unicode support, Unicode strings are often available in an 8-bit representation such as UTF-8 that can be mapped to the char * type (in which case the SWIG interface will probably work). If the program you are wrapping uses Unicode, there is no guarantee that Unicode characters in the target language will use the same internal representation (e.g., UCS-2 vs. UCS-4). You may need to write some special conversion functions.
If I recall correctly, python is a utf-8 shop, so
such as UTF-8 that can be mapped to the char * type (in which case the SWIG interface will probably work)"
should apply