typedb/typedb-ml

Potential bug or unwanted behaviour of kglib categorical attribute range/embedder function


Description

The issue most probably originates in the categorical attribute embedder of the kglib library. I have defined a categorical variable that can take 3 values: 'duct_type': ["NotDuct","SLD","DC"]. However, in some examples both 'SLD' and 'DC' are present in the graph. For these examples the pipeline raises an error like the one below. It does not show up for any other case or combination of values. After removing this variable from the list of variables and from the query, the pipeline runs fine. It seems that the combination of two categorical labels is being treated as a single value outside the specified range.
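To test this hypothesis before training, one could scan the examples for those in which the categorical attribute takes more than one value. This is a minimal sketch, assuming each example has already been reduced to a plain Python dict mapping attribute names to the list of values found in its subgraph; find_multi_valued and the example data are hypothetical and not part of kglib:

```python
from typing import Dict, List


def find_multi_valued(examples: List[Dict[str, List[str]]], attribute: str) -> List[int]:
    """Return positions of examples in which `attribute` has more than one distinct value."""
    flagged = []
    for i, example in enumerate(examples):
        # Distinct values of the attribute in this example (missing attribute -> none)
        if len(set(example.get(attribute, []))) > 1:
            flagged.append(i)
    return flagged


# Hypothetical examples mirroring the cases described in the issue
examples = [
    {"duct_type": ["SLD"]},
    {"duct_type": ["SLD", "DC"]},  # both labels present: the failing case
    {"duct_type": ["NotDuct"]},
]
print(find_multi_valued(examples, "duct_type"))  # [1]
```

If the flagged examples are exactly the ones that crash, that supports the idea that multiple values per example are the trigger.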

InvalidArgumentError: indices[0,0] = 5 is not in [0, 3)
     [[node KGCN_1/kg_encoder/node_model/sequential/ThingEmbedder/typewise_encoder/duct_type_cat_embedder_1/embed/embedding_lookup (defined at C:\Users\kubap\Anaconda3\envs\grakn-16\lib\site-packages\sonnet\python\modules\embed.py:182) ]]

Errors may have originated from an input operation.
Input Source operations connected to node KGCN_1/kg_encoder/node_model/sequential/ThingEmbedder/typewise_encoder/duct_type_cat_embedder_1/embed/embedding_lookup:
 KGCN_1/kg_encoder/node_model/sequential/ThingEmbedder/typewise_encoder/duct_type_cat_embedder_1/Cast (defined at C:\Users\kubap\Anaconda3\envs\grakn-16\lib\site-packages\kglib\kgcn\models\attribute.py:56)    
 ThingEmbedder/typewise_encoder/duct_type_cat_embedder_1/embed/embeddings/read (defined at C:\Users\kubap\Anaconda3\envs\grakn-16\lib\site-packages\sonnet\python\modules\util.py:963)
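The message is a bounds check: the duct_type embedding table appears to have one row per category (3 rows), so every index fed to the lookup must lie in [0, 3). If each category is encoded as its position in the list, an index of 5 cannot come from any single one of the three values. The pure-Python sketch below only mimics that check to make the failure easier to reason about; embedding_lookup here is a hypothetical stand-in, not the TensorFlow or Sonnet implementation:

```python
from typing import List, Sequence


def embedding_lookup(table: Sequence[Sequence[float]],
                     indices: List[List[int]]) -> List[List[Sequence[float]]]:
    """Look up rows of `table`, rejecting any index outside [0, len(table))."""
    n = len(table)
    result = []
    for i, row in enumerate(indices):
        looked_up = []
        for j, idx in enumerate(row):
            if not 0 <= idx < n:
                # Mirrors the wording of the error message in the log above
                raise IndexError(f"indices[{i},{j}] = {idx} is not in [0, {n})")
            looked_up.append(table[idx])
        result.append(looked_up)
    return result


# One embedding row per duct_type category: NotDuct, SLD, DC
table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

print(embedding_lookup(table, [[1], [2]]))  # rows for SLD and DC, both in range
try:
    embedding_lookup(table, [[5]])
except IndexError as err:
    print(err)  # indices[0,0] = 5 is not in [0, 3)
```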

Environment

OS (where Grakn server runs): Windows 10
Grakn version (and platform): Grakn Core 1.6.2
Grakn client: Python Client 1.6.1
Other environment details: Workbase 1.2.7, grakn-kglib 0.2.1
It's been pointed out to me that you don't officially support Anaconda installations, but the same error is produced in VS Code, and all packages in my conda env were installed through pip.

Reproducible Steps

My kglib project, with runtime instructions, is available at: https://github.com/Qbbz/SSP
Unfortunately, due to limited time I can't produce an exact example now, but I'm available to help you with that in the future.

Expected Output

Either multiple categorical labels are treated separately, each assigned an integer value within the defined range, OR the range also needs to be defined in terms of the possible combinations of labels.
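The two options can be sketched as follows, assuming the three duct_type values listed above; all names here are hypothetical illustrations, not kglib's API. Option 1 encodes each value on its own, so an example with both SLD and DC yields two in-range indices. Option 2 enlarges the range to every non-empty combination of values, which for 3 categories gives 2^3 - 1 = 7 indices.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

DUCT_TYPES = ["NotDuct", "SLD", "DC"]


def encode_separately(values: Iterable[str]) -> List[int]:
    """Option 1: one index per value, each in [0, len(DUCT_TYPES)).

    Raises ValueError for a value that is not one of DUCT_TYPES.
    """
    return [DUCT_TYPES.index(v) for v in values]


def build_combination_index(categories: List[str]) -> Dict[FrozenSet[str], int]:
    """Option 2: assign a distinct index to every non-empty combination of categories."""
    index: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(categories) + 1):
        for combo in combinations(categories, size):
            index[frozenset(combo)] = len(index)
    return index


COMBINATION_INDEX = build_combination_index(DUCT_TYPES)


def encode_combination(values: Iterable[str]) -> int:
    """Option 2: a single index in [0, len(COMBINATION_INDEX)); value order is ignored."""
    return COMBINATION_INDEX[frozenset(values)]


print(encode_separately(["SLD", "DC"]))   # [1, 2]
print(len(COMBINATION_INDEX))             # 7
print(encode_combination(["SLD", "DC"]))  # 5
```

Option 2 grows as 2^n - 1 with the number of categories n, so option 1 scales better when a variable has many possible values.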

Actual Output

Training doesn't start because of the error above: learn.py crashes at create_feed_dicts.

@jmsfltchr I think this should live in the kglib repo, right?

Yes, that's right! @QBBZ could you copy this issue over to graknlabs/kglib please?

Actually, we can do it with the "transfer issue" feature these days :D On it.

Pardon, I didn't realise I was posting to the wrong repo. I'm happy it's solved now!