fcomitani/simpsom

Cannot locate raw_data or any detailed API.


I've installed the latest SimpSOM using pip and tried following both the code presented on GitHub and the code presented in the API docs (readthedocs). Unfortunately, raw_data isn't defined after import SimpSOM, and I can't find any API documentation (e.g., descriptions, return values, or arguments and their datatypes) for each method/function. If raw_data is only a placeholder, what is it supposed to be: a list, dictionary, array, etc.? I would really like to learn more about this package. Is there documentation elsewhere?

Hello @acnash,

my apologies for the lack of documentation.

You are correct, raw_data is a placeholder. The input should be a numpy array with samples as rows and features as columns; I've updated the readme to clarify that.
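As an illustration of the expected layout, here is a minimal sketch that builds a toy raw_data array with numpy. The values are made up; only the shape convention (one row per sample, one column per feature) comes from the answer above. The resulting array is what would go wherever raw_data appears in the readme example.

```python
import numpy as np

# Hypothetical toy dataset: 4 samples (rows), each with 3 features (columns).
raw_data = np.array([
    [0.1, 0.5, 0.9],
    [0.2, 0.4, 0.8],
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.4],
], dtype=float)

# A 2D array: first axis = samples, second axis = features.
n_samples, n_features = raw_data.shape
print(n_samples, n_features)  # prints: 4 3
```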

There was supposed to be an online API, but it seems readthedocs was not displaying it correctly (while still marking the build as passed). It should be fixed now, please find it at the following link.

The API and docs are still a bit lacking, I hope to find the time to tidy them up soon. In the meantime please let me know if anything else is unclear.

@fcomitani Thank you for looking into this.

I've had a look at the link you provided. Unfortunately, it still only shows the section headers without any body text, e.g.:

API
Module contents
SimpSOM.densityPeak module
SimpSOM.hexagons module
SimpSOM.qualityThreshold module

I'm using the module as we speak and I'm hoping for some exciting results. In brief, I've trained my test network (more data to come), and now I want to take each input vector from the training set, run it over the trained network, and mark where its best matching unit is. Once this has been done for every input vector, I want to pick 10 candidates from the input set, as they appear on the network, and identify any input vectors within a given distance of those 10 candidates (i.e., clustered with them). My reasoning is that I know the purpose of the 10 candidate inputs, so I can use them to partition the trained network.
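The candidate-based partitioning described above can be sketched in plain numpy, assuming the best matching unit (BMU) position of every input is already available as an (n_samples, 2) array of map coordinates. The function name neighbours_of_candidates, the toy positions and the radius are hypothetical. The sketch also treats map coordinates as a flat Euclidean plane, so it ignores any hexagonal layout or periodic boundary conditions the map may use.

```python
import numpy as np


def neighbours_of_candidates(bmu_pos, candidate_idx, radius):
    """Hypothetical helper: for each candidate input index, list the indices of
    the other inputs whose BMU lies within `radius` of the candidate's BMU
    (plain Euclidean distance in map coordinates)."""
    bmu_pos = np.asarray(bmu_pos, dtype=float)
    groups = {}
    for c in candidate_idx:
        # Distance from the candidate's BMU to every input's BMU on the map.
        d = np.linalg.norm(bmu_pos - bmu_pos[c], axis=1)
        members = np.flatnonzero(d <= radius)
        # Exclude the candidate itself from its own neighbour list.
        groups[int(c)] = [int(i) for i in members if i != c]
    return groups


# Hypothetical BMU positions (row, col) for 6 inputs on a small map.
bmu_pos = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9], [1, 0]])
print(neighbours_of_candidates(bmu_pos, candidate_idx=[0, 2], radius=1.5))
# prints: {0: [1, 5], 2: [3]}
```

An input can appear in more than one group if it sits close to several candidates; assigning it to the nearest candidate only would be a straightforward variation.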

As I can't see any documentation at the moment (I appreciate you may have a very busy schedule), does your module have any methods/functions with this functionality in mind? Thanks!

Hi @acnash, that's puzzling.

Again, my apologies for this; I'll work on it. In the meantime, it seems the master version of the docs is online, even if the formatting is messed up at the moment.

Regarding what you are planning to do: yes, that can easily be done with
net.project(vectors)

This function projects an input set of vectors (vectors, a numpy array) onto a trained map (net here is the name of your map object) and outputs the positions of their best matching units. You can then use those positions for a clustering analysis.
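To make the idea of projection concrete, here is a minimal numpy sketch of what finding best matching units means: each vector is assigned to the node whose weight vector is closest in Euclidean distance. This is not SimpSOM's implementation. The function name project_to_bmus and the (height, width, n_features) weight layout are assumptions for illustration, and how to extract trained weights from the map object is not covered here.

```python
import numpy as np


def project_to_bmus(weights, vectors):
    """Hypothetical helper: return the (row, col) position of the best matching
    unit for each vector.

    weights: array of shape (height, width, n_features), one weight vector per node.
    vectors: array of shape (n_samples, n_features).
    """
    h, w, f = weights.shape
    flat = weights.reshape(h * w, f)
    # Squared Euclidean distance between every vector and every node: (n_samples, h*w).
    d2 = ((vectors[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    best = d2.argmin(axis=1)
    # Convert flat node indices back to (row, col) grid coordinates.
    return np.column_stack(np.unravel_index(best, (h, w)))


# Toy 2x2 map with 2 features per node.
weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [1.0, 1.0]]])
vectors = np.array([[0.9, 0.1], [0.1, 0.8]])
print(project_to_bmus(weights, vectors))  # BMUs at (1, 0) and (0, 1)
```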

For completeness, the library already implements a number of (basic) clustering tools, which can be run directly on the data to be projected with net.cluster(vectors, type='qthresh').
The type flag takes one of the following strings: 'qthresh' (quality threshold), 'dpeak' (density peak), or 'MeanShift', 'DBSCAN' and 'KMeans' (from scikit-learn).
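For intuition about the 'qthresh' option, here is a sketch of a simplified, radius-based variant of quality threshold clustering applied to projected map positions. It is not SimpSOM's qthresh implementation, whose exact definition and parameters are not described in this thread. The function name qt_cluster and its cutoff and min_size parameters are hypothetical. Classic QT clustering grows each candidate cluster under a diameter limit; this variant instead takes all points within cutoff of a centre point.

```python
import numpy as np


def qt_cluster(points, cutoff, min_size=1):
    """Hypothetical, simplified quality-threshold clustering.

    Repeatedly picks the point with the most neighbours within `cutoff`,
    turns that neighbourhood into a cluster, removes it, and repeats until
    no remaining neighbourhood reaches `min_size`. Unassigned points get -1.
    """
    points = np.asarray(points, dtype=float)
    labels = np.full(len(points), -1)
    remaining = np.arange(len(points))
    label = 0
    while remaining.size > 0:
        sub = points[remaining]
        # Pairwise distances among the points not yet assigned.
        d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
        within = d <= cutoff
        sizes = within.sum(axis=1)
        best = sizes.argmax()
        if sizes[best] < min_size:
            break
        # The largest neighbourhood becomes the next cluster.
        labels[remaining[within[best]]] = label
        label += 1
        remaining = remaining[~within[best]]
    return labels


# Hypothetical projected positions (e.g. BMU coordinates) for 6 inputs.
pos = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [9, 9]])
print(qt_cluster(pos, cutoff=1.5))              # [0 0 0 1 1 2]
print(qt_cluster(pos, cutoff=1.5, min_size=2))  # [ 0  0  0  1  1 -1]
```

The loop always terminates because every point lies within cutoff of itself, so each iteration either removes at least one point or stops.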

I'm closing this for now. Let me know if you want it to be reopened.