spdx/spdx-3-model

AI: hyperparameter, metric, metricDecisionThreshold - can be multiple. Property talks about only one.

VenkatTechnologist opened this issue · 11 comments

There could be multiple hyperparameters, metrics, and metricDecisionThresholds for an AI model.
The properties 'hyperparameter', 'metric' and 'metricDecisionThreshold' are each described as holding only one value, and their class
is DictionaryEntry, which can hold only one key-value pair. How can we accommodate multiple hyperparameters,
metrics, and metricDecisionThresholds of an AI model in the AI profile of SPDX?

Some real world, open source examples where multiple hyperparameters are used at once:

Machine Learning:

  • XGBoost (eXtreme Gradient Boosting) : n_estimators, learning_rate, max_depth, L1/L2 regularization parameters
  • TensorFlow/Keras Convolutional Neural Network (CNN): No. of convolution layers & filters, Kernel size, Pooling layer settings, Optimizer and learning rate
  • Scikit-learn Support Vector Machine (SVM): Regularization parameter C, Kernel, Gamma (for RBF Kernel)

Generative-AI:

  • StyleGAN2 - TensorFlow/Keras (Generative Adversarial Network): No. of layers & filters, Learning rates for generator and discriminator, Noise input dimension, Regularization hyperparameters
  • BicycleGAN - PyTorch (Conditional GAN): Network architectures for generator & discriminator, Loss functions, Weight normalization
  • OpenAI Gym with TensorFlow/Keras (Reinforcement Learning for Generative Models): Reward function design, Exploration vs exploitation, RL algorithm hyperparameters

A model can likewise be assessed with multiple metrics, each with its own associated threshold, in machine learning as well as generative AI algorithms.

Real world, Open source examples:

Generative AI:

  • StyleGAN2 - PyTorch (Generative Adversarial Networks): Fréchet Inception Distance (FID), Inception Score (IS), Human Evaluation
  • MelNet - TensorFlow (WaveNet-based Text-to-Speech): Mel-Cepstral Distortion (MCD), Log-Mel Spectrogram Similarity
  • OpenAI GPT-2 (Large Language Model): Perplexity, BLEU

Machine Learning:

  • Scikit-learn Multi-Class Classification: Accuracy, Confusion Matrix, Precision and Recall, F1-Score
  • TensorFlow/Keras Object Detection: Mean Average Precision, Intersection over Union
  • XGBoost Regression with Feature Importance: Mean Squared Error, R-squared, Feature Importance

On that note, it looks like we need a class called 'Dictionary' to list multiple hyperparameters, metrics, and metric thresholds.

See #773.

bact commented

In AIPackage, all of them (hyperparameter, metric, metricDecisionThreshold) have minCount = 0 and no maxCount, which means one AIPackage can have multiple values for each of these properties (= multiple entries in an array). Each value is a single DictionaryEntry.
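As a rough illustration of that cardinality, here is a minimal sketch using plain Python dataclasses. The class layout is hypothetical and only mirrors the property names discussed in this issue; it is not an official SPDX library or serialization, and the optional `value` field is an assumption. The hyperparameter and metric names come from the XGBoost examples above, while the concrete values and thresholds are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DictionaryEntry:
    """One key-value pair (hypothetical stand-in for the SPDX class)."""
    key: str
    value: Optional[str] = None  # assumption: value may be omitted


@dataclass
class AIPackage:
    """Hypothetical stand-in: each property is a list, i.e. cardinality 0..*."""
    name: str
    hyperparameter: List[DictionaryEntry] = field(default_factory=list)
    metric: List[DictionaryEntry] = field(default_factory=list)
    metricDecisionThreshold: List[DictionaryEntry] = field(default_factory=list)


# Illustrative values only; none of these numbers come from a real model.
pkg = AIPackage(
    name="example-xgboost-regressor",
    hyperparameter=[
        DictionaryEntry("n_estimators", "500"),
        DictionaryEntry("learning_rate", "0.05"),
        DictionaryEntry("max_depth", "6"),
    ],
    metric=[
        DictionaryEntry("Mean Squared Error", "0.12"),
        DictionaryEntry("R-squared", "0.91"),
    ],
    metricDecisionThreshold=[
        DictionaryEntry("Mean Squared Error", "0.20"),
        DictionaryEntry("R-squared", "0.85"),
    ],
)

# Each property carries several single-pair entries.
hyperparameter_keys = [entry.key for entry in pkg.hyperparameter]
print(hyperparameter_keys)
```

So no separate 'Dictionary' class is needed in this sketch: the list on the package plays that role, and each DictionaryEntry still holds exactly one pair.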

bact commented

I think we can close this one, as it is very clear that the current model can accommodate the expressed concern.

bact commented

It will be through relationships like contains and dependsOn.

See possible relationship types here: https://spdx.github.io/spdx-spec/v3.0/model/Core/Vocabularies/RelationshipType/

bact commented

@goneall I think we can close this one as it is a non-issue: the cardinality of this property in AIPackage is 0..* -- @VenkatTechnologist has agreed on this.

Closing per above suggestion