Jeddak is a platform for privacy computing and federated learning, aimed at both academia and industry.
This is a competition-oriented lite version of Jeddak. Three guides, covering deployment, development, and use, are provided below.
Deploy Guide
Jeddak provides two deployment modes: standalone and cluster.
Standalone mode is for fast experimental verification
of new algorithms on a single host, and cluster mode supports production use in
real multi-host applications. Note that the competition is conducted in cluster mode.
Refer to doc/guide/quickstart.md for the deployment guide.
Develop Guide
Jeddak provides standardized interfaces for developing your own
federated learning and privacy-preserving algorithms.
Refer to doc/guide/develop_guide.md for more details.
Use Guide
Algorithm List
Jeddak provides a series of privacy-preserving algorithms,
described in the following table. This lite version includes only a limited number of them, mainly for demonstration purposes. Their configurations can be found
at example/conf/.

| Algorithm Name | Classification | Description |
| --- | --- | --- |
| data_loader | Preprocessing | Read data from various data sources |
| data_saver | Postprocessing | Save data to disk in various data structures |
| aligner | Preprocessing | Seek the intersection of the private sets held by multiple parties in a privacy-preserving fashion |
| glm | Federated Learning | A set of generalized linear models, including linear regression, logistic regression and Poisson regression |
| dpgbdt | Federated Learning | Differentially Private Gradient Boosting Decision Tree |
| neural_network | Federated Learning | Deep Neural Network |
| evaluate | Postprocessing | Evaluate a federated learning model |
| model_loader | Postprocessing | Load model from local file / unload model from memory |
| predict_offline | Postprocessing | Offline prediction through specified model |

Parameter List
data_loader parameters

| Parameter | Type | Range | Default | Description |
| --- | --- | --- | --- | --- |
| task_type | str | "data_loader" | "data_loader" | Task type |
| task_role | str | {"guest", "host", "sole", "slack"} | "guest" | Task role. "guest" and "host" are a party's role in a task. "sole" means only this party carries out the task. "slack" means the party does nothing in the task |
| input_data_source | str | {"csv", "hdfs"} | "csv" | Type of input data source. "csv" means local files and "hdfs" means a file path on Hadoop HDFS |
| input_data_path | str | any string | N/A | File path of the input data, which must be valid and readable |
| train_data_path | str | any string | N/A | File path of the training data, which must be valid and readable; otherwise the data is read from input_data_path |
| validate_data_path | str | any string | N/A | File path of the validation data, which must be valid and readable |
| convert_sparse_to_index | bool | {true, false} | true | Convert sparse features to natural numbers if true |

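
The data_loader parameters above can be assembled and range-checked before a task is submitted. The following is a minimal sketch only: the helper `build_data_loader_conf` is hypothetical and not part of Jeddak, the JSON output format is an assumption (the actual configuration files live under example/conf/), and the file path is a placeholder.

```python
import json

# Defaults and allowed values are copied from the data_loader parameter table.
# Everything else here (helper name, JSON output) is an illustrative assumption.
DEFAULTS = {
    "task_type": "data_loader",
    "task_role": "guest",
    "input_data_source": "csv",
    "convert_sparse_to_index": True,
}
STR_CHOICES = {
    "task_type": {"data_loader"},
    "task_role": {"guest", "host", "sole", "slack"},
    "input_data_source": {"csv", "hdfs"},
}
PATH_KEYS = ("input_data_path", "train_data_path", "validate_data_path")


def build_data_loader_conf(**overrides):
    """Merge overrides into the documented defaults and check each range."""
    unknown = set(overrides) - set(DEFAULTS) - set(PATH_KEYS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    conf = {**DEFAULTS, **overrides}
    for key, allowed in STR_CHOICES.items():
        if conf[key] not in allowed:
            raise ValueError(f"{key}={conf[key]!r} is not one of {sorted(allowed)}")
    if not isinstance(conf["convert_sparse_to_index"], bool):
        raise TypeError("convert_sparse_to_index must be a bool")
    for key in PATH_KEYS:
        if key in conf and not isinstance(conf[key], str):
            raise TypeError(f"{key} must be a string path")
    return conf


# Placeholder path, for illustration only.
conf = build_data_loader_conf(task_role="host", input_data_path="data/host.csv")
print(json.dumps(conf, indent=2))
```

Checking the ranges up front surfaces a mistyped role or data source before any party starts the task.
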
data_saver parameters

| Parameter | Type | Range | Default | Description |
| --- | --- | --- | --- | --- |
| task_type | str | "data_saver" | "data_saver" | task type |
| task_role | str | {"guest", "host", "sole", "slack"} | "guest" | task role |
| output_data_source | str | {"csv"} | "csv" | type of output data source |

aligner parameters

| Parameter | Type | Range | Default | Description |
| --- | --- | --- | --- | --- |
| task_type | str | "aligner" | "aligner" | Task type |
| task_role | str | {"guest", "host"} | "guest" | Task role |
| align_mode | str | {"diffie_hellman", "cm20", "dh_PSI", "tee"} | "cm20" | PSI type |
| output_id_only | bool | {true, false} | true | Output only the ID of each element in the intersection set |
| sync_intersection | bool | {true, false} | true | Synchronize the intersection set among all parties |
| key_size | int | {1024, 2048, 3072, 4096} | 1024 | Cryptographic key length (in bits) |
| batch_num | int | {"auto"}, [1, inf) | "auto" | Batch number for PSI in "cm20" mode; an integer value is rounded up to a power of 2 |

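
The rounding rule stated for batch_num can be illustrated as follows. This is a sketch under assumptions, not Jeddak's implementation: the function names are hypothetical, and because the source does not say how "auto" selects a value, the sketch passes "auto" through unchanged.

```python
def round_up_to_power_of_two(n: int) -> int:
    """Return the smallest power of 2 that is greater than or equal to n."""
    if n < 1:
        raise ValueError("batch_num must be at least 1")
    # (n - 1).bit_length() is the exponent of the next power of 2 at or above n.
    return 1 << (n - 1).bit_length()


def resolve_batch_num(value):
    """Hypothetical helper: normalize a batch_num setting as the table describes."""
    if value == "auto":
        return "auto"  # selection logic for "auto" is not documented here
    return round_up_to_power_of_two(int(value))


for requested in (1, 3, 8, 100):
    print(requested, "->", resolve_batch_num(requested))
```
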
automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)), auto-disabled for continuous labels
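
| Parameter | Type | Range | Default | Description |
| --- | --- | --- | --- | --- |
| train_validate_freq | int | [1, inf) | None | Validate on the validation data every train_validate_freq epochs if train_validate_freq is not None |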
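
The class-frequency weighting formula n_samples / (n_classes * np.bincount(y)) and the train_validate_freq schedule described above can be sketched in plain Python. Both function names are hypothetical. The weighting uses a `Counter` in place of `np.bincount`, which gives the same values when every class appears at least once. The schedule assumes 1-based epoch numbers and validation whenever the epoch number is a multiple of train_validate_freq; the source does not confirm that exact convention.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def should_validate(epoch, train_validate_freq):
    """Assumed schedule: validate on epochs that are multiples of the frequency."""
    if train_validate_freq is None:
        return False
    return epoch % train_validate_freq == 0


weights = balanced_class_weights([0, 0, 0, 1])
print(weights)  # the rare class 1 receives the larger weight

validated_epochs = [e for e in range(1, 7) if should_validate(e, 2)]
print(validated_epochs)
```

The sketch does not model the automatic disabling for continuous labels, which would need a separate check on the label type.
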