[Bug]: Sender error with tens of millions of data entries: Get data timeout, key=root:P2P-1:1->0
Closed this issue · 5 comments
Describe the bug
Sender setup stage:
./main --config config/apsi_sender_setup_bucket.json
Sender online terminal:
./main --config config/apsi_sender_online_bucket.json
Sender log:
[2024-11-08 01:42:06.058] [info] [main.cc:44] SecretFlow PSI Library v0.5.0.dev241016 Copyright 2023 Ant Group Co., Ltd.
I1108 01:42:06.075622 2058268 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1204] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=5300.
W1108 01:42:06.075647 2058268 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1210] Builtin services are disabled according to ServerOptions.has_builtin_services
[2024-11-08 01:42:06.075] [info] [entry.cc:455] Setting thread count to 152
INFO 01:42:06:077.216: ::apsi::PSIParams have false-positive probability 2^(-53.0384) per receiver item
[2024-11-08 01:42:06.077] [info] [group_db.cc:234] DB file /home/admin/dev/demo/data/apsi_sender_bucket//0_group.db already exists, load_meta /home/admin/dev/demo/data/apsi_sender_bucket//0_group.db.meta directly
DEBUG 01:42:06:077.517: Start loading SenderDB
DEBUG 01:42:06:078.270: Loaded SenderDB properties: item_count: 4977; label_byte_count: 1; nonce_byte_count: 16; compressed: false; stripped: true
DEBUG 01:42:06:085.592: Loaded BinBundle at bundle index 0 (511304 bytes)
DEBUG 01:42:06:086.217: Loaded BinBundle at bundle index 0 (956480 bytes)
DEBUG 01:42:06:086.277: Loaded SenderDB with 4977 items (1468248 bytes)
INFO 01:42:06:086.291: Start generating bin bundle caches
INFO 01:42:06:086.298: Finished generating bin bundle caches
DEBUG 01:42:06:086.308: Finished loading SenderDB
INFO 01:42:06:108.252: Loaded SenderDB (1468248 bytes)
INFO 01:42:06:108.354: Loaded OPRF key (32 bytes)
I1108 01:42:06.209764 2058271 4295006720 external/com_github_brpc_brpc/src/brpc/socket.cpp:2566] Checking Socket{id=0 addr=127.0.0.1:5400} (0x55a1daa1bc80)
Receiver terminal:
./main --config config/apsi_receiver_bucket.json
Receiver log:
[2024-11-08 01:42:17.487] [info] [main.cc:44] SecretFlow PSI Library v0.5.0.dev241016 Copyright 2023 Ant Group Co., Ltd.
I1108 01:42:17.504922 2058577 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1204] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=5400.
W1108 01:42:17.504944 2058577 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1210] Builtin services are disabled according to ServerOptions.has_builtin_services
INFO 01:42:20:129.698: ::apsi::PSIParams have false-positive probability 2^(-53.0384) per receiver item
[2024-11-08 01:42:20.129] [info] [entry.cc:162] Setting thread count to 152
DEBUG 01:42:20:129.873: PSI parameters set to: item_params.felts_per_item: 5; table_params.table_size: 409; table_params.max_items_per_bin: 42; table_params.hash_func_count: 1; query_params.ps_low_degree: 0; query_params.query_powers: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42}; seal_params.poly_modulus_degree: 2048; seal_params.coeff_modulus: [48]; seal_params.plain_modulus: 65537
DEBUG 01:42:20:129.899: Derived parameters: item_bit_count_per_felt: 16; item_bit_count: 80; bins_per_bundle: 2045; bundle_idx_count: 1
DEBUG 01:42:20:131.354: Configured PowersDag with depth 0
[2024-11-08 01:42:20.132] [info] [csv_reader.cc:73] read file /home/admin/dev/demo/data/meituan_data_2500w.csv with header key, column_names: key
I1108 01:42:50.389800 2058649 4295006720 external/com_github_brpc_brpc/src/brpc/socket.cpp:2566] Checking Socket{id=0 addr=127.0.0.1:5300} (0x55dab4b99c40)
[2024-11-08 01:42:52.040] [info] [csv_reader.cc:162] Read csv file /home/admin/dev/demo/data/meituan_data_2500w.csv, row cnt is 25000000
[2024-11-08 01:43:04.444] [info] [entry.cc:205] Start deal with bucket 5068
[2024-11-08 01:43:04.444] [info] [entry.cc:210] Sending OPRF request for 2430 items
INFO 01:43:04:551.902: Created OPRFReceiver for 2430 items
INFO 01:43:04:551.963: Created OPRF request for 2430 items
DEBUG 01:43:04:551.968: Sending operation of type sop_oprf
[2024-11-08 01:43:04.582] [info] [channel.cc:362] send request failed and retry, retry_count=1, max_retry=3, interval_ms=1000, message=[external/yacl/yacl/link/transport/interconnection_link.cc:56] cntl ErrorCode '112', http status code '0', response header '', response body '', error msg '[E112]Not connected to 127.0.0.1:5300 yet, server_id=0'
[2024-11-08 01:43:05.583] [info] [channel.cc:362] send request failed and retry, retry_count=2, max_retry=3, interval_ms=3000, message=[external/yacl/yacl/link/transport/interconnection_link.cc:56] cntl ErrorCode '112', http status code '0', response header '', response body '', error msg '[E112]Not connected to 127.0.0.1:5300 yet, server_id=0'
[2024-11-08 01:43:08.583] [info] [channel.cc:362] send request failed and retry, retry_count=3, max_retry=3, interval_ms=5000, message=[external/yacl/yacl/link/transport/interconnection_link.cc:56] cntl ErrorCode '112', http status code '0', response header '', response body '', error msg '[E112]Not connected to 127.0.0.1:5300 yet, server_id=0'
[2024-11-08 01:43:13.583] [error] [channel.cc:104] SendImpl error [external/yacl/yacl/link/transport/interconnection_link.cc:56] cntl ErrorCode '112', http status code '0', response header '', response body '', error msg '[E112]Not connected to 127.0.0.1:5300 yet, server_id=0'
Result
Once the receiver starts, the sender fails with the following log:
I1108 01:42:18.211352 2058313 4295006724 external/com_github_brpc_brpc/src/brpc/socket.cpp:2626] Revived Socket{id=0 addr=127.0.0.1:5400} (0x55a1daa1bc80) (Connectable)
terminate called after throwing an instance of 'yacl::IoError'
what(): [external/yacl/yacl/link/transport/channel.cc:430] Get data timeout, key=root:P2P-1:1->0
Aborted (core dumped)
Steps To Reproduce
config/apsi_sender_setup_bucket.json
{
"apsi_sender_config": {
"threads": 1,
"log_level": "info",
"source_file": "/home/admin/dev/demo/data/bank_data_5000w.csv",
"params_file": "/home/admin/dev/demo/data/100K-1-16.json",
"save_db_only": true,
"experimental_enable_bucketize": true,
"experimental_bucket_cnt": 10000,
"experimental_bucket_folder": "/home/admin/dev/demo/data/apsi_sender_bucket/",
"experimental_db_generating_process_num": 16,
"experimental_bucket_group_cnt": 512
}
}
config/apsi_sender_online_bucket.json
{
"apsi_sender_config": {
"source_file": "/home/admin/dev/demo/data/bank_data_5000w.csv",
"params_file": "/home/admin/dev/demo/data/100K-1-16.json",
"experimental_enable_bucketize": true,
"experimental_bucket_cnt": 10000,
"experimental_bucket_folder": "/home/admin/dev/demo/data/apsi_sender_bucket/",
"experimental_db_generating_process_num": 16,
"experimental_bucket_group_cnt": 512
},
"link_config": {
"parties": [
{
"id": "sender",
"host": "127.0.0.1:5300"
},
{
"id": "receiver",
"host": "127.0.0.1:5400"
}
]
},
"self_link_party": "sender"
}
config/apsi_receiver_bucket.json
{
"apsi_receiver_config": {
"query_file": "/home/admin/dev/demo/data/meituan_data_2500w.csv",
"output_file": "/home/admin/dev/demo/data/batch_result.csv",
"params_file": "/home/admin/dev/demo/data/100K-1-16.json",
"experimental_enable_bucketize": true,
"experimental_bucket_cnt": 10000
},
"link_config": {
"parties": [
{
"id": "sender",
"host": "127.0.0.1:5300"
},
{
"id": "receiver",
"host": "127.0.0.1:5400"
}
]
},
"self_link_party": "receiver"
}
Expected behavior
The sender holds 50 million rows consisting of keys and values, where each key is the hash of a phone number and starts with a letter from A to K. The receiver holds 25 million rows containing only keys. The expected result is that the receiver obtains the intersection of its 25 million keys along with the corresponding values.
Version
v0.4.2b0
Operating system
Ubuntu 20.04
Hardware Resources
48C96G
The current engineering implementation of APSI is not yet mature and only performs adequately for a small number of queries in algorithm testing. We will define PIR-related interfaces and related optimizations in the future.
You can try adding the parameter recv_timeout_ms to the link_config and increasing its value. Reference: psi/docs/reference/launch_config.md, line 85 in c2f460e.
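For illustration, the receiver's config with the timeout raised might look like the sketch below. The field name recv_timeout_ms comes from the suggestion above; the value 1800000 (30 minutes) and the exact placement inside link_config are assumptions, so check launch_config.md for the authoritative schema:

```json
{
  "link_config": {
    "parties": [
      { "id": "sender", "host": "127.0.0.1:5300" },
      { "id": "receiver", "host": "127.0.0.1:5400" }
    ],
    "recv_timeout_ms": 1800000
  },
  "self_link_party": "receiver"
}
```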
Received, thank you for the information. For the testing/demo phase with large data volumes, besides the recv_timeout_ms parameter suggested by @tongke6, what other configuration is needed? Looking forward to your response, thanks.
@huocun-ant
Using the parameters from https://github.com/secretflow/psi/blob/main/examples/pir/apsi/parameters/256M-4096.json for validation, it is currently possible to run with tens of millions of data entries, although it is relatively slow. The log will show the progress.
There is another issue: the sender's original data is only 3.2 GB, but during the sender setup phase the generated data directory grows very large, reaching 153 GB (the apsi_sender_bucket directory).
{
"table_params": {
"hash_func_count": 3,
"table_size": 6144,
"max_items_per_bin": 4000
},
"item_params": {
"felts_per_item": 4
},
"query_params": {
"ps_low_degree": 310,
"query_powers": [ 1, 4, 10, 11, 28, 33, 78, 118, 143, 311, 1555]
},
"seal_params": {
"plain_modulus_bits": 26,
"poly_modulus_degree": 8192,
"coeff_modulus_bits": [ 50, 50, 50, 38, 30 ]
}
}
- Add the compress option:
- config/apsi_sender_setup_bucket.json
{
"apsi_sender_config": {
"threads": 1,
"log_level": "info",
"compress": true,
"source_file": "/home/admin/dev/demo/data/bank_data_5000w.csv",
"params_file": "/home/admin/dev/demo/data/100K-1-16.json",
"save_db_only": true,
"experimental_enable_bucketize": true,
"experimental_bucket_cnt": 10000,
"experimental_bucket_folder": "/home/admin/dev/demo/data/apsi_sender_bucket/",
"experimental_db_generating_process_num": 16,
"experimental_bucket_group_cnt": 512
}
}
- config/apsi_sender_online_bucket.json
{
"apsi_sender_config": {
"source_file": "/home/admin/dev/demo/data/bank_data_5000w.csv",
"params_file": "/home/admin/dev/demo/data/100K-1-16.json",
"experimental_enable_bucketize": true,
"compress": true,
"experimental_bucket_cnt": 10000,
"experimental_bucket_folder": "/home/admin/dev/demo/data/apsi_sender_bucket/",
"experimental_db_generating_process_num": 16,
"experimental_bucket_group_cnt": 512
},
"link_config": {
"parties": [
{
"id": "sender",
"host": "127.0.0.1:5300"
},
{
"id": "receiver",
"host": "127.0.0.1:5400"
}
]
},
"self_link_party": "sender"
}
- You can reduce the bucket count, but this may increase query time. "params_file": "/home/admin/dev/demo/data/100K-1-16.json" means your bucket size is 100K and your query size is 1 row, so experimental_bucket_cnt should be 5000w / 100K = 500; you can set experimental_bucket_cnt to 500. There is an issue, though: your actual query size is large, but 100K-1-16.json is optimized for 1-row queries, so these parameters still leave room for optimization.
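The bucket-count arithmetic above (50 million sender rows at ~100K rows per bucket) can be sketched as a quick sanity check. The helper function is hypothetical, written only to illustrate the calculation from this thread:

```python
def suggested_bucket_cnt(sender_rows: int, bucket_size: int) -> int:
    """Hypothetical helper: bucket count so each bucket holds about
    `bucket_size` rows, per the sizing advice in this thread."""
    # Ceiling division: round up so no bucket exceeds the target size.
    return -(-sender_rows // bucket_size)

# 5000w rows with the 100K-1-16.json parameter set -> 500 buckets.
print(suggested_bucket_cnt(50_000_000, 100_000))  # 500
```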