Program gets killed during compilation
I am using the machine based on malicious-shamir-party and am trying to read a file with approximately 100M numbers. After reading each element, I store it in an array of sint type. Then, for a query, I count the elements equal to the search query using the a == b comparison. This setup works up to 1k elements, but beyond that the program gets killed.
[1]+ Killed Scripts/compile-run.py -E mal-shamir python_test -- -N 3
I am using docker images to run the system. Could you please help?
The underlying code:
def read_numbers_from_file(filename):
    with open(filename, 'r') as file:
        num_rows = 1000
        numbers = Array(num_rows, sint)
        count = 0
        while count < num_rows:
            line = file.readline()
            if line:
                numbers[count] = sint(int(line.strip()))
                count += 1  # Increment the counter
            else:
                break
        return numbers
filename = 'data_1M.csv'
numbers = read_numbers_from_file(filename)
search_query = sint(1)
count = sint(0)
length = len(numbers)

def equality_test():
    for i in range(len(numbers)):
        count.iadd(numbers[i] == search_query)

equality_test()
print_ln('Count for %s is %s', search_query.reveal(), count.reveal())
I even tried the below version of the code:
program.use_edabit(True)

def read_numbers():
    numbers = sint.get_input_from(0, size=1000000)
    b = sint.get_input_from(1)
    eq_bits = [x == b for x in numbers]
    b_in_a = sum(eq_bits)
    print_ln("Count of %s is %s", b.reveal(), b_in_a.reveal())

read_numbers()
Party 0 has 1M numbers in the Input-P0-0 file under the Player-Data folder.
Party 1 has 1 number in the Input-P1-0 file under the Player-Data folder.
This is because of the unrolling of large loops: the compiler emits instructions for every loop iteration, so a million-iteration Python loop produces a huge program. In the second example, you should be able to fix this using
eq_bits = numbers == b
b_in_a = eq_bits.sum()
This switches to vectorized computation, which uses a single instruction over the whole vector.
The first example is more difficult because it involves hard-coding all inputs in the program. I therefore recommend you stick to the second approach.
Thank you for the quick response.
When making the code changes as suggested, I get the error below:
/usr/src/MP-SPDZ/Scripts/run-common.sh: line 90: 2363 Killed $my_prefix $SPDZROOT/$bin $i $params 2>&1
2364 Done | { if test "$BENCH"; then
if test $i = $front_player; then
tee -a $log;
else
cat >> $log;
fi;
else
if test $i = $front_player; then
tee $log;
else
cat > $log;
fi;
fi; }
My code:
import time

program.use_edabit(True)

def read_numbers_from_file():
    numbers = sint.get_input_from(0, size=1000000)
    b = sint.get_input_from(1)
    eq_bits = numbers == b
    b_in_a = eq_bits.sum()
    print_ln("Count of %s is %s", b.reveal(), b_in_a.reveal())

start_time = time.time()
read_numbers_from_file()
end_time = time.time()
What is the output of Scripts/memory-usage.py <program-with-args>? It might be that the memory requirements exceed your RAM.
The time is printed as "Execution time: 0.22 ms". However, the verbose output of the run gives online/offline times. Which one should I consider to understand the time taken to run the code in the MP-SPDZ setting?
You cannot use the timing in the Python code because it doesn't measure the actual execution.
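For timing the computation itself, MP-SPDZ provides timers inside the .mpc program; a minimal sketch wrapping the equality test above (the timer number 1 is arbitrary):
start_timer(1)
eq_bits = numbers == b
b_in_a = eq_bits.sum()
stop_timer(1)
# the run output then contains a line such as
# "Time1 = ... seconds (... MB, ... rounds)"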
Also, does MP-SPDZ consider only trusted-client settings, or is there an untrusted-client setting as well?
I understand that since I am using the malicious, honest-majority setting based on Shamir's secret sharing, with 3 parties at least two have to be honest. But what about the clients who communicate with these parties to do some computation? Is there any setting available for trusted or untrusted clients?
The standard client interface assumes an untrusted client, which means that it cannot influence the computation beyond the input provided and it does not learn more than the designated output.
A follow-up question: is the online time the one to consider for end-to-end latency?
"Spent 0.96727 seconds (84.4956 MB, 22 rounds) on the online phase and 8.87888 seconds (396.995 MB, 2460 rounds) on the preprocessing/offline phase." --> Meaning 0.96727 seconds is the end-to-end time?
If not, how do I get the total execution time per party?
It depends on what you mean by end-to-end. The line you mention splits up the time in several phases whereas "Time = ..." indicates the wall clock time for the computation, excluding some setup costs.
For a problem statement such as "give the count of a value b in the array numbers", the code looks like this:
def read_numbers():
    numbers = sint.get_input_from(0, size=1000000)
    b = sint.get_input_from(1)
    eq_bits = [x == b for x in numbers]
    b_in_a = sum(eq_bits)
    print_ln("Count of %s is %s", b.reveal(), b_in_a.reveal())

read_numbers()
I want to know the following details:
- The time it took to return the "count" result --> basically, the duration between the client sending "b" and the client receiving the "count" result
- The number of rounds of communication between party-party/party-client needed to complete the operation
- The network bandwidth incurred
When I run MP-SPDZ, the verbose output reads "Spent 0.96727 seconds (84.4956 MB, 22 rounds) on the online phase and 8.87888 seconds (396.995 MB, 2460 rounds) on the preprocessing/offline phase." Does this mean that:
- The time it took to return the "count" result --> 0.96727 seconds
- The number of rounds of communication between party-party/party-client --> 22
- The network bandwidth incurred --> 84.4956 MB
If not, please guide me on the right way to compute points 1, 2, and 3.
This depends on whether you want to include the preprocessing phase. The values in your list are for the online phase only, that is, the data-dependent phase, excluding computation that can be done without knowing the data (other than its size).
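Concretely, including preprocessing means adding the two phases: for the figures quoted above, 0.96727 + 8.87888 ≈ 9.85 seconds in total, which should roughly match the "Time = ..." line in the output.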
How can I do a dot product between a matrix of size 5000 x 100K and a vector of size 5000? I am able to go up to a matrix of size 5000 x 100, but beyond that the process gets killed.
- Is there a limit up to which the dot product can work?
- What should my system configuration be to support it? --> Currently I have 64GB RAM and a 1TB SSD on an M1 Max MacBook.
- Is there something specific I need to change in the code to support this kind of operation?
There must be a limit because no machine has unlimited storage. The concrete restrictions depend very much on how you are using the framework, so it's hard to judge without knowing the code and the protocol.
keyword_count = 5000
inverted_columns = 110000
with open("/usr/src/MP-SPDZ/inverted.txt", 'r') as file:
    lines1 = [[int(value) for value in line.strip().split(',')] for line in file]
# writing shares of inverted_index and keyword vector
for i in range(0, inverted_columns):
    temp = sint(lines1[i])
    sint.write_to_file(temp)  # append to file
Q. Even the creation of shares gets killed for such a large size. Is there a way I can load a 2D array of this size and then perform a matrix-vector dot product?
My complete dot product code:
# inverted index
keyword_count = 5000
inverted_columns = 110000
with open("/usr/src/MP-SPDZ/inverted.txt", 'r') as file:
    lines1 = [[int(value) for value in line.strip().split(',')] for line in file]

# writing shares of inverted_index and keyword vector
for i in range(0, inverted_columns):
    temp = sint(lines1[i])
    sint.write_to_file(temp)  # append to file

temp = [0] * keyword_count
position_cleartext = 0
temp[position_cleartext] = 1
file_vector = sint(temp)
sint.write_to_file(file_vector)

# reading shares of inverted_index and keyword vector
inverted_index = []
for i in range(0, inverted_columns):
    _, temp = sint.read_from_file(i * keyword_count, keyword_count)
    inverted_index.append(temp)
# print_ln("Inverted index size %s * %s", len(inverted_index), len(inverted_index[0]))
inverted_index = Matrix.create_from(inverted_index)
print_ln("Inverted index size %s * %s", len(inverted_index), len(inverted_index[0]))

_, temp = sint.read_from_file(inverted_columns * keyword_count, keyword_count)
file_vector = Array.create_from(temp)

# phase 2 operation
for i in range(0, inverted_columns):
    result = file_vector.dot(inverted_index[i])
    # print_ln(" %s", result.reveal())
The above code gets killed.
Logs:
Q. Also, can you please tell who sent what amount of data to whom, as per the stats below:
Party 0:
Spent 0.231426 seconds (20.3201 MB, 8 rounds) on the online phase and 2.19302 seconds (98.0889 MB, 612 rounds) on the preprocessing/offline phase.
Communication details:
Broadcasting 0.0044 MB in 169 rounds, taking 0.0242042 seconds
Partial broadcasting 37.04 MB in 175 rounds, taking 0.068515 seconds
CPU time = 2.40999
The following benchmarks are including preprocessing (offline phase).
Time = 2.42639 seconds
Data sent = 118.409 MB in ~620 rounds (party 0 only)
Global data sent = 355.067 MB (all parties)
Party 1:
Spent 0.281857 seconds (20.3201 MB, 8 rounds) on the online phase and 2.14259 seconds (98.0889 MB, 612 rounds) on the preprocessing/offline phase.
Communication details:
Broadcasting 0.0044 MB in 169 rounds, taking 0.0271209 seconds
Partial broadcasting 37.04 MB in 175 rounds, taking 0.0709618 seconds
CPU time = 2.47621
The following benchmarks are including preprocessing (offline phase).
Time = 2.42652 seconds
Data sent = 118.409 MB in ~620 rounds (party 1 only)
Global data sent = 355.067 MB (all parties)
Party 2:
Spent 0.239738 seconds (20.3201 MB, 8 rounds) on the online phase and 2.18467 seconds (97.9289 MB, 612 rounds) on the preprocessing/offline phase.
Communication details:
Broadcasting 0.0044 MB in 169 rounds, taking 0.0238763 seconds
Partial broadcasting 37.04 MB in 175 rounds, taking 0.0699712 seconds
CPU time = 2.40435
The following benchmarks are including preprocessing (offline phase).
Time = 2.42636 seconds
Data sent = 118.249 MB in ~620 rounds (party 2 only)
Global data sent = 355.067 MB (all parties)
Q. And what does a round refer to --> 1) A to B and then B to A, or 2) A to B, B to C, C to A, or 3) A to B?
Q. Or can you please break down the above stats of any one party: what do broadcast/partial broadcast etc. mean, and their rounds are between which parties?
Q. As per my understanding, as I am using malicious-shamir-party [honest majority with malicious servers using Shamir secret sharing] with 3 servers: if rounds is 1, it means that in that single round party1 sends to party2, party2 sends to party1, party2 sends to party3, party3 sends to party2, party3 sends to party1, and party1 sends to party3. The data sent by each party is given by "Data sent", thus each party sends 118.249 MB to the other two parties, and the time spent by each party for the online + offline phases is given by "Time" as 2.42636 seconds.
Q. Even the creation of shares gets killed for such a large size. Is there a way I can load a 2D array of this size and then perform a matrix-vector dot product?
The overload probably happens in the loading procedure; use sint.input_tensor_via() instead of piece-wise loading: https://mp-spdz.readthedocs.io/en/latest/Compiler.html#Compiler.types.sint.input_tensor_via
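A minimal sketch for the loading code above (same file and sizes; assigning the return value to use it directly as the secret matrix is an assumption based on the linked documentation):
keyword_count = 5000
inverted_columns = 110000
with open("/usr/src/MP-SPDZ/inverted.txt", 'r') as file:
    lines1 = [[int(value) for value in line.strip().split(',')] for line in file]
# one call replaces the per-row write_to_file/read_from_file round trip
inverted_index = sint.input_tensor_via(0, lines1)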
Q. Also, can you please tell who sent what amount of data to whom, as per the stats below: Party 0: Spent 0.231426 seconds (20.3201 MB, 8 rounds) on the online phase and 2.19302 seconds (98.0889 MB, 612 rounds) on the preprocessing/offline phase. Communication details: Broadcasting 0.0044 MB in 169 rounds, taking 0.0242042 seconds Partial broadcasting 37.04 MB in 175 rounds, taking 0.068515 seconds CPU time = 2.40999 The following benchmarks are including preprocessing (offline phase). Time = 2.42639 seconds Data sent = 118.409 MB in ~620 rounds (party 0 only) Global data sent = 355.067 MB (all parties)
There is no accounting for how much data is sent to a particular party.
Q. And what does a round refer to --> 1) A to B and then B to A, or 2) A to B, B to C, C to A, or 3) A to B?
Rounds refer to the number of invocations of the communication operation in question, so one broadcast would account for one round.
Q. Or can you please break down the above stats of any one party: what do broadcast/partial broadcast etc. mean, and their rounds are between which parties?
A (full) broadcast means every party sending something to every other party while a partial broadcast means some parties sending to some other parties in parallel.
Q. As per my understanding, as I am using malicious-shamir-party [honest majority with malicious servers using Shamir secret sharing] with 3 servers: if rounds is 1, it means that in that single round party1 sends to party2, party2 sends to party1, party2 sends to party3, party3 sends to party2, party3 sends to party1, and party1 sends to party3. The data sent by each party is given by "Data sent", thus each party sends 118.249 MB to the other two parties, and the time spent by each party for the online + offline phases is given by "Time" as 2.42636 seconds.
Correct. The 118 MB is the data sent in total to all other parties by the party in question.
I ran the below code snippet:
keyword_count = 5000
inverted_columns = 110000
with open("/usr/src/MP-SPDZ/inverted.txt", 'r') as file:
    lines1 = [[int(value) for value in line.strip().split(',')] for line in file]
sint.input_tensor_via(0, lines1)
Can you help with this?
Also, I ran the code snippet below:
# reading keywords and access and query
keyword_count = 5000
_, keywords = sint.read_from_file(0, keyword_count)
keywords = Array.create_from(keywords)
print_ln("Keyword size %s", len(keywords))
# for i in range(0, 10):
#     print_ln("value %s", keywords[i].reveal())
_, query = sint.read_from_file(2 * keyword_count, 1)
query = query[0]
# print_ln("Query is %s", query.reveal())
# print_ln("Query is %s", query.reveal_to(1))
The logs were:
Time = 4.83777 seconds
Data sent = 236.978 MB in ~1238 rounds (party 0 only)
Global data sent = 711.093 MB (all parties)
Q. The data is merely 5000 * 64 bytes = 320KB, so why is the data sent 236.978 MB?
And thank you very much for being so prompt. Really appreciate it!
Can you help with this?
How much RAM is there, and what is the size of the input file?
Q. The data is merely 5000 * 64 bytes = 320KB, so why is the data sent 236.978 MB?
MPC is based on communication, so the data sent throughout is expected to be much more than the input data.
Can you help with this?
How much RAM is there, and what is the size of the input file?
Answer: RAM is 64GB, and the input file [sint.read_from_file(0, keyword_count)] is 8MB.
Q. The data is merely 5000 * 64 bytes = 320KB, so why is the data sent 236.978 MB?
MPC is based on communication, so the data sent throughout is expected to be much more than the input data.
Answer: Compared to the input file size (8MB), the cost per party is ~30 times. Can you give an idea, based on the operations in the code, of why the communication overhead is ~30 times?
How much RAM is there, and what is the size of the input file?
Answer: RAM is 64GB, and the input file [sint.read_from_file(0, keyword_count)] is 8MB.
It seems that storing data in lists of lists is not very space-efficient. You could try using numpy, which is supported by input_tensor_via.
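For instance, a sketch assuming the file is a comma-separated integer matrix (np.loadtxt is standard numpy; passing the resulting array to input_tensor_via follows the suggestion above):
import numpy as np

# numpy stores the cleartext matrix far more compactly than nested Python lists
data = np.loadtxt("/usr/src/MP-SPDZ/inverted.txt", delimiter=",", dtype=np.int64)
# party 0 inputs the whole tensor in one call
inverted_index = sint.input_tensor_via(0, data)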
Q. The data is merely 5000 * 64 bytes = 320KB, so why is the data sent 236.978 MB?
MPC is based on communication, so the data sent throughout is expected to be much more than the input data.
Answer: Compared to the input file size (8MB), the cost per party is ~30 times. Can you give an idea, based on the operations in the code, of why the communication overhead is ~30 times?
Equality testing is relatively expensive in MPC. There is no fixed conversion rate, but the following paper should give an idea of how it is computed: https://www.researchgate.net/profile/Sebastiaan-Hoogh/publication/225092133_Improved_Primitives_for_Secure_Multiparty_Integer_Computation/links/0c960533585ad99868000000/Improved-Primitives-for-Secure-Multiparty-Integer-Computation.pdf
Just to confirm: in malicious-shamir-party [malicious-shamir-party.x], the servers are malicious, and secrets are shared among these servers using Shamir secret sharing in an honest-majority setting?
Also, is the matrix dot product performed using Beaver triples, whose initialization happens in the preprocessing phase and needs several rounds?
Yes, the security model is one maliciously corrupted party out of three, and all multiplications are performed element-wise using Beaver triples, which are generated in the preprocessing phase.
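For intuition, a cleartext sketch of the Beaver-triple technique referred to above (plain Python over a toy prime field, not MP-SPDZ code): given a preprocessed triple (a, b, c) with c = a*b, multiplying secrets x and y only requires opening the masked values x - a and y - b.
import random

p = 2**61 - 1  # toy prime modulus

# preprocessing (offline) phase: a random triple with c = a * b mod p
a = random.randrange(p)
b = random.randrange(p)
c = a * b % p

# online phase: to multiply x and y, the parties open the masked values
# epsilon = x - a and delta = y - b, which reveal nothing about x and y
x, y = 42, 99
epsilon = (x - a) % p
delta = (y - b) % p

# the product follows from the opened values and the triple:
# c + eps*b + delta*a + eps*delta = x*y (mod p)
product = (c + epsilon * b + delta * a + epsilon * delta) % p
assert product == x * y % p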
Is Shamir secret sharing used for secret sharing in the malicious-shamir-party setup?
Yes
Why is the communication cost for the below code so high?
stop_timer(0)
start_timer(1)
keyword_count = 5000
inverted_columns = 1000
inverted_index = sint.Matrix(keyword_count, inverted_columns)
print_ln("Inverted index size %s * %s", len(inverted_index), len(inverted_index[0]))
temp = [0] * keyword_count
position_cleartext = 0
temp[position_cleartext] = 1
file_vector = Array.create_from(sint(temp))
stop_timer(1)

start_timer(2)
# phase 2 operation
result = file_vector.dot(inverted_index)
print_ln("Inverted index size %s * %s", len(result), len(result[0]))
for i in range(inverted_columns):
    print_ln(" %s", result[0][i].reveal())
stop_timer(2)
Logs:
Spent 4.12594 seconds (320.032 MB, 3 rounds) on the online phase and 13.6928 seconds (600.052 MB, 3002 rounds) on the preprocessing/offline phase.
Communication details:
Broadcasting 0.026032 MB in 1001 rounds, taking 0.141918 seconds
Partial broadcasting 320.016 MB in 1002 rounds, taking 1.14194 seconds
CPU time = 18.0844
The following benchmarks are including preprocessing (offline phase).
Time = 5.3041e-05 seconds
Time1 = 3.4875e-05 seconds (0 MB, 0 rounds)
Time2 = 17.8184 seconds (920.084 MB, 3004 rounds)
Data sent = 920.084 MB in ~3005 rounds (party 0 only)
Global data sent = 2760.25 MB (all parties)
Query: The dataset size is (5000 * 1000 * 64 bits [inverted_index] = 40MB) + (5000 * 64 bits = 40KB [file_vector]) ~ 40MB.
1. Why is the communication overhead 920MB, which is roughly 23 times? [I am mainly concerned about the Time2 stats, which cover the actual computation.]
2. Do I need to do something to improve the code?
3. If not, are these stats reasonable? Some reasoning would be a great help.
- The chosen protocol does not have a particularly efficient matrix multiplication, so the computation involves 5 million multiplications. The communication cost comes down to somewhere around 100 bytes per multiplication, which is not out of the ordinary (see the rough sanity check after this list).
- I don't see any big potential improvements in this protocol if both inputs have to be secret. A minor improvement is to reduce the size of the prime if the range of the data allows it.
- See 1.
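As a rough sanity check against the logs above: 5000 * 1000 = 5,000,000 multiplications, and party 0 sent 920.084 MB, which works out to roughly 190 bytes per multiplication per party, the same order of magnitude as the estimate above.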
- The chosen protocol does not have a particularly efficient matrix multiplication, so the computation involves 5 million multiplications. The communication cost comes down to somewhere around 100 bytes per multiplication, which is not out of the ordinary.
- I don't see any big potential improvements in this protocol if both inputs have to be secret. A minor improvement is to reduce the size of the prime if the range of the data allows it.
- See 1.
Answer: I tried running "Scripts/compile-run.py -v -F 32 -E mal-shamir d+ -- -N 3 -T 1 -P 37", and it gives the error "Fatal error: Field too small for chosen security. Increase size with -lgp or decrease security with -S".