It depends on the communication bandwidth between your GPUs. If the bandwidth is not high enough, the communication cannot be hidden behind the computation, which slows down inference. Typically, NVLink is essential; fortunately, it is quite accessible nowadays. You can find it here.
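A quick back-of-envelope check can tell you whether communication is likely to be hidden. The sketch below is illustrative only: the payload size, bandwidths (~32 GB/s for PCIe 4.0 x16, ~300 GB/s aggregate NVLink), and the 1 ms per-layer compute time are all assumptions, not measurements for any particular setup.

```python
# Rough estimate: can a tensor-parallel all-reduce be overlapped with
# per-layer compute? All numbers here are illustrative assumptions.

def comm_time_s(bytes_to_move: float, bandwidth_gb_s: float) -> float:
    """Time in seconds to move `bytes_to_move` at `bandwidth_gb_s` GB/s."""
    return bytes_to_move / (bandwidth_gb_s * 1e9)

# Assumed activation payload: batch 8, seq 2048, hidden 4096, fp16 (2 bytes).
payload = 8 * 2048 * 4096 * 2  # ~134 MB

pcie_t = comm_time_s(payload, 32)    # assumed PCIe 4.0 x16 bandwidth
nvlink_t = comm_time_s(payload, 300)  # assumed aggregate NVLink bandwidth

compute_t = 1e-3  # assumed per-layer compute time: 1 ms
print(f"PCIe:   {pcie_t * 1e3:.2f} ms, hidden: {pcie_t < compute_t}")
print(f"NVLink: {nvlink_t * 1e3:.2f} ms, hidden: {nvlink_t < compute_t}")
```

With these assumed numbers, the PCIe transfer (~4 ms) cannot hide behind a 1 ms compute step, while the NVLink transfer (~0.45 ms) can, which is why the interconnect matters so much for multi-GPU inference.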