fluencelabs/nox

Integration tests with p2p connectivity often hang

Closed · 1 comment

We have integration tests in Aqua that route through 3-4 nodes, and they often hang. Once a test hangs, it keeps hanging for a long time afterwards. My workaround, which is a bad one, is to switch the environment used by the tests. The problem reproduces on different networks and on different nodes. I could not find anything in the logs that explains it.
Aqua code:

func viaArr(node_id: string, viaAr: []string) -> Info:
    on node_id via viaAr:
        p <- Peer.identify()
    <- p

The function was always called with these arguments: relays[4].peerId as node_id and [relays[2].peerId, relays[1].peerId] as viaAr.
For krasnodar:

/dns4/kras-03.fluence.dev/tcp/19001/wss/p2p/12D3KooWJd3HaMJ1rpLY1kQvcjRPEvnDwcXrH8mJvk7ypcZXqXGE
/dns4/kras-01.fluence.dev/tcp/19001/wss/p2p/12D3KooWKnEqMfYo9zvfHmqTLpLdiHXPe4SVqUWcWHDJdFGrSmcA
/dns4/kras-00.fluence.dev/tcp/19001/wss/p2p/12D3KooWR4cv1a8tv7pps4HH6wePNaK6gf1Hww5wcCMzeWxyNw51

Example of a test run:
https://github.com/fluencelabs/aqua/runs/5642846674?check_suite_focus=true

It seems that you are experiencing issues with integration tests in the Aqua library hanging on certain nodes. You mention that this issue occurs on different networks and nodes, and that changing the environment in the tests can sometimes help to avoid the issue. You also provide the peer IDs of some of the nodes that are being used in the integration tests.

To troubleshoot this issue, you may want to start by reviewing the logs for the nodes involved in the integration tests to see if there are any clues about what is causing the tests to hang. It may also be helpful to review the code for the Aqua library and the integration tests to see if there are any issues or bugs that could be causing the tests to hang.

It is possible that the issue could be caused by problems with the network or with the nodes themselves. In this case, you may need to work with your network administrator or the operators of the nodes to identify and resolve the issue.

Without more information about the specific steps you have taken to troubleshoot the issue and the environment in which you are running the tests, it is difficult to provide more specific guidance on how to resolve the problem. However, I hope that the general suggestions I have provided will be helpful as you work to identify and fix the issue.
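
One way to get a clearer signal from a hanging test is to put a hard deadline around the call, so the test fails with a timeout instead of blocking the whole run. The following is a minimal sketch using only the Rust standard library. `run_with_timeout` is a hypothetical helper, not part of Aqua or nox, and the closures are stand-ins for the real network call.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `f` on a helper thread and waits at most `timeout` for its result.
/// Returns `None` if the deadline passes first or if `f` panics. The helper
/// thread is not cancelled; it is simply left detached.
fn run_with_timeout<T, F>(timeout: Duration, f: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if we timed out; ignore that error.
        let _ = tx.send(f());
    });
    rx.recv_timeout(timeout).ok()
}

fn main() {
    // A call that answers quickly completes within the deadline.
    let fast = run_with_timeout(Duration::from_secs(1), || "identified");
    assert_eq!(fast, Some("identified"));

    // Stand-in for a hanging p2p call: it sleeps far longer than the deadline.
    let slow = run_with_timeout(Duration::from_millis(50), || {
        thread::sleep(Duration::from_secs(5));
        "never seen"
    });
    assert_eq!(slow, None);
}
```

A timed-out call reported as `None` can then be logged together with the node list used for that run, which narrows down which hop stopped responding.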

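use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical minimal definitions so the example compiles on its own;
// the real Service and ServiceManager types are assumed, not shown here.
struct Service {
    // Other fields for the service go here...
    paused: bool,
}

struct ServiceManager {
    services: HashMap<String, Arc<Mutex<Service>>>,
}

impl ServiceManager {
    fn new() -> Self {
        Self { services: HashMap::new() }
    }

    fn add_service(&mut self, id: String, service: Service) {
        self.services.insert(id, Arc::new(Mutex::new(service)));
    }

    // Panics if the id is unknown; acceptable for a short example.
    fn pause_service(&self, id: &str) {
        self.services[id].lock().unwrap().paused = true;
    }

    fn unpause_service(&self, id: &str) {
        self.services[id].lock().unwrap().paused = false;
    }
}

fn main() {
    let mut service_manager = ServiceManager::new();

    let service_id = "service1".to_string();
    service_manager.add_service(service_id.clone(), Service { paused: false });

    // Pause the service
    service_manager.pause_service(&service_id);

    // Check the state of the service. The lock guard is a temporary, so it is
    // released before the next call locks the same mutex.
    assert!(service_manager.services[&service_id].lock().unwrap().paused);

    // Unpause the service
    service_manager.unpause_service(&service_id);

    // Check the state of the service again
    assert!(!service_manager.services[&service_id].lock().unwrap().paused);
}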

This code creates a new instance of the ServiceManager struct and adds a new service to it. It then pauses the service and verifies that the service's paused flag is set to true. Finally, it unpauses the service and verifies that the flag is set to false.
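
When checking state behind a `Mutex` like this, it helps to keep each lock guard short-lived. If a guard is still alive when another call on the same thread tries to lock the same mutex, that call cannot proceed, which from the outside looks like a hang. The sketch below shows the effect with `try_lock`; the `Service` struct and the `can_lock` helper are stand-ins introduced only for this illustration.

```rust
use std::sync::{Arc, Mutex};

// Stand-in for the service type; only the `paused` flag matters here.
struct Service {
    paused: bool,
}

/// Returns true if the mutex can be locked right now.
fn can_lock(service: &Arc<Mutex<Service>>) -> bool {
    service.try_lock().is_ok()
}

fn main() {
    let service = Arc::new(Mutex::new(Service { paused: false }));

    {
        // While this guard is alive, nobody else can take the lock.
        let guard = service.lock().unwrap();
        assert!(!guard.paused);
        assert!(!can_lock(&service));
    } // The guard is dropped here.

    // Once the guard is out of scope, the lock is available again.
    assert!(can_lock(&service));
    service.lock().unwrap().paused = true;
    assert!(service.lock().unwrap().paused);
}
```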

Keep in mind that this is just a simple example, and you will likely need to adapt it to your project. For example, you may want to add methods to the ServiceManager struct for other operations, such as getting the interface or the state of a service. You may also need to add fields to the Service struct to store the data or state your service needs.
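
As an illustration of such an extra method, here is a small sketch of a state getter. The type definitions and the `is_paused` name are hypothetical and only mirror the names used in the example above; they are not taken from nox.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical minimal types, mirroring the names used in the example above.
struct Service {
    paused: bool,
}

struct ServiceManager {
    services: HashMap<String, Arc<Mutex<Service>>>,
}

impl ServiceManager {
    fn new() -> Self {
        Self { services: HashMap::new() }
    }

    fn add_service(&mut self, id: String, service: Service) {
        self.services.insert(id, Arc::new(Mutex::new(service)));
    }

    /// Returns `Some(paused)` for a known service, or `None` if the id is unknown.
    fn is_paused(&self, id: &str) -> Option<bool> {
        let service = self.services.get(id)?;
        // The guard is a temporary, so the lock is released before returning.
        let paused = service.lock().unwrap().paused;
        Some(paused)
    }
}

fn main() {
    let mut manager = ServiceManager::new();
    manager.add_service("service1".to_string(), Service { paused: true });

    assert_eq!(manager.is_paused("service1"), Some(true));
    assert_eq!(manager.is_paused("missing"), None);
}
```

Returning `Option<bool>` instead of unwrapping lets the caller decide how to handle an unknown service id.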