ApolloAuto/apollo-platform

share memory performance test

nmhx opened this issue · 13 comments

nmhx commented

Hi, I ran some tests of shared memory performance.
I wrote a publisher demo and a subscriber demo that send data and record the run time, so that I could compare shared memory against sockets. The run times are almost the same.
I switch between the two modes by editing $ROS_ETC_DIR/transport_mode.yaml.

Frame size   Frames   Shared memory run time   Socket run time
30 MB        1024     83 s                     84 s
20 MB        1024     53.5 s                   53 s
15 MB        1024     39.4 s                   44.4 s
10 MB        1024     26.4 s                   26.4 s
5 MB         1024     13 s                     13 s

Could you share the source code of your test? We will double-check the numbers.

nmhx commented

I launch roscore, then launch the subscriber demo and the publisher demo.
message type

message DemoTime {
    optional uint32 id = 2;
    optional bytes data = 1;
}

publish demo

//main.cc
#include "modules/demo/demo.h"
APOLLO_MAIN(::apollo::demo::Demo);
// demo.cc
#include <chrono>
#include <thread>
#include <ctime>
#include <cstring>   // memset
#include <iostream>  // std::cout
#include <sys/time.h>
#include "modules/demo/demo.h"
namespace apollo {
namespace demo {

using apollo::common::adapter::AdapterManager;
using apollo::demo::DemoTime;
using apollo::common::Status;

#define BUFF_SIZE (1024 * 1024 * 10)
#define FRAMES_NUM 1024

void Demo::TestPublish() {
//    ros::Rate rate(10);
    uint32_t id = 0;
    char *buf = new char[BUFF_SIZE];
    memset(buf, 96, BUFF_SIZE);
    struct timeval start, end;
    gettimeofday(&start, NULL);
    DemoTime demo_time;
    while (ros::ok() && id < FRAMES_NUM) {
        demo_time.set_id(id);
        demo_time.set_data(std::string(buf, BUFF_SIZE));  // buf is not null-terminated, so pass the length
        AdapterManager::PublishDemoTime(demo_time);
        ros::spinOnce();
    //    rate.sleep();
        id++;
    }
    delete [] buf;
    gettimeofday(&end, NULL);
    float time  = 1000*(end.tv_sec-start.tv_sec)+(end.tv_usec-start.tv_usec)/1000;
    std::cout << "publish end: time = " << time << std::endl;
}

std::string Demo::Name() const {return "demo"; }
Status Demo::Init() {
    std::cout << FLAGS_adapter_config_path << std::endl;
    AdapterManager::Init(FLAGS_adapter_config_path);
    return Status::OK();
}
Status Demo::Start() {
    std::thread test(&Demo::TestPublish, this);
    test.detach();
    return Status::OK();
}
void Demo::Stop() {
    timer_.stop();
}

}
}
// demo.h
#ifndef __DEMO_H_
#define __DEMO_H_
#include "modules/common/apollo_app.h"
#include "modules/common/macro.h"
#include "modules/demo/proto/demotime.pb.h"
#include "gflags/gflags.h"
#include "modules/common/adapters/adapter_gflags.h"
#include "modules/common/adapters/adapter_manager.h"
#include "ros/include/ros/ros.h"
#include "modules/common/monitor/monitor.h"
namespace apollo {
namespace demo {
class Demo :public apollo::common::ApolloApp {
public:
    Demo():monitor_(apollo::common::monitor::MonitorMessageItem::CONTROL){}
    std::string Name() const override;
    apollo::common::Status Init() override;
    apollo::common::Status Start() override;
    void Stop() override;
    virtual ~Demo() = default;
private:
    void PublishDemoTime();
    void TestPublish();
    void OnTimer(const ros::TimerEvent &event);
    ros::Timer timer_;
    apollo::common::monitor::Monitor monitor_;
};
}
}
#endif

subscriber demo

#include <iostream>
#include <thread>
#include <chrono>
#include <ctime>
#include <signal.h>
#include <sys/time.h>  // gettimeofday
#include "modules/demo/proto/demotime.pb.h"
#include "gflags/gflags.h"
#include "modules/common/adapters/adapter_gflags.h"
#include "modules/common/adapters/adapter_manager.h"
#include "ros/include/ros/ros.h"

DEFINE_string(node_name, "DemoTime", "The demo module name in proto");
#define MAX_ID (1024 -1)

const std::string name = "testdemo";
using apollo::common::adapter::AdapterManager;
using apollo::demo::DemoTime;

struct timeval start, end;
void TestReceive(const apollo::demo::DemoTime &message) {
    static uint32_t id = 0;
    if ((message.id() -id) > 1 && id != 0) { 
        std::cout << "..............subscriber  lose " << id << "..........." << std::endl;  
    }
    id = message.id();
    if (MAX_ID == id) {
        gettimeofday(&end, NULL);
        float time  = 1000*(end.tv_sec-start.tv_sec)+(end.tv_usec-start.tv_usec)/1000;
        std::cout << name << "end: time = " << time << std::endl;
    }
    //std::cout << message.id() << std::endl;
}

int main(int argc, char **argv) {
    google::InitGoogleLogging(argv[0]);
    google::ParseCommandLineFlags(&argc, &argv, true);
    ros::init(argc, argv, name);

    AdapterManager::Init(FLAGS_adapter_config_path);
    gettimeofday(&start, NULL);
    AdapterManager::SetDemoTimeCallback(&TestReceive);
    ros::spin();
    return 0;
}

Shared-memory communication is meant to improve the efficiency of message transmission, that is, the time a message takes to travel from the publisher sending it to the subscriber receiving it.

In the test case you provided, the publisher measures the time it takes to send 1024 frames, and the subscriber measures the time it takes to consume 1024 frames. Neither number is the per-message transport time.
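
The difference between the two kinds of measurement can be sketched as follows. This is a minimal illustration and not code from apollo-platform: the `Sample` struct and both helper functions are hypothetical names, and the timestamps are assumed to be in milliseconds on a common clock.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record of one message: when it was sent and when it was
// received, both in milliseconds on the same clock.
struct Sample {
  int64_t send_ms;
  int64_t recv_ms;
};

// Total run time: from the first send to the last receive.
// This is roughly what a "time to process N frames" measurement reports.
int64_t TotalRunTimeMs(const std::vector<Sample>& samples) {
  if (samples.empty()) return 0;
  return samples.back().recv_ms - samples.front().send_ms;
}

// Mean transport time: average of (receive - send) over all messages.
double MeanTransportTimeMs(const std::vector<Sample>& samples) {
  if (samples.empty()) return 0.0;
  int64_t sum = 0;
  for (const Sample& s : samples) sum += s.recv_ms - s.send_ms;
  return static_cast<double>(sum) / static_cast<double>(samples.size());
}
```

For three messages sent at 0, 10 and 20 ms and each received 5 ms later, the total run time is 25 ms while the mean transport time is 5 ms, so the two numbers can differ widely for the same run.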

To compare shared-memory and socket-based communication, we recommend using the test case that apollo-platform provides, which minimizes the influence of other factors on the results.

For your reference, the following is the source code for the performance test, based on the official apollo-platform communication examples. Please contact us if you have any questions.

apollo-platform official communication examples location:

apollo-platform/ros/ros_tutorials/roscpp_tutorials

msg type (new file)

// apollo-platform/ros/ros_tutorials/roscpp_tutorials/msg/perf.msg
uint32 id
string data
uint64 time

talker (modified)

// apollo-platform/ros/ros_tutorials/roscpp_tutorials/talker/talker.cpp
#include "ros/ros.h"
#include <sys/time.h>
#include <cstring>   // memset
#include <iostream>  // std::cout
#include "std_msgs/String.h"
#include "roscpp_tutorials/perf.h"
#include <sstream>

#define BUFF_SIZE (1024 * 1024 * 10)
#define FRAMES_NUM 1024

int main(int argc, char **argv)
{

  ros::init(argc, argv, "talker");
  ros::NodeHandle n;
  ros::Publisher chatter_pub = n.advertise<roscpp_tutorials::perf>("chatter", 1000);
  // ros::Rate loop_rate(10);

  roscpp_tutorials::perf iperf;
  char *buf = new char[BUFF_SIZE];
  memset(buf, 96, BUFF_SIZE);
  struct timeval start;

  int count = 0;
  while (ros::ok() && count < FRAMES_NUM)
  {

    iperf.id = count;
    iperf.data.assign(buf, BUFF_SIZE);  // buf is not null-terminated, so pass the length

    gettimeofday(&start, NULL);
    iperf.time = start.tv_sec * 1000 + start.tv_usec / 1000;

    chatter_pub.publish(iperf);
    // ros::spinOnce();
    // loop_rate.sleep();
    ++count;
  }

  delete [] buf;
  std::cout << "publish end!" << std::endl;

  return 0;
}
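
When a payload is filled from a raw buffer, passing the length explicitly avoids relying on a terminating '\0', which a memset-filled buffer does not have. A minimal sketch, with `kBufSize`, `MakePayload` and `PayloadFromBuffer` as hypothetical names that do not come from apollo-platform:

```cpp
#include <cstddef>
#include <string>

// Payload size as a typed constant; unlike an unparenthesized macro, it is
// safe inside larger expressions such as `x / kBufSize`.
constexpr std::size_t kBufSize = 1024 * 1024 * 10;

// Builds a payload of exactly `size` bytes, each set to `fill`.
std::string MakePayload(std::size_t size, char fill) {
  return std::string(size, fill);
}

// Copies exactly `size` bytes from a raw buffer that need not be
// null-terminated.
std::string PayloadFromBuffer(const char* buf, std::size_t size) {
  return std::string(buf, size);
}
```

Either helper yields a string whose size is known in advance, so every frame in a run carries the same number of bytes.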

listener (modified)

// apollo-platform/ros/ros_tutorials/roscpp_tutorials/listener/listener.cpp
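#include "ros/ros.h"
#include <sys/time.h>  // gettimeofday
#include <stdint.h>
#include <iostream>    // std::cout
#include "std_msgs/String.h"
#include "roscpp_tutorials/perf.h"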

#define MAX_ID (1024 -1)

struct timeval end;
int64_t msg_count = 0;
uint64_t avg_time = 0;

void chatterCallback(const roscpp_tutorials::perf message)
{
  ++msg_count;
  gettimeofday(&end, NULL);
  if (avg_time == 0) {
    avg_time = (end.tv_sec * 1000 + end.tv_usec / 1000) - message.time;
  } else {
    avg_time = (((end.tv_sec * 1000 + end.tv_usec / 1000) - message.time) + avg_time * (msg_count - 1)) / msg_count;
  }

  static uint32_t id = 0;
  if ((message.id - id) > 1 && id != 0) {
    std::cout << "..............subscriber  lose " << id << "..........." << std::endl;
  }
  id = message.id;

  if (id > 1000) {
    std::cout << " transport avg time: " << avg_time << std::endl;
  }
}

int main(int argc, char **argv)
{
  ros::init(argc, argv, "listener");
  ros::NodeHandle n;
  ros::Subscriber sub = n.subscribe("chatter", 1000, chatterCallback);
  ros::spin();
  return 0;
}

After recompiling apollo-platform (bash build.sh build), launch roscore, then launch the subscriber demo and the publisher demo. Thank you.

nmhx commented

I used the examples that apollo-platform provides to test performance. The run time with shared memory is longer than with sockets, but the socket mode tends to lose a few frames.

nmhx commented

Could you tell me about the performance difference between shared memory and sockets?

Using the example above, my test results are as follows:

Frame size   Frames   Shared memory transport time   Socket transport time
30 MB        1024     17 ms                          1418 ms
20 MB        1024     11 ms                          262 ms
15 MB        1024     8 ms                           99 ms
10 MB        1024     6 ms                           17 ms
5 MB         1024     3 ms                           6 ms

By the way, my test environment is:
1. 4.2.0-27-generic #32~14.04.1-Ubuntu
2. 16 Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
3. MemTotal: 32GB

nmhx commented

My test environment is:

  1. Ubuntu 14.04
  2. Intel® Core™ i5-6200U CPU @ 2.30GHz × 4
  3. MemTotal: 8GB

I ran the demo in Docker. Have you tested on a PC with lower specifications?

nmhx commented

I built and ran the demo outside of Docker and got the following transport times:

Frame size   Shared memory   Socket
5 MB         23 ms           26 ms
10 MB        47 ms           90 ms
15 MB        71 ms           1461 ms
20 MB        95 ms           4570 ms

But when I run the test in Docker, I get the transport times below. I switch modes by editing $ROS_ETC_DIR/transport_mode.yaml.

Frame size   Shared memory   Socket
5 MB         21 ms           23 ms
10 MB        45 ms           46 ms
15 MB        69 ms           69 ms
20 MB        92 ms           92 ms

Could you package the complete test program and share it with us? My email address is "bjtulynn@163.com". If convenient, please also describe the detailed test procedure. Thank you!

Closing for now. If you still have questions, you can reopen this issue at any time.

@bjtulynn Can you show me the socket code? I wonder what you compared the shared-memory demo with. Thanks.