Cuju-ft/Cuju

virtio_blk_get_request_from_head load request fail

PJTPJT opened this issue ยท 6 comments

When failover , If we call virtio_blk_get_request_from_head to load record_list request, it will load wrong request and cause the assert(type & VIRTIO_BLK_T_OUT) check fail.

Now we find some fix to solve this problem:

  1. Change the method of virtqueue_pop get request address (no use undo_map)
  2. Add get req->sector_num function on virtio_blk_get_request_from_head()
  3. Defer virtio_blk_free_request() from virtio_blk_rw_complete() to virtio_blk_complete_head()

And we still find virtio_blk_get_request_from_head sometimes load a wrong request (type !=1)
now maybe the problem is some dirty page produce from qemu and not be tracked (cpu_physical_memory_map)
the defer_write_map mechanism maybe is wrong

Some case caused crash
Master:
backup
Backup:
master
The incorrect address of req->elem.out_sg in Master node caused Backup node crash when failover. Backup node can deal with previous reqs which have correct address, and crash when deal with incorrect reqs.

If save_device has wrong type, defer virtio_blk_free_request() can solve this problem

I find a possible way to eliminate the gfn assert.

If we checked the dirty page tracked in each epoch by kvmft_assert_ram_hash_and_dlist, we would find that the gfn assert occurs.
The scenario is that the VM boot until entering login screen (donโ€™t do anything), and then open FT mode. Ensure the system is already in FT mode, then type login account and password. At this time, the gfn assert will occur.

To solve this issue, the virtio-blk.c I changed below:

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index c019d2d..848eb1a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -218,7 +218,7 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
    }

    stb_p(&req->in->status, status);
-    virtqueue_push(req->vq, &req->elem, req->in_len);
+    virtqueue_push(req->vq, &req->elem, iov_size(req->elem.in_sg, req->elem.in_num));
    if (s->dataplane_started && !s->dataplane_disabled) {
        virtio_blk_data_plane_notify(s->dataplane, req->vq);
    } else {

Since the req->in_len is always 0 in virtio_blk_handle_request when entering FT mode in master node, it will cause the incorrect result from virtqueue_push.

However, clear the gfn assert away, but the sector_num and type problems are still here.

@coldfunction Great!
But I think we need add this function to confirm_req_read_memory_mapped

--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -45,6 +45,7 @@ static void confirm_req_read_memory_mapped(VirtIOBlockReq *req)
         req->in = (void *)in_iov[in_num - 1].iov_base
               + in_iov[in_num - 1].iov_len
               - sizeof(struct virtio_blk_inhdr);
+                req->in_len = iov_size(in_iov, in_num);
     } else {
         printf("%s %p already mapped\n", __func__, req);
         abort();

Because we set req->in_len = 0 on virtio_blk_handle_request function
So we need restore it on confirm_req_read_memory_mapped function

vring_desc_read

static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
                            hwaddr desc_pa, int i)
{       
    address_space_read(&address_space_memory, desc_pa + i * sizeof(VRingDesc), MEMTXATTRS_UNSPECIFIED, (void *)desc, sizeof(VRingDesc));

Master:
vdev = 0x7ffff50e2410, desc = 0x7fffffff7c30, desc_pa = 899092480, i = 52
caller : virtqueue_pop
&desc->addr = 8896512 &desc->len= 912 &desc->flags= 4 &desc->next= 20 
vdev = 0x7ffff50e2410, desc = 0x7fffffff7c30, desc_pa = 8896512, i = 0
caller : virtqueue_pop
&desc->addr = 899692216 &desc->len= 16 &desc->flags= 1 &desc->next= 1

Slave:
virtio_blk_load_device handle head 3/30: 52
vdev = 0x7ffff50e2410, desc = 0x7fffffff78e0, desc_pa = 899092480, i = 52
caller : virtqueue_get
&desc->addr = 8896512 &desc->len= 912 &desc->flags= 4 &desc->next= 20 
vdev = 0x7ffff50e2410, desc = 0x7fffffff78e0, desc_pa = 8896512, i = 0
caller : virtqueue_get
&desc->addr = 18446612132323116032 &desc->len= 16 &desc->flags= 1 &desc->next= 1

Same desc_pa i and desc is buffer, why the output &desc->addr is different?
The only difference is address_space_memory?

sklin commented

Because guest OS free and re-allocate the descriptor table entry of the requests that have been done, we cannot let slave re-execute those already-done requests.
Synchronize requests in record list and descriptor table so that slave will not re-execute requests that have been done.

@@ -151,8 +154,16 @@ static void virtio_blk_complete_head(VirtIOBlockReq *req)
 {
     VirtIOBlock *s = global_virtio_block;
     ReqRecord *rec = req->record;
+    int i;
 
     if (rec != NULL) {
+        for (i=0 ; i<rec->len ; ++i) {
+            if (rec->reqs[i] == req) {
+                rec->completed[i] = true;
+                break;
+            }
+        }
+        assert(i<rec->len);
         if (--rec->left == 0) {
             QTAILQ_REMOVE(&s->record_list, rec, node);
             g_free(rec);
@@ -1110,12 +1122,17 @@ static void virtio_blk_save_device(VirtIODevice *vdev, QEMUFile *f)

     // send temp_list and record_list to slave.
     QTAILQ_FOREACH(rec, &s->record_list, node) {
+        int nsend = 0; // debugging
         qemu_put_sbyte(f, 3);
-        qemu_put_be32(f, rec->len);
+        qemu_put_be32(f, rec->left);
         for (i = 0; i < rec->len; i++) {
+            if (rec->completed[i])
+                continue;
             qemu_put_be32(f, rec->list[i]);
             qemu_put_be32(f, rec->idx[i]);
+            nsend++;
         }
+        assert(nsend==rec->left);
     }
     if (s->temp_list && s->temp_list->len > 0) {
         qemu_put_sbyte(f, 3);