khenidak/dysk

It hangs when two VMs write to the same VHD

Closed this issue · 10 comments

I tried letting two VMs mount the same VHD, and that works well. However, when I write data to the same VHD from both VMs, the file system on the VHD hangs. Is that expected? @khenidak

docker run --rm \
	-it \
	--privileged \
	-v /usr/src:/usr/src \
	-v /lib/modules:/lib/modules \
	khenidak/dysk-installer:0.5
	
docker run --rm \
	-it --privileged \
	-v /etc/ssl/certs:/etc/ssl/certs:ro \
	khenidak/dysk-cli:0.4 list
	
docker run --rm \
	-it --privileged \
	-v /etc/ssl/certs:/etc/ssl/certs:ro \
	khenidak/dysk-cli:0.4 mount -a xiazhang3 -k "..." --pageblob-name dysk5fRZHHX1.vhd --container-name dysks --auto-lease --break-lease

On VM#1:
sudo mkfs.ext4 /dev/dyskVr0WnfaF
sudo mkdir /mnt/dysk
sudo mount /dev/dyskVr0WnfaF /mnt/dysk

On VM#2:
sudo mkdir /mnt/dysk
sudo mount /dev/dyskrc6DBb6G /mnt/dysk

After writing on VM#2, write and list operations on /mnt/dysk hang:

azureuser@k8s-master-39280284-0:/mnt/dysk$ ls -lt
total 28
-rw-rw-r-- 1 azureuser azureuser     5 Mar 12 03:06 20180312
-rw-rw-r-- 1 azureuser azureuser     2 Mar 12 03:05 b
-rw-r--r-- 1 root      root          4 Mar 12 03:05 andy
-rw-r--r-- 1 root      root          0 Mar 12 03:05 a
drwx------ 2 root      root      16384 Mar 12 03:01 lost+found
azureuser@k8s-master-39280284-0:/mnt/dysk$ sudo echo x > x
(the command hangs here)

Yes. You can mount the same dysk on two different VMs only as read-only, not read-write.

Shall we document this behavior somewhere in dysk? And one writer with multiple readers is allowed, right?

Yes, one writer plus multiple readers. I will make sure the modes are documented somewhere. Thanks.
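To make the one-writer, many-readers rule easier to check on a given VM, here is a minimal sketch that reports whether a mount point is currently mounted read-only or read-write at the file system level. The helper name `mount_mode` is hypothetical and not part of dysk; it only reads a /proc/mounts-style file and says nothing about how the dysk itself was attached or who holds the lease.

```shell
#!/bin/sh
# mount_mode MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper for illustration; not part of dysk.
# Prints "ro" or "rw" for MOUNTPOINT based on the options field (4th column)
# of a /proc/mounts-style file. Returns 1 if MOUNTPOINT is not listed.
mount_mode() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        # Remember the options of the last matching entry (the topmost mount).
        $2 == mp { opts = $4; found = 1 }
        END {
            if (!found) exit 1
            n = split(opts, o, ",")
            # Exact match only, so "errors=remount-ro" is not mistaken for "ro".
            for (i = 1; i <= n; i++) if (o[i] == "ro") { print "ro"; exit 0 }
            print "rw"
        }
    ' "$mounts_file"
}
```

On the reader VMs in the repro above, one would presumably mount with `sudo mount -o ro /dev/dyskrc6DBb6G /mnt/dysk` and then confirm with `mount_mode /mnt/dysk`. Whether the dysk CLI needs its own read-only option for the attach is not shown in this thread, so treat that part as an open assumption.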

Is it possible to return an error instead of hanging? I hit this issue while debugging the dysk flexvolume. I know it was caused by my multiple writers, but this behavior is really not friendly.

In the end I could not log on to one agent VM, which is a critical bug.

The correct behavior is to return an error, like the following. That is the ideal way, in my opinion:

root@nginx-flex-dysk2:/# ls -lt /data
ls: reading directory '/data': Input/output error
total 0

A simple repro: mount the same VHD on one VM with read-write permission, then write to it from both containers. My VM hung, and I could no longer SSH into it.

The correct behavior is that the old rw attach loses the lease and gracefully detaches the disk. Any in-flight I/O will fail with -EIO. This has been tested before. However, I think your issue is related to what I discussed in #33. I will retest the scenario.
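If the failure does surface as an error (for example EIO) rather than a hang, a caller can detect it with a simple write probe. The following is a minimal sketch under that assumption; `probe_write` is a hypothetical helper, not part of dysk, and it does not distinguish EIO from other write errors.

```shell
#!/bin/sh
# probe_write DIR
# Hypothetical helper for illustration; not part of dysk.
# Tries to create and then remove a small file in DIR.
# Prints "ok" and returns 0 on success; prints "write failed" and returns 1
# if the write or the cleanup reports any error.
probe_write() {
    probe_file="$1/.probe.$$"
    # The subshell keeps the shell's own redirection error message quiet.
    if ( echo probe > "$probe_file" ) 2>/dev/null && rm -f "$probe_file"; then
        echo "ok"
    else
        echo "write failed"
        return 1
    fi
}
```

For example, `probe_write /mnt/dysk` could run before a workload starts writing. Note that this only helps when the kernel returns an error; if the I/O blocks as in the repro above, the probe will block as well.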

Let us track both in #33