mikaku/Fiwix

lseek past end of file does not zero fill hole

Closed this issue · 15 comments

If you lseek past the end of a file and write then the hole that was skipped should be filled with zeros but is not.

Here is documentation of that expected behavior:
https://www.gnu.org/software/libc/manual/html_node/File-Position-Primitive.html

You can set the file position past the current end of the file. 
This does not by itself make the file longer; lseek never changes the file. 
But subsequent output at that position will extend the file. 
Characters between the previous end of file and the new position are filled with zeros.

This test program illustrates the problem:

#include <unistd.h>
#include <fcntl.h>

int main() {
        char buf[] = "testdata";
        int myfile = open("mytest", O_WRONLY | O_CREAT);
        lseek (myfile, 16, SEEK_SET);
        write(myfile, buf, 8);
        close(myfile);
}
$ cc lseekhole.c -o lseekhole
$ ./lseekhole
$ od -tx1 -Ax mytest
000000 09 2e 66 69 6c 65 09 22 6c 73 65 65 6b 68 6f 6c
000000 74 65 73 74 64 61 74 61

Expected output (which linux/gcc produces):

000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000 74 65 73 74 64 61 74 61

Here is my somewhat ugly workaround:

diff --git a/kernel/syscalls/write.c b/kernel/syscalls/write.c
index 969f952..7468b3d 100644
--- a/kernel/syscalls/write.c
+++ b/kernel/syscalls/write.c
@@ -38,6 +38,18 @@ int sys_write(unsigned int ufd, const char *buf, int count)
                return -EINVAL;
        }
        i = fd_table[current->fd[ufd]].inode;
+       if (fd_table[current->fd[ufd]].offset > i->i_size) {
+               int extension = fd_table[current->fd[ufd]].offset - i->i_size;
+               fd_table[current->fd[ufd]].offset = i->i_size;
+               i->fsop->lseek(i, i->i_size);
+               int errno;
+               while (extension--) {
+                       errno = i->fsop->write(i, &fd_table[current->fd[ufd]], "", 1);
+                       if (errno != 1)
+                               return errno;
+               }
+       }
+
        if(i->fsop && i->fsop->write) {
                errno = i->fsop->write(i, &fd_table[current->fd[ufd]], buf, count);
 #ifdef __DEBUG__

lseekhole binary attached.
lseekhole.gz

I should note that further testing of my workaround revealed a regression in another program (building tcc produced an invalid executable) so it looks like I will not be able to use it.

Not reproducible here, even using the supplied binary.

[(root) ~]# cat lseekhole.c 
#include <unistd.h>
#include <fcntl.h>

int main() {
        char buf[] = "testdata";
        int myfile = open("mytest", O_WRONLY | O_CREAT);
        lseek (myfile, 16, SEEK_SET);
        write(myfile, buf, 8);
        close(myfile);
}

[(root) ~]# ./lseekhole 
[(root) ~]# od -tx1 -Ax mytest             
000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000010 74 65 73 74 64 61 74 61
000018

Can you, please, paste here the complete QEMU command you are using?
Perhaps I'll able to reproduce this problem if I use the same setup as you.

qemu-system-i386 \
    -monitor stdio \
    -debugcon file:debugcon.log \
    -vnc 10.0.1.171:1,password \
    -drive file=images/FiwixOS-3.1-i386.raw,format=raw,cache=writeback,index=0 \
    -kernel fiwix-git-latest/fiwix \
    -append "root=/dev/hda2" \
    -m size=256

It is qemu 4.2.1.

Perhaps something in libc works differently in FiwixOS 3.2. I can see if that makes a difference (although that wouldn't help me).

It is qemu 4.2.1.

I don't see any special parameter that could led to this change in lseek() behavior.
I'm, using QEMU emulator version 6.2.0, with the following setup:

qemu-system-i386 \
        -drive file=$IMGS_DIR/${FLOPPY_IMG},format=raw,if=floppy,index=0 \
        -drive file=$IMGS_DIR/fiwix-ext2-1GB.img,format=raw,if=ide,cache=writeback,index=0 \
        -boot a \
        -net none \
        -m 256 \
        -enable-kvm \
        -machine pc \
        -cpu 486 \
        -chardev pty,id=pciserial \
        -device pci-serial,chardev=pciserial \
        -serial pty \
        -debugcon stdio

Perhaps something in libc works differently in FiwixOS 3.2. I can see if that makes a difference (although that wouldn't help me).

FiwixOS 3.1 came with Newlib 3.2 and FiwixOS 3.2 uses Newlib 4.2.
Sincerely, I don't know all the changes they made between these two releases.

I've tested with qemu 4.2.1 and 7.1.50. I've tested with qemu i386 and x86_64. I've tested with FiwixOS 3.1 and 3.2. I've built Fiwix with a new gcc and an old one. I've used kvm and no kvm. I've used graphics and serial. In nearly every case, the problem reproduces. One time (!) I got it to fill with zeros but I have been unable to get that to happen again with the same setup.

This is really frustrating.
OK, I'll continue trying with different setups until I can reproduce it.
Thanks for the effort you put on it.

Following your recommendation on IRC (#bootstrappable), I was able to reproduce the problem after applying the following modifications in the test program:

#include <unistd.h>
#include <fcntl.h>

int main() {
	int n;
	char buf[] = "testdata";
	char filename[20];

	for(n = 0; n < 1000; n++) {
		sprintf(filename, "mytest%d", n);
        	int myfile = open(filename, O_WRONLY | O_CREAT);
        	lseek (myfile, 16, SEEK_SET);
        	write(myfile, buf, 8);
        	close(myfile);
	}
}

I'll work on it.
Thanks.

Good news,

I've found what is happening and where is the problem. This is a serious bug because it affects the synchronization between the buffer cache (block contents) and the page cache (file contents).

Basically, all happens here:

Fiwix/fs/ext2/file.c

Lines 98 to 115 in f2d318d

while(total_written < count) {
boffset = fd_table->offset % blksize;
if((block = bmap(i, fd_table->offset, FOR_WRITING)) < 0) {
inode_unlock(i);
return block;
}
bytes = blksize - boffset;
bytes = MIN(bytes, (count - total_written));
if(!(buf = bread(i->dev, block, blksize))) {
inode_unlock(i);
return -EIO;
}
memcpy_b(buf->data + boffset, buffer + total_written, bytes);
update_page_cache(i, fd_table->offset, buffer + total_written, bytes);
bwrite(buf);
total_written += bytes;
fd_table->offset += bytes;
}

What is happening

When you want to put something into a file, the kernel obtains the block where it will write it:

if((block = bmap(i, fd_table->offset, FOR_WRITING)) < 0) {

If this block is new for the file (which is true in our case since the file was empty), the function ext2_bmap() correctly sets to zero the buffer that contains such block:

memset_b(buf->data, 0, blksize);

This means that, when buffer (that contains the string testdata) is copied into buf->data at offset 16, all bytes before this offset are all zeros:

memcpy_b(buf->data + boffset, buffer + total_written, bytes);

The problem

Once the contents of buffer are good, the kernel needs to synchronize this change in the page cache:

update_page_cache(i, fd_table->offset, buffer + total_written, bytes);

The problem here is that the kernel finds a page that matches with the same inode and offset, belonging to another file what was used recently. This explains to me why I was unable to reproduce it right before booting the system. The only way to reproduce this issue was recompiling the program, since compilation creates some temporary files that are removed latter. Then, when the test program creates the file myfile the kernel assigns it a reused inode.

I hope to provide a fix soon.

@rick-masters, please check this patch to see if it fixes this issue.

Yes, this fixes the issue. Thank you very much!

Perfect!, I'll fix the same bug in the Minix filesystem (v1 and v2).

Thank you.