sahlberg/libnfs

nfs4: when the network disconnects, reinitializing NFS and then calling readdir leads to a crash

xiaoyuezhufeng opened this issue · 5 comments

I found this problem while doing a network stability test.
To recover the link quickly, I umount and destroy the NFS handle, then reinitialize it as a new link. The crash then occurs.
The messages look like this:

nfs_stat64 failed: nfs_service failed. errno 11
nfs_stat64 failed: nfs_service failed. errno 11
ASAN:DEADLYSIGNAL

==7429==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fddbe32eaef bp 0x7ffc18fe6190 sp 0x7ffc18fe6150 T0)
==7429==The signal is caused by a READ memory access.
==7429==Hint: address points to the zero page.
#0 0x7fddbe32eaee in opendir_cb /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/libnfs-sync.c:1263
#1 0x7fddbe371b72 in nfs4_parse_readdir /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/nfs_v4.c:3704
#2 0x7fddbe370367 in nfs4_opendir_2_cb /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/nfs_v4.c:3571
#3 0x7fddbe37e558 in rpc_process_reply /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/pdu.c:368
#4 0x7fddbe3807f0 in rpc_process_pdu /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/pdu.c:643
#5 0x7fddbe381f25 in rpc_read_from_socket /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/socket.c:396
#6 0x7fddbe383302 in rpc_service /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/socket.c:554
#7 0x7fddbe321347 in nfs_service /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/libnfs.c:274
#8 0x7fddbe32ae0a in wait_for_nfs_reply /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/libnfs-sync.c:282
#9 0x7fddbe32ec90 in nfs_opendir /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/libnfs-sync.c:1286
#10 0x4016a2 in main /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/examples/nfs-fh.c:112
#11 0x7fddbd05ef29 in __libc_start_main ../csu/libc-start.c:308
#12 0x4011f9 in _start (/workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/examples/.libs/nfs-fh+0x4011f9)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /workspace/nggap/trunk/blfs81/tmp_x86_64/libnfs-libnfs-5.0.2/lib/libnfs-sync.c:1263 in opendir_cb
==7429==ABORTING

While the program is running, disconnect the NFS server's IP for 20 s, then reconnect it, and repeat this again and again. There is a chance of a crash.

This test uses nfs-fh.c with the following changes:

int reconnect = 0;
struct nfs_stat_64 nfs_st;
struct nfsdir *fd = NULL;
struct nfsdirent *nfs_entry = NULL;
while (1) {
	nfs = nfs_init_context();
	if (nfs == NULL) {
		fprintf(stderr, "failed to init context\n");
		goto finished;
	}
	(void)nfs_set_version(nfs, 4);
	(void)nfs_set_autoreconnect(nfs, 1);
	(void)nfs_set_dircache(nfs, 0);

	url = nfs_parse_url_full(nfs, argv[1]);
	if (url == NULL) {
		fprintf(stderr, "%s\n", nfs_get_error(nfs));
		ret = 1;
		goto finished;
	}

	if (nfs_mount(nfs, url->server, url->path) != 0) {
		fprintf(stderr, "Failed to mount nfs share : %s\n",
				nfs_get_error(nfs));
		ret = 1;
		goto finished;
	}

	while (1) {
		ret = nfs_opendir(nfs, "/abc", &fd);
		if (ret < 0) {
			printf("nfs_opendir failed: %s. errno %d\n", nfs_get_error(nfs), errno);
			break;
		}

		while (1) {
			char fullname[512];
			nfs_entry = nfs_readdir(nfs, fd);
			if (!nfs_entry) {
				break;
			}

			if (strcmp(nfs_entry->name, ".") == 0 ||
				strcmp(nfs_entry->name, "..") == 0) {
				continue;
			}

			snprintf(fullname, sizeof(fullname), "/abc/%s", nfs_entry->name);
			ret = nfs_stat64(nfs, fullname, &nfs_st);
			if (ret < 0) {
				printf("nfs_stat64 failed: %s. errno %d\n", nfs_get_error(nfs), errno);
				break;
			}
			printf("fullname %s\n", fullname);
		}
		nfs_closedir(nfs, fd);
	}
	nfs_umount(nfs);
	nfs_destroy_context(nfs);
	nfs = NULL;
}

In my other programs, the crash may also occur in the nfs_destroy_context function:
libnfs-libnfs-5.0.2/lib/libnfs-sync.c:186

./configure --prefix=/usr --enable-examples CFLAGS="-g2 -fno-omit-frame-pointer -fsanitize=address -lasan"

Please try current master, I have added a fix that should avoid this crash.

It worked. The problem was solved, thanks!