`is_valid_name()`: What conforms a valid name?

Question

`is_valid_name()`: What conforms a valid name?

alejandro-colomar opened this issue a year ago · 9 comments

There's is_valid_user_name():

alx@debian:~/src/shadow/shadow/wextra$ grepc -ntfd is_valid_user_name src/ lib*


lib/chkname.c:75:
bool is_valid_user_name (const char *name)
{
	size_t  maxlen;

	/*
	 * User names length are limited by the kernel
	 */
	maxlen = sysconf(_SC_LOGIN_NAME_MAX);
	if (strlen(name) > maxlen)
		return false;

	return is_valid_name (name);
}

Which limits the length of a valid user name by sysconf(_SC_LOGIN_NAME_MAX). But it calls is_valid_name():

alx@debian:~/src/shadow/shadow/wextra$ grepc -ntfd is_valid_name src/ lib*


lib/chkname.c:28:
static bool is_valid_name (const char *name)
{
	if (allow_bad_names) {
		return true;
	}

	/*
         * User/group names must match gnu e-regex:
         *    [a-zA-Z0-9_.][a-zA-Z0-9_.-]{0,30}[a-zA-Z0-9_.$-]?
         *
         * as a non-POSIX, extension, allow "$" as the last char for
         * sake of Samba 3.x "add machine script"
         *
         * Also do not allow fully numeric names or just "." or "..".
         */
	int numeric;

	if ('\0' == *name ||
	    ('.' == *name && (('.' == name[1] && '\0' == name[2]) ||
			      '\0' == name[1])) ||
	    !((*name >= 'a' && *name <= 'z') ||
	      (*name >= 'A' && *name <= 'Z') ||
	      (*name >= '0' && *name <= '9') ||
	      *name == '_' ||
	      *name == '.')) {
		return false;
	}

	numeric = isdigit(*name);

	while ('\0' != *++name) {
		if (!((*name >= 'a' && *name <= 'z') ||
		      (*name >= 'A' && *name <= 'Z') ||
		      (*name >= '0' && *name <= '9') ||
		      *name == '_' ||
		      *name == '.' ||
		      *name == '-' ||
		      (*name == '$' && name[1] == '\0')
		     )) {
			return false;
		}
		numeric &= isdigit(*name);
	}

	return !numeric;
}

Which in a comment, says the name should match a regex that allows no more than 32 characters. I don't see the code actually enforcing that regex, as the loop seems unlimited. So, I have some questions:

Should we be limiting valid user names to 32 characters?
Or should we change that regex to specify what the code is really doing?

Or am I missing something?

Cc: @ikerexxe This was triggered by your suggestion of writing tests for this function. :)

Answer 1 · 2023-11-12T11:40:21.000Z

See this program:

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	printf("_SC_LOGIN_NAME_MAX: %ld\n", sysconf(_SC_LOGIN_NAME_MAX));
}

alx@debian:~/tmp$ cc -Wall -Wextra login_name_max.c 
alx@debian:~/tmp$ ./a.out 
_SC_LOGIN_NAME_MAX: 256

No 32.

Answer 2 · 2023-11-13T08:25:32.000Z

AFAIK, POSIX doesn't specify a maximum length for the username, so I think it's better to update the comment to reflect what the code does.

Answer 3 · 2023-11-13T12:17:53.000Z

Okay. I suggest using this BRE instead: [a-zA-Z0-9_.][a-zA-Z0-9_.-]*\$\?

Answer 4 · 2023-11-13T12:31:56.000Z

Okay. I suggest using this BRE instead: [a-zA-Z0-9_.][a-zA-Z0-9_.-]*\$\?

I think you added an extra \ at the end.

It should be [a-zA-Z0-9_.][a-zA-Z0-9_.-]*\$?

Answer 5 · 2023-11-13T12:40:57.000Z

No, in a BRE, \? is special, but ? is a simple question mark. It would be in an ERE where it should be ?.

Although it's true for \$. I should have used $, which in the middle of a BRE is just a dollar.

See grep(1):

   Basic vs Extended Regular Expressions
     In  basic  regular  expressions  the meta‐characters ?, +, {, |, (, and )
     lose their special meaning; instead use the backslashed versions \?,  \+,
     \{, \|, \(, and \).

Answer 6 · 2023-11-13T12:48:48.000Z

Now I remember that utmp(5) field ut_user is limited by UT_NAMESIZE, which is 32. Does it mean something?

Answer 7 · 2023-11-13T16:43:14.000Z

Now I remember that utmp(5) field ut_user is limited by UT_NAMESIZE, which is 32. Does it mean something?

I don't think it means anything - utmp has other aging related issues and is set to be replaced anyway, right?

Answer 8 · 2023-11-13T17:36:01.000Z

Right; just wanted to make sure I'm not missing anything.

Answer 9 · 2023-11-14T07:41:14.000Z

I've been checking the code and the processing in utmp for the name will be done for the first 32 characters, anything that is beyond that isn't processed. This means that we could have a problem with users where the first 32 characters are the same, and the rest is different. I don't expect that to happen often, as users usually choose usernames shorter than 10 characters. So, yes, in theory we can have a problem, but in reality the probability of hitting this issue is low.