Systemd does not run when using full path

OS: Fedora 36
systemd version: v250.3-8.fc36

Might be a missing SELinux configuration, but this is the error that is produced:

systemd[17323]: step-ca.service: Failed to locate executable /usr/bin/step-ca: Permission denied
systemd[17323]: step-ca.service: Failed at step EXEC spawning /usr/bin/step-ca: Permission denied
systemd[1]: step-ca.service: Main process exited, code=exited, status=203/EXEC
systemd[1]: step-ca.service: Failed with result 'exit-code'.

Solution: switch to using a relative path. Alternatively, we can dig deeper into why this issue occurs. There are no file permission errors that I know of.

Hm, could you check that the installed binary has the correct permissions (755)? If it does, then yeah, it's probably SELinux-related. Going to grab a Fedora VM and run some tests later. Surprised this doesn't happen on Rocky Linux too, I assumed that the SELinux configs between those two would be identical.

Could you describe your workaround in a little more detail please? I thought systemd services always required the absolute path to a binary to run.

Also, are you planning to deploy the CA on a Fedora host? There is no support for Fedora in this collection right now, but I could try to add it where possible if that is something you're interested in.

Yes, I checked both permission and ownership, and using just step-ca works. I will check if there is any artifact from other manual installations I made.

The workaround is just that: not using an absolute path, so that the normal search path is used. Indeed, that was introduced later on, but all distros should be able to use it.

Yes, I prefer to use Fedora on everything moving forward, because I don't consider the moving target argument to be valid in the CI era. But since step-ca is simply golang, I don't see it having any issues there anyway.

Okay, I've added tests for Fedora 36 with #183. Interestingly enough, they do not experience the issue you are describing. I guess that's because SELinux is not enabled in the Fedora container, perhaps?

I also added a task to set the correct labels, so assuming it really is SELinux-related, that should fix it. Could you try running the role from the PR branch and see if that fixes the issue for you? If not, then I'd need to see the binaries' SELinux labels to see what's going on (ls -lZ /bin/step-cli).

This is the first time using ls -Z so I don't really know how to read it. I did mess around with the chown, but here is the result:

-rwxr-xr-x. 1 root step-ca unconfined_u:object_r:user_tmp_t:s0 12659559 Jul  7 00:26 /usr/bin/step-awskms-init
-rwxr-xr-x. 1 root step-ca unconfined_u:object_r:user_tmp_t:s0 33923522 Jul  7 00:24 /usr/bin/step-ca
-rwxr-xr-x. 1 root step-ca unconfined_u:object_r:user_tmp_t:s0 30951825 Jul  6 22:31 /usr/bin/step-cli
-rwxr-xr-x. 1 root step-ca unconfined_u:object_r:user_tmp_t:s0 14438616 Jul  7 00:26 /usr/bin/step-cloudkms-ini

About the "working" workaround: it seems I had a previous installation in /usr/local/bin/step-ca which was being picked up. That one worked well, but now I realize the permissions for those are wacky (I might have set the wrong flags in my previous test Ansible run).

--wxrw--wt. 1 root root system_u:object_r:bin_t:s0 30951825 Jul  6 22:31 /usr/local/bin/step

Anyway, after removing the /usr/local/bin ones, I get the same error as before. But I ran your restorecon -v {{ _step_cli_install_path }} manually and that fixed it. Do you have a note on what those commands mean?

Also, I believe using Ansible's native copy automatically takes care of that, so let's do a clean-up at some point and migrate everything to native modules?

I don't normally work on any SELinux-enabled systems, so take what I say with a grain of salt, but from what I understand, the most important bit is the type, i.e. the part of the label ending in _t (the third field). For binaries (like the ones in /usr/bin), that type is supposed to be bin_t. As long as it is, SELinux will let other programs like systemd execute these files.

The default type for each directory (for example, user_home_t for anything in /home, user_tmp_t for anything in /tmp) is set by policies and these policies are applied when new files are created in that location.

However, because we create step-cli in /tmp, its label/type is set to user_tmp_t, and that label does not change when we mv the file into /usr/bin, the same way file permissions don't change when you move a file from one directory to another. Then, when systemd tries to execute the file, it is blocked by SELinux (I'm pretty sure the audit log will contain a relevant message), leading to your issue.

Note that the executable in /usr/local/bin has the correct bin_t type, so it worked right away.

-rwxr-xr-x. 1 root step-ca unconfined_u:object_r:**user_tmp_t**:s0 30951825 Jul  6 22:31 /usr/bin/step-cli <-- Label that was applied when we created the file in /tmp

--wxrw--wt. 1 root root system_u:object_r:**bin_t**:s0 30951825 Jul  6 22:31 /usr/local/bin/step
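To make the comparison above easier to repeat, here is a minimal sketch that pulls the type out of a context string. It assumes the usual user:role:type:level layout; `selinux_type` is a hypothetical helper name, not something from the role.

```shell
#!/bin/sh
# Hypothetical helper: print the SELinux type of a context string.
# A context has the form user:role:type:level, so the type is field 3.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Contexts copied from the two listings above.
selinux_type 'unconfined_u:object_r:user_tmp_t:s0'   # /usr/bin/step-cli
selinux_type 'system_u:object_r:bin_t:s0'            # /usr/local/bin/step
```

The first call prints user_tmp_t and the second prints bin_t. On a host with GNU coreutils, `stat -c %C /usr/bin/step-ca` should print just the context string, which can be fed to the helper.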

As for fixing this: I just found out that there is a -Z flag for the mv command:

 -Z, --context                set SELinux security context of destination
                                 file to default type

restorecon just tells the system to reapply the labels as defined by the policies to everything in a given path, but that shouldn't be necessary now that we just tell mv to apply the correct label in the first place.
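As a rough sketch of the move-then-relabel variant in plain shell (not the role's actual task): `install_binary` is a hypothetical helper name, and guarding on `selinuxenabled` is an assumption to keep it harmless on hosts without SELinux.

```shell
#!/bin/sh
# Hypothetical helper (not part of the role): move a file into place, then
# reset its SELinux label to the policy default for the destination path.
install_binary() {
    src=$1
    dest=$2
    mv -- "$src" "$dest" || return 1
    # selinuxenabled exits 0 only when SELinux is enabled on this host.
    if command -v selinuxenabled >/dev/null 2>&1 && selinuxenabled; then
        restorecon -v "$dest"
    else
        echo "SELinux not enabled, skipping relabel of $dest"
    fi
}
```

Usage would look like `install_binary /tmp/step-ca /usr/bin/step-ca`. With GNU mv, `mv -Z` should give the same result in a single step, which is why the separate restorecon call becomes unnecessary.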

ansible.builtin.copy does seem to automatically apply the correct labels from the policy too, but you can't copy over a running binary in Linux, only delete/move.

Thanks, that quick overview is helpful. Curious why it didn't get caught on Rocky though.

> but you can't copy over a running binary in Linux, only delete/move.

Why are we allowing this though? Btw what is the more common ansible architecture to upgrade a role's version, rerunning the role or implementing a different module/tag/playbook?

I'm not quite sure I understand your question. What do you mean by "upgrading a role's version"? The version of this collection? Well, just use ansible-galaxy to install a newer version of this collection - I'll publish a new release with this fix soon-ish.

I mean the version of the software installed by the role. Like from step-ca 0.20 to step-ca 0.21

Ah, just set step_ca_version to the version that you want to have installed and the role will take care of it. If the version is already installed, nothing will happen. That's how most roles do it, similar to how modules always try to reach a certain "state".

Ok, then I think some things need to be improved:

Fail-check on intermediate_password changes: If it changes, but step_ca_config.stat.exists then it would not re-key the intermediate. Not sure why it is intentionally left outside the Initialize CA block.

ansible-collection-smallstep/roles/step_ca/tasks/init.yml

Lines 7 to 15 in 122af17

    
    # Always create the intermediate password file as it is needed for CA operation
    - name: Intermediate password file is present
      copy:
        content: "{{ step_ca_intermediate_password }}"
        dest: "{{ step_ca_intermediate_password_file }}"
        owner: "{{ step_ca_user }}"
        mode: 0600
        group: "{{ step_ca_user }}"
      no_log: yes

Add a separate module for renewing the step-ca root/intermediate keys/certs
Include role dependencies for step-ca, step_bootstrap_host (maybe rename to step_host/step_client to indicate it's not a one-use role), step_acme_cert on step_cli
