Permission issue with Cosmos cache in some restricted environments
Closed this issue · 5 comments
The Cosmos caching mechanism can trigger permission errors when the user does not have permission to chmod files in the cache location.
This can happen, for example, when using an Azure Files volume with AKS/Azure Container Instances/Azure Container Apps. A known (unfortunate) limitation of this system is that you can't change a file's permissions; permissions are defined at mount time: https://stackoverflow.com/questions/58301985/permissions-on-azure-file.
I get the following error when trying to use Cosmos cache with such a volume:
[2024-05-29, 16:57:16 CEST] {taskinstance.py:2905} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
return execute_callable(context=context, **execute_callable_kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 400, in wrapper
return func(self, *args, **kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/cosmos/operators/base.py", line 266, in execute
self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
File "/home/airflow/.local/lib/python3.10/site-packages/cosmos/operators/local.py", line 499, in build_and_run_cmd
result = self.run_command(cmd=dbt_cmd, env=env, context=context)
File "/home/airflow/.local/lib/python3.10/site-packages/cosmos/operators/local.py", line 366, in run_command
cache._update_partial_parse_cache(partial_parse_file, self.cache_dir)
File "/home/airflow/.local/lib/python3.10/site-packages/cosmos/cache.py", line 110, in _update_partial_parse_cache
shutil.copy(str(latest_partial_parse_filepath), str(cache_path))
File "/usr/local/lib/python3.10/shutil.py", line 418, in copy
copymode(src, dst, follow_symlinks=follow_symlinks)
File "/usr/local/lib/python3.10/shutil.py", line 307, in copymode
chmod_func(dst, stat.S_IMODE(st.st_mode))
PermissionError: [Errno 1] Operation not permitted: '/cache/cosmos/dbt/target/partial_parse.msgpack'
The problem is a consequence of using shutil.copy which, according to the documentation, "copies the file data and the file’s permission mode", and therefore fails in this context:
astronomer-cosmos/cosmos/cache.py
Lines 110 to 111 in bfca374
Using shutil.copyfile should solve this issue.
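For anyone who wants to see the difference without an Azure Files mount, here is a minimal sketch. It assumes that forcing os.chmod to raise PermissionError (via unittest.mock) is a fair stand-in for the restricted volume; the try_copy helper and the file names are made up for illustration and are not part of Cosmos.

```python
import shutil
import tempfile
from pathlib import Path
from unittest import mock


def try_copy(copy_func, src: Path, dst: Path) -> str:
    """Run copy_func while os.chmod is forced to fail, and report the outcome."""
    # Assumption: a chmod that always raises approximates the restricted mount.
    denied = PermissionError(1, "Operation not permitted")
    with mock.patch("os.chmod", side_effect=denied):
        try:
            copy_func(src, dst)
        except PermissionError:
            return "PermissionError"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "partial_parse.msgpack"
    src.write_bytes(b"dummy")
    # shutil.copy copies the data, then tries to chmod the destination.
    copy_result = try_copy(shutil.copy, src, Path(tmp) / "a")
    # shutil.copyfile copies the data only.
    copyfile_result = try_copy(shutil.copyfile, src, Path(tmp) / "b")
    copied_bytes = (Path(tmp) / "b").read_bytes()

print(copy_result, copyfile_result)
```

Under that assumption, only the shutil.copy call should hit the chmod failure, while shutil.copyfile still writes the file contents.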
To resolve the PermissionError caused by shutil.copy in the Cosmos caching mechanism when using an Azure Files volume with AKS, Azure Container Instances, or Azure Container Apps, you can modify the code to use shutil.copyfile instead of shutil.copy. This is because shutil.copy attempts to copy the file's permission mode, which can lead to permission issues on certain filesystems like Azure Files.
Here is the modified code:
def _update_partial_parse_cache(latest_partial_parse_filepath: Path, cache_dir: Path) -> None:
    """
    Update the cache to have the latest partial parse file contents.
    :param latest_partial_parse_filepath: Path to the most up-to-date partial parse file
    :param cache_dir: Path to the Cosmos project cache directory
    """
    cache_path = get_partial_parse_path(cache_dir)
    manifest_path = get_partial_parse_path(cache_dir).parent / DBT_MANIFEST_FILE_NAME
    latest_manifest_filepath = latest_partial_parse_filepath.parent / DBT_MANIFEST_FILE_NAME
    shutil.copyfile(str(latest_partial_parse_filepath), str(cache_path))
    shutil.copyfile(str(latest_manifest_filepath), str(manifest_path))


def _copy_partial_parse_to_project(partial_parse_filepath: Path, project_path: Path) -> None:
    """
    Update target dbt project directory to have the latest partial parse file contents.
    :param partial_parse_filepath: Path to the most up-to-date partial parse file
    :param project_path: Path to the target dbt project directory
    """
    target_partial_parse_file = get_partial_parse_path(project_path)
    tmp_target_dir = project_path / DBT_TARGET_DIR_NAME
    tmp_target_dir.mkdir(exist_ok=True)
    source_manifest_filepath = partial_parse_filepath.parent / DBT_MANIFEST_FILE_NAME
    target_manifest_filepath = target_partial_parse_file.parent / DBT_MANIFEST_FILE_NAME
    shutil.copyfile(str(partial_parse_filepath), str(target_partial_parse_file))
    patch_partial_parse_content(target_partial_parse_file, project_path)
    if source_manifest_filepath.exists():
        shutil.copyfile(str(source_manifest_filepath), str(target_manifest_filepath))
By using shutil.copyfile, you avoid copying the file's permission mode, which should help in avoiding the PermissionError on Azure Files volumes [1].
I believe we are using shutil.copy because we want file metadata such as the modification time. Did you try the cache-dir config? You can set it to a location where you have permission.
Sorry for the late reply @pankajastro!
If that is the intention, then shutil.copy is probably not achieving what you expect it to do. According to the documentation:
copy() copies the file data and the file’s permission mode (see os.chmod()). Other metadata, like the file’s creation and modification times, is not preserved. To preserve all file metadata from the original, use copy2() instead.
Looking at the source code of shutil.copy, you can see that it just runs shutil.copyfile and then shutil.copymode, and I don't see why the second one would be relevant. If you really need to preserve this metadata, it seems that shutil.copystat would be more appropriate.
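To make that idea concrete, here is a rough sketch of a "copy contents first, metadata if allowed" helper. The name copy_best_effort is hypothetical and not part of Cosmos, and I am assuming that a restricted volume would raise PermissionError from shutil.copystat in the same way it does from shutil.copymode.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_best_effort(src: Path, dst: Path) -> None:
    """Hypothetical helper: copy the contents, then try to copy the metadata."""
    shutil.copyfile(src, dst)
    try:
        # copystat copies permission bits and timestamps (among other things).
        shutil.copystat(src, dst)
    except PermissionError:
        # Assumption: restricted mounts refuse chmod/utime; the data is already copied.
        pass


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "manifest.json"
    src.write_text("{}")
    os.utime(src, (1_000_000_000, 1_000_000_000))  # give the source an old mtime
    dst = Path(tmp) / "cached_manifest.json"
    copy_best_effort(src, dst)
    src_mtime = int(src.stat().st_mtime)
    dst_mtime = int(dst.stat().st_mtime)

print(src_mtime, dst_mtime)
```

On a filesystem that allows it, the destination keeps the source's mtime; where it is refused, the copy falls back to behaving like plain shutil.copyfile.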
I am using the cache-dir config, but this is not really helping because my intention is to save the cache on this volume so that it can be shared across containers.
If that is the intention, then shutil.copy is probably not achieving what you expect it to do. According to the documentation
While that might be true, I can observe that we are comparing modification times in this context:
astronomer-cosmos/cosmos/cache.py
Line 87 in 8e69a36
shutil.copystat appears more promising for preserving this metadata, but I'm curious whether it resolves your permission issue, given that it also copies permission bits?
I ran some tests today, and you're correct that I run into a similar permission issue. Actually, even modifying the file modification timestamp is not permitted.
However, I don't believe that is functionally problematic: the file's mtime will be set to the time when it was cached, and if the manifest is generated at a later time its timestamp will be newer than the cache file timestamp, which will trigger a cache refresh anyway. Assuming we use shutil.copyfile instead of shutil.copy, the only situation I can think of that would exhibit a functional difference would be the following sequence of events:
- The dbt manifest is generated at time 0, with mtime 0
- Cosmos caches the file at time 2, and the cache file has therefore mtime 2
- The dbt manifest is regenerated, and the updated mtime is 1
- Next time Cosmos runs, it would therefore not refresh the cache because the cache's mtime > manifest mtime
I don't see how that could happen unless there is a clock mismatch between the process generating the manifest and Cosmos' caching process. That seems highly unlikely!
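The sequence above can be walked through with a tiny sketch. The needs_refresh function is a hypothetical stand-in for the mtime comparison, not the actual Cosmos check, and the numbers are the illustrative times from the list.

```python
def needs_refresh(manifest_mtime: float, cache_mtime: float) -> bool:
    """Hypothetical check: refresh when the manifest is newer than the cache."""
    return manifest_mtime > cache_mtime


# Usual case: the manifest is regenerated after the cache was written at time 2.
normal_case = needs_refresh(manifest_mtime=3, cache_mtime=2)

# Clock-mismatch case from the list: regenerated manifest reports mtime 1,
# which is older than the cache's mtime 2, so no refresh is triggered.
skewed_case = needs_refresh(manifest_mtime=1, cache_mtime=2)

print(normal_case, skewed_case)
```

Only the second case misses a refresh, and it requires the regenerated manifest to carry a timestamp earlier than the moment the cache was written.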
In addition, I also tested on a "normal filesystem" and can confirm that shutil.copy does not work as you expect:
>>> from pathlib import Path
>>> import shutil
>>>
>>> source = Path('source.txt')
>>> source.write_text("Hello, World!")
>>> dest_dir = Path("/dest")
>>> source.stat()
os.stat_result(st_mode=33188, st_ino=5667023, st_dev=2096, st_nlink=1, st_uid=1000, st_gid=1000, st_size=13, st_atime=1718698252, st_mtime=1718698198, st_ctime=1718698198)
>>> # Test 1: shutil.copy
>>> dest1 = dest_dir / "test1.txt"
>>> shutil.copy(source, dest1)
>>> dest1.stat()
os.stat_result(st_mode=33188, st_ino=5667073, st_dev=2096, st_nlink=1, st_uid=1000, st_gid=1000, st_size=13, st_atime=1718698252, st_mtime=1718698252, st_ctime=1718698252)
>>> # Test 2: shutil.copy2
>>> dest2 = dest_dir / "test2.txt"
>>> shutil.copy2(source, dest2)
>>> dest2.stat()
os.stat_result(st_mode=33188, st_ino=5667040, st_dev=2096, st_nlink=1, st_uid=1000, st_gid=1000, st_size=13, st_atime=1718698252, st_mtime=1718698198, st_ctime=1718698278)
>>> # Test 3: shutil.copyfile
>>> dest3 = dest_dir / "test3.txt"
>>> shutil.copyfile(source, dest3)
>>> dest3.stat()
os.stat_result(st_mode=33188, st_ino=5667062, st_dev=2096, st_nlink=1, st_uid=1000, st_gid=1000, st_size=13, st_atime=1718699621, st_mtime=1718699625, st_ctime=1718699625)
The only effect of using copy instead of copyfile is that the file permissions will be copied, but that seems to be rather an undesirable behaviour 😉