Windows compatibility issue in data-prep-toolkit-transforms
Closed this issue · 1 comment
Boris-Chernetsov commented
Description
The library imports the Unix-specific 'fcntl' module, which is not available on Windows systems, so the import fails before any transform code runs.
To Reproduce
Steps to reproduce:
- Install data-prep-toolkit-transforms on a Windows system
- Try to import and use the Pdf2ParquetTransform:
from data_processing_ray.runtime.ray import RayTransformLauncher
from pdf2parquet_transform import (
    pdf2parquet_contents_type_cli_param,
    pdf2parquet_contents_types,
)
- Get ModuleNotFoundError
Error message: ModuleNotFoundError: No module named 'fcntl'
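A quick way to confirm that the error comes from the platform rather than from the package install is to check which locking modules the interpreter can find. This is a minimal sketch using only the standard library; `has_module` is a hypothetical helper name, not part of data-prep-toolkit.

```python
import importlib.util
import sys


def has_module(name):
    """Return True if the interpreter can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


# On Windows, 'fcntl' is expected to be missing and 'msvcrt' present;
# on Linux and macOS it is the other way around.
print("platform:", sys.platform)
print("fcntl available:", has_module("fcntl"))
print("msvcrt available:", has_module("msvcrt"))
```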
Expected behavior
The library should either:
- Use Windows-compatible alternatives (like msvcrt) for file locking on Windows systems
- Gracefully handle the absence of fcntl on Windows
- Clearly document Windows compatibility limitations
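The first two options could look roughly like the sketch below, which picks a locking backend at import time. It is an illustration under stated assumptions, not the library's code: `file_lock`, `_lock`, and `_unlock` are hypothetical names, and the interface of the existing MultiLock class is not reproduced here. Note that the two backends do not behave identically: `fcntl.flock` blocks until the lock is free, while `msvcrt.locking` with `LK_LOCK` retries for roughly ten seconds and then raises `OSError`.

```python
import os
import sys
from contextlib import contextmanager

if sys.platform == "win32":
    import msvcrt

    def _lock(fd):
        # msvcrt.locking works on a byte range starting at the current
        # file position, so seek to the start and lock the first byte.
        os.lseek(fd, 0, os.SEEK_SET)
        msvcrt.locking(fd, msvcrt.LK_LOCK, 1)

    def _unlock(fd):
        os.lseek(fd, 0, os.SEEK_SET)
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)

else:
    import fcntl

    def _lock(fd):
        # Exclusive advisory lock on the whole file; blocks until acquired.
        fcntl.flock(fd, fcntl.LOCK_EX)

    def _unlock(fd):
        fcntl.flock(fd, fcntl.LOCK_UN)


@contextmanager
def file_lock(path):
    """Hold an exclusive lock on `path` for the duration of the with-block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        _lock(fd)
        try:
            yield fd
        finally:
            _unlock(fd)
    finally:
        os.close(fd)
```

Usage would be `with file_lock("some.lock"): ...` around the code that needs synchronization. Because the platform check happens before either import, importing this module does not raise ModuleNotFoundError on Windows.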
Desktop
- OS: Windows 11
- Python Version: 3.11.9
- data-prep-toolkit-transforms Version: 0.2.2.dev2
Additional context
The specific part causing the issue is in the Pdf2ParquetTransform class where it attempts to use MultiLock for file synchronization operations. This functionality relies on the Unix-specific fcntl module.
I'm currently implementing a workaround using WSL, but it would be beneficial to have native Windows support.
Boris-Chernetsov commented
wrong project, sorry