sfu-db/dataprep

Windows compatibility issue in data-prep-toolkit-transforms

Closed · 1 comment

Description
The library imports the Unix-only 'fcntl' module, which is not available on Windows, so the import fails there.

To Reproduce
Steps to reproduce:

  1. Install data-prep-toolkit-transforms on a Windows system
  2. Try to import and use the Pdf2ParquetTransform:

    from data_processing_ray.runtime.ray import RayTransformLauncher
    from pdf2parquet_transform import (
        pdf2parquet_contents_type_cli_param,
        pdf2parquet_contents_types,
    )

  3. The import fails with: ModuleNotFoundError: No module named 'fcntl'

Expected behavior
The library should either:

  • Use Windows-compatible alternatives (like msvcrt) for file locking on Windows systems
  • Gracefully handle the absence of fcntl on Windows
  • Clearly document Windows compatibility limitations
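
To illustrate the first two options, here is a minimal sketch of a lock that picks its backend by platform. It is not the library's MultiLock implementation. The FileLock class and the _lock/_unlock helpers are hypothetical names made up for this example; only the fcntl.flock and msvcrt.locking calls are real standard-library APIs.

```python
import os
import sys

if sys.platform == "win32":
    import msvcrt

    def _lock(fd):
        # msvcrt.locking locks bytes starting at the current file position,
        # so rewind first. LK_LOCK retries for about 10 seconds and then
        # raises OSError if the lock is still held elsewhere.
        os.lseek(fd, 0, os.SEEK_SET)
        msvcrt.locking(fd, msvcrt.LK_LOCK, 1)

    def _unlock(fd):
        os.lseek(fd, 0, os.SEEK_SET)
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def _lock(fd):
        # Blocks until an exclusive advisory lock is acquired.
        fcntl.flock(fd, fcntl.LOCK_EX)

    def _unlock(fd):
        fcntl.flock(fd, fcntl.LOCK_UN)


class FileLock:
    """Hypothetical cross-platform exclusive lock on a lock file."""

    def __init__(self, path):
        self.path = path
        self._fd = None

    def __enter__(self):
        # Create the lock file if it does not exist yet.
        self._fd = os.open(self.path, os.O_RDWR | os.O_CREAT)
        try:
            _lock(self._fd)
        except OSError:
            os.close(self._fd)
            self._fd = None
            raise
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            _unlock(self._fd)
        finally:
            os.close(self._fd)
            self._fd = None
        return False  # do not swallow exceptions from the with-body
```

Usage would look like `with FileLock("some.lock"): ...` around the critical section. Note the two backends are not equivalent: fcntl.flock is advisory and waits indefinitely, while msvcrt.locking is enforced by the OS and gives up after its retry window, so callers on Windows would need to handle OSError.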

Desktop

  • OS: Windows 11
  • Python Version: 3.11.9
  • data-prep-toolkit-transforms Version: 0.2.2.dev2

Additional context
The specific part causing the issue is in the Pdf2ParquetTransform class where it attempts to use MultiLock for file synchronization operations. This functionality relies on the Unix-specific fcntl module.
I'm currently implementing a workaround using WSL, but it would be beneficial to have native Windows support.

wrong project, sorry