Locale path in AWS S3
ygg-sajith opened this issue · 15 comments
- Which version of Django are you using?: django 1.5
- Which version of django-rosetta are you using?: 0.7.2
- Have you looked through recent issues and checked this isn't a duplicate?: Not a duplicate
Currently, Rosetta scans the locale paths specified in settings.py and loads them into the admin UI. Once a translation is added and the save button is clicked, it is stored in the same path as the corresponding locale. Is there any way we can change this to store it on a remote server or in an AWS S3 location?
In a Docker-container-based environment, the translator instance can be redeployed during auto-scaling, so the saved changes in .po files can be lost. To overcome this, we would prefer to store the files in an AWS S3 location.
Is this approach possible to achieve?
Same requirement for my team.
I recognize this could be an interesting feature but don't have the bandwidth to implement it, at the moment. If anyone feels like taking on this development, please get in touch here to discuss first, thank you!
I have exactly the same issue, migrating a project to AWS. Maybe it would be good to have 'save hooks' in Rosetta that could be overridden at the application level, so a system could say "post a copy to this URL" or "run a management command after saving". With such a hook, one could either SCP changed files off the instance or have an external server fetch them. I'll try to read the code and see where this might be done.
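Something in this spirit, perhaps (a purely hypothetical sketch: neither the `ROSETTA_POST_SAVE_HOOK` setting nor the hook signature exist in Rosetta today, and the bucket name is made up):

```python
# settings.py -- dotted path to a callable Rosetta would invoke after
# writing a PO file (hypothetical setting, not implemented anywhere)
ROSETTA_POST_SAVE_HOOK = "myproject.translation_hooks.upload_to_s3"

# myproject/translation_hooks.py
import boto3

def upload_to_s3(po_path):
    """Copy a freshly saved .po file to S3 so it survives redeploys."""
    s3 = boto3.client("s3")
    # Reuse the local path as the object key to mirror the layout.
    s3.upload_file(po_path, "my-translations-bucket", po_path)
```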
Yup, we have the same issue as our project is on AWS.
Our idea was to have a way of overriding the save event and using boto3 to store the files on S3, for example.
Right, here are the challenges I see in implementing this feature:
- Under the hood, Rosetta uses the polib library to read and write PO and MO files. The "default" way to do that is to pass a file path to `pofile()` and `mofile()`. Emphasis on path, i.e. polib doesn't deal with file-like objects directly, but rather with (local) paths. That said, polib's API also specifies that we could pass the actual string content of the file instead of the path, so we could potentially use that (see the sketch after this list). Thoughts, @izimobil?
- All over the code, Rosetta currently also relies on filesystem paths to find, read and write PO files. All of these need to be abstracted away behind a generic way to access the data.
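For what it's worth, here's what the content-instead-of-path approach would look like (a minimal sketch, assuming polib's documented behavior of accepting the file's content as a string):

```python
import polib

# polib.pofile() accepts either a filesystem path or the raw content
# of the file as a string, so text fetched from remote storage (e.g.
# an S3 object body) could be parsed without touching the local disk.
po_content = '''\
msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

msgid "Hello"
msgstr "Bonjour"
'''
po = polib.pofile(po_content)
for entry in po:
    print(entry.msgid, "->", entry.msgstr)  # Hello -> Bonjour
```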
So, assuming the first point can be easily handled by passing content to polib instead of file paths, the plan for the second point would probably be to:
- Design a storage interface (an API) that enumerates, reads and writes PO files: the default implementation would use the local filesystem, exactly as it is done today (a minimal sketch follows this list).
- Update the codebase locations that currently handle filesystem paths directly to use the new API instead. I.e. the views should use generic ways to find, read and write PO objects, regardless of the underlying storage implementation.
- Once everything works as it does now with the new API, we can write new storage implementations that use S3, FTP, blockchain, whatever. It should be as simple as specifying an alternative storage class in Rosetta's settings. Note that this could be trickier than expected when we have to deal with, e.g., perceived IO performance, concurrent writes, underlying storage limitations, ...
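To make the first point a bit more concrete, here is a rough, untested sketch of what such an interface could look like (class and method names are made up for illustration):

```python
import os
from abc import ABC, abstractmethod


class POFileStorage(ABC):
    """Hypothetical interface: enumerate, read and write PO files."""

    @abstractmethod
    def list_po_files(self):
        """Yield storage-relative paths of all known .po files."""

    @abstractmethod
    def read(self, path):
        """Return the file content at `path` as a string."""

    @abstractmethod
    def write(self, path, content):
        """Persist `content` (a string) at `path`."""


class FileSystemPOFileStorage(POFileStorage):
    """Default implementation: the local filesystem, as today."""

    def __init__(self, locale_paths):
        self.locale_paths = locale_paths

    def list_po_files(self):
        for base in self.locale_paths:
            for root, _dirs, files in os.walk(base):
                for name in files:
                    if name.endswith(".po"):
                        yield os.path.join(root, name)

    def read(self, path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()

    def write(self, path, content):
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(content)
```

The views would only ever talk to `POFileStorage`, so an S3-backed implementation could later be swapped in via a setting without touching them.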
Thoughts?
PS: also worth mentioning: maybe it'd be much easier to mount an S3 bucket as a local filesystem with e.g. s3-fuse, then have Rosetta think it's dealing with local paths as it does right now, even though the PO files are on S3? 🤷‍♂️
> Design a storage interface (an API) that enumerates, reads and writes PO files: the default implementation would use the local filesystem, exactly as it is done today.
Django has this system built-in already with backends available for S3 and many others. https://docs.djangoproject.com/en/3.2/ref/files/storage/
Would it be feasible to have a setting like `ROSETTA_STORAGE_BACKEND`, which defaults to the local filesystem, and then switch from using file paths as the argument to `pofile()` to strings of the file contents opened by the Django storage backend?
One potential issue is how "noisy" the filesystem access is. If there is a lot of read/writing going on in each request, the performance may not be acceptable. If it's just a couple files, it shouldn't be a major concern.
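A minimal sketch of that idea (`ROSETTA_STORAGE_BACKEND` is just the proposed name, and the PO path is illustrative):

```python
import polib
from django.conf import settings
from django.utils.module_loading import import_string

# Resolve the storage class from a (hypothetical) setting, falling
# back to the local filesystem -- Rosetta's current behavior.
storage_cls = import_string(
    getattr(settings, "ROSETTA_STORAGE_BACKEND",
            "django.core.files.storage.FileSystemStorage")
)
storage = storage_cls()

# Hand polib the file *content* rather than a path, so this works
# the same no matter where the backend actually keeps the file.
with storage.open("locale/fr/LC_MESSAGES/django.po") as fh:
    po = polib.pofile(fh.read().decode("utf-8"))
```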
> Django has this system built-in already with backends available for S3 and many others. https://docs.djangoproject.com/en/3.2/ref/files/storage/
Not quite: by default Django only handles local files, IIRC, but django-storages would probably be the perfect solution here.
But this only covers part of the problem (step 3 above), i.e. I don't think we can just rely on Django storage (the feature, not the app) to directly deal with PO and MO files, because a) it's primarily meant to handle static and media files and b) we'd end up with lots of if-then-else blocks all over the view functions, depending on the capabilities of each storage back-end. This is precisely why I think we need a "RosettaFilesStorage" abstraction layer (step 1 above) that deals with enumerating, reading and writing PO and MO files.
> Not quite: by default Django only handles local files, IIRC, but django-storages would probably be the perfect solution here.
Correct, when I said "backends available", I was referring to third-party backends like `django-storages`.
What sort of functionality does Rosetta require beyond the standard read/write/list files where an additional abstraction would be necessary? In my experience, you can count on `django-storages` providing all the necessary primitives for basic file manipulation.
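For example, with django-storages' S3 backend (assuming boto3 credentials are configured; the bucket name and paths are illustrative):

```python
from django.core.files.base import ContentFile
from storages.backends.s3boto3 import S3Boto3Storage

storage = S3Boto3Storage(bucket_name="my-translations-bucket")

# The standard Django storage primitives cover the basics:
dirs, files = storage.listdir("locale/fr/LC_MESSAGES")             # enumerate
content = storage.open("locale/fr/LC_MESSAGES/django.po").read()   # read
storage.save("locale/fr/LC_MESSAGES/django.po", ContentFile(content))  # write
```

(Note that the S3 backend overwrites on `save()` by default, whereas Django's `FileSystemStorage` generates an alternative name instead, which is exactly the kind of per-backend capability difference mentioned above.)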
> What sort of functionality does Rosetta require beyond the standard read/write/list files where an additional abstraction would be necessary? In my experience, you can count on `django-storages` providing all the necessary primitives for basic file manipulation.
Not much, really. The *real* problem, though, is that Rosetta currently uses "low-level" direct file access (i.e. `open(path)`, `file.read()`, ...) all over the place (which will have to be converted to whatever storage does), and that the PO file discovery is heavily based on the assumption that Rosetta operates within the project it is installed in, i.e. it looks for PO files inside its project, and not in some remote storage totally decoupled from the project it "lives" in.
That, and obviously that we need to be able to pass content and not paths to polib.
That makes sense. I've done some conversions from the Python file API to the Django file API in the past, and the `open`, `read`, etc. are pretty easy/trivial to handle.
Discovery, however, looks problematic. Even if you could come up with a reasonable remote directory structure, scanning remote storage like that tends to get really expensive (in terms of time elapsed). Does this happen one time at startup or also during runtime? If it's a one-time thing, I wonder if you could handle it the same way as `collectstatic`, where the local filesystem is scanned and then uploaded to the remote storage? The fact that these files can change during runtime certainly complicates things as well.
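A collectstatic-style sync could be as simple as this (untested sketch; `storage` would be any Django storage backend instance):

```python
import os
from django.core.files.base import ContentFile

def sync_locale_to_storage(local_root, storage):
    """One-time upload: mirror local .po/.mo files to remote storage."""
    for root, _dirs, files in os.walk(local_root):
        for name in files:
            if name.endswith((".po", ".mo")):
                path = os.path.join(root, name)
                key = os.path.relpath(path, local_root)
                with open(path, "rb") as fh:
                    storage.save(key, ContentFile(fh.read()))
```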
> That, and obviously that we need to be able to pass content and not paths to polib.
Hi @mbi, I can confirm that you can pass the content of the PO file as a string to polib!
The problem no one has discussed is getting the PO/MO files back into the repo for version control.
Ideally, wherever the files reside should be within a repo that could push the changes back to a master repo.
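E.g. a post-save step could shell out to git (rough sketch; assumes the locale tree lives inside a clone with push access, and note that `git commit` will raise via `check=True` when nothing actually changed):

```python
import subprocess

def push_po_changes(repo_dir, message="Update translations via Rosetta"):
    # git expands the *.po pathspec itself, so no shell is needed.
    subprocess.run(["git", "add", "*.po"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    subprocess.run(["git", "push"], cwd=repo_dir, check=True)
```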
I'm having this problem as well. Has there been any development?