FactoryService Cache Fail
Closed this issue · 6 comments
./app/services/factory_service.rb works in theory but fails in practice.
- Revisit the purpose of the FactoryService
- Redesign the FactoryService cache
- Implement the redesign
- Test the redesign
- Repeat Until Success
- consider caching EPUBs and WebGL at the end of the import process. In many ways unpacking them for display in the app is analogous to derivative creation. Also, triggering this from a user action is dodgy and causes issues, especially when one unpacked item is reliant on another.
Added point about unpacking at ingest time not on user action
Confirming that after importing 13 EPUBs for HOB less than half would get unpacked and loaded on first browser load. For the rest it took one or two page reloads to get the EPUB to display. This will be more of an issue when we import 1000s of EPUBs and it won't be one of us triggering the caching for the first time. The EPUB will just look broken to somebody. Though I guess if they're being QC'ed on production then that might get em cached before they go live.
I think we'll probably have to unpack them at "featured representative setting time" rather than ingest. Well for new things. For replacement, we'll need to do it at ingest since it's already an "epub" or "webgl" featured rep.
A combination of slowish internet and multiple tabs can get you some interesting behavior with not-yet-unpacked EPUBs. I think this is what I was seeing when I set out to specifically unpack these (comment RE: 13 HOB titles above).
A tab not kept in focus will trigger the unzipping but not start requesting the files yet. Open another tab to the same EPUB and there's a good chance another worker will start unpacking the EPUB again. You can see the original tab achieve infinite loading in this way.
The worker that was serving it retains those weird locked versions of the files that got deleted with the second unpacking, which probably explains why we see a lot of these .nfs files in these directories, which in turn (probably) cause the inability to delete the directory when a new FileSet version comes along. Note the identical sizes.
EPUB_sub_dir$ sudo -u heliotrope-app-user lsof +D .
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bundle 8783 heliotrope-app-user 23r REG 0,32 577 15155001675 ./.nfs00000003874ef94b0000487e
bundle 8783 heliotrope-app-user 24r REG 0,32 627 15155001676 ./.nfs00000003874ef94c00004876
bundle 8783 heliotrope-app-user 25r REG 0,32 1167 15162582373 ./.nfs0000000387c2a5650000487b
bundle 8783 heliotrope-app-user 26r REG 0,32 625 15160899317 ./.nfs0000000387a8f6f50000486c
bundle 8783 heliotrope-app-user 27r REG 0,32 3366 15160899307 ./.nfs0000000387a8f6eb0000486e
bundle 8783 heliotrope-app-user 28r REG 0,32 618 15160899308 ./.nfs0000000387a8f6ec00004879
bundle 8783 heliotrope-app-user 29r REG 0,32 2355 15160899316 ./.nfs0000000387a8f6f400004878
bundle 8783 heliotrope-app-user 30r REG 0,32 2781 15176705414 ./.nfs00000003889a25860000486d
bundle 8783 heliotrope-app-user 31r REG 0,32 3436 15160899309 ./.nfs0000000387a8f6ed0000487c
bundle 8783 heliotrope-app-user 32r REG 0,32 9850 15155001683 ./.nfs00000003874ef9530000487a
bundle 8783 heliotrope-app-user 33r REG 0,32 15122 15162582372 ./.nfs0000000387c2a56400004872
bundle 8783 heliotrope-app-user 34r REG 0,32 84479 15160899306 ./.nfs0000000387a8f6ea00004875
bundle 8783 heliotrope-app-user 35r REG 0,32 37686 15176643723 ./.nfs000000038899348b0000487d
bundle 8783 heliotrope-app-user 36r REG 0,32 50926 15155001674 ./.nfs00000003874ef94a0000486b
bundle 8783 heliotrope-app-user 37r REG 0,32 9176 15176643722 ./.nfs000000038899348a00004871
bundle 8783 heliotrope-app-user 38r REG 0,32 30116 15160899305 ./.nfs0000000387a8f6e900004873
bundle 8783 heliotrope-app-user 39r REG 0,32 47912 15176643726 ./.nfs000000038899348e00004870
bundle 8783 heliotrope-app-user 40r REG 0,32 17002 15160899310 ./.nfs0000000387a8f6ee00004877
bundle 8783 heliotrope-app-user 41r REG 0,32 25066 15155001673 ./.nfs00000003874ef9490000486f
bundle 8783 heliotrope-app-user 42r REG 0,32 5814 15176705415 ./.nfs00000003889a258700004874
bundle 8803 heliotrope-app-user 23r REG 0,32 577 15155001885 ./cvi.xhtml
bundle 8803 heliotrope-app-user 24r REG 0,32 627 15171287685 ./htp.xhtml
bundle 8803 heliotrope-app-user 25r REG 0,32 1167 15155001887 ./fm1.xhtml
bundle 8803 heliotrope-app-user 26r REG 0,32 625 15160899463 ./tp.xhtml
bundle 8803 heliotrope-app-user 27r REG 0,32 3366 15155001884 ./cop.xhtml
bundle 8803 heliotrope-app-user 28r REG 0,32 618 15155001886 ./ded.xhtml
bundle 8803 heliotrope-app-user 29r REG 0,32 2355 15171667851 ./toc.xhtml
bundle 8803 heliotrope-app-user 30r REG 0,32 2781 15176705488 ./ack.xhtml
bundle 8803 heliotrope-app-user 31r REG 0,32 3436 15155001888 ./fm2.xhtml
bundle 8803 heliotrope-app-user 32r REG 0,32 9850 15155001895 ./itr.xhtml
bundle 8803 heliotrope-app-user 33r REG 0,32 15122 15162582443 ./c01.xhtml
bundle 8803 heliotrope-app-user 34r REG 0,32 84479 15155001881 ./c02.xhtml
bundle 8803 heliotrope-app-user 35r REG 0,32 37686 15155001882 ./c03.xhtml
bundle 8803 heliotrope-app-user 36r REG 0,32 50926 15155001883 ./c04.xhtml
bundle 8803 heliotrope-app-user 37r REG 0,32 9176 15176643798 ./appA.xhtml
bundle 8803 heliotrope-app-user 38r REG 0,32 30116 15160899456 ./appB.xhtml
bundle 8803 heliotrope-app-user 39r REG 0,32 47912 15171667850 ./nts.xhtml
bundle 8803 heliotrope-app-user 40r REG 0,32 17002 15171667845 ./gls.xhtml
bundle 8803 heliotrope-app-user 41r REG 0,32 25066 15155001879 ./bib.xhtml
bundle 8803 heliotrope-app-user 42r REG 0,32 5814 15155001880 ./bm1.xhtml
lsof 9970 heliotrope-app-user cwd DIR 0,32 1512 15171287622 .
lsof 9971 heliotrope-app-user cwd DIR 0,32 1512 15171287622 .
With reloads of the same EPUB, you can see the directory sizes go both up and down when monitoring the /tmp/epub directory with du -sh *. Another confirmation of what we were talking about earlier.
Anyway I don't think we need to research it exhaustively right now, it should all be moot once we make changes such as:
- single unpacking per EPUB FileSet version
- not unpacked by end-user action
- (maybe) never unpacked while set to featured representative
I think those changes will solve a lot of the unpacking and reversioning issues.
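To make the "single unpacking per EPUB FileSet version" idea concrete, here is a minimal sketch, not the FactoryService code. It assumes the extraction can be done into a staging directory and then renamed into place, so a second worker never deletes files the first one is serving. The names unpack_once, root, id and version are hypothetical, and whether rename is atomic enough on our NFS mount is an assumption to verify.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: unpack at most once per (id, version).
# The caller's block does the real extraction into the staging directory.
def unpack_once(root, id, version)
  final = File.join(root, "#{id}-#{version}")
  return final if Dir.exist?(final) # already unpacked for this version

  FileUtils.mkdir_p(root)
  # Stage next to the final location so the rename stays on one filesystem.
  staging = Dir.mktmpdir("#{id}-staging-", root)
  begin
    yield staging
    begin
      File.rename(staging, final) # readers see nothing or the whole directory
    rescue Errno::ENOTEMPTY, Errno::EEXIST
      # Another worker finished first; keep theirs and discard ours below.
    end
  ensure
    FileUtils.remove_entry(staging) if Dir.exist?(staging)
  end
  final
end
```

Calling it twice for the same version runs the block only once, which is the behavior we want when two tabs hit the same EPUB.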
What if one or more users are loading an EPUB as it gets "reversioned"?
Could we use temporary private visibility to achieve more security as the swap is made?
Another thing that crossed my mind was that search engine bots will mess up the idea of keeping unread (open access) EPUBs unpacked unless we take some action against that.
Probably another reason not to worry about unpacking everything in advance?!
I think then, that we've decided on this (right?):
- "Cached" content (epub, webgl) will be considered derivatives à la Hyrax. "Caching" a file had meant just pulling it out of Fedora and putting it on the file system. We'll keep doing that, but now we'll put them where Hyrax puts derivatives: ./tmp/derivatives/<pair-tree>. We're going to call this "unpacking" or an "unpacked" epub/webgl/file and not "caching/cached" from now on.
- When a FileSet's FeaturedRepresentative is set (epub or webgl), the unpacked file will be created. We do this now and not during ingest because the ingestion process doesn't know the file is an epub/webgl. A user needs to do that (currently).
- When a FileSet with a FeaturedRepresentative of "epub" or "webgl" is updated ("reversioned" with a new epub/webgl) the new file is unpacked (the old removed first).
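For reference, a rough sketch of how an unpacked location under ./tmp/derivatives/<pair-tree> could be computed. It assumes the pair-tree splits the id into two-character segments and that the kind is appended as a suffix, which is my understanding of how Hyrax lays out derivatives but should be checked against Hyrax itself. The method names and the id used below are made up for illustration.

```ruby
# Hypothetical: split an id into two-character path segments.
def pair_tree(id)
  id.scan(/..?/).join("/")
end

# Hypothetical: directory for an unpacked epub or webgl, by id and kind.
def unpacked_path(root, id, kind)
  File.join(root, "#{pair_tree(id)}-#{kind}")
end

pair_tree("abc123def")
# => "ab/c1/23/de/f"
unpacked_path("./tmp/derivatives", "abc123def", "epub")
# => "./tmp/derivatives/ab/c1/23/de/f-epub"
```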
But there's a lot of interdependent code right now and it's a little confusing. So to accomplish this, we're going to try to develop in smallish PRs with new code existing in parallel to the old code. Once the new code is in place, we'll "convert" all the epubs/webgls to the new "unpacked" model using a rake task. If the new "unpacking" code path works for everyone, we'll start removing the old code and eventually the old "cached" content.
Part 1
- Create an UnpackJob that unpacks an epub into its derivatives space. It should also create the SQLite database for search.
- The UnpackJob should also unpack webgls into its derivatives space.
- The UnpackJob should create the epub-webgl mapping when it needs to.
Part 2
- The UnpackJob should be called when a FileSet's FeaturedRepresentatives is "set" (epub, webgl only)
- The UnpackJob should be called when a FileSet (that's already a FeaturedRepresentative of epub, webgl) is "reversioned"
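The two triggers in Part 2 boil down to a small decision, sketched here as plain Ruby so it is easy to test before wiring it into callbacks. The event symbols and the method name are hypothetical placeholders, not existing hooks in the app.

```ruby
# Only these featured representative kinds get unpacked.
UNPACKABLE_KINDS = %w[epub webgl].freeze

# Hypothetical event names for the two moments described in Part 2.
UNPACK_EVENTS = %i[featured_representative_set reversioned].freeze

# True when the UnpackJob should be enqueued for this event and kind.
def unpack_needed?(event, kind)
  UNPACKABLE_KINDS.include?(kind) && UNPACK_EVENTS.include?(event)
end

unpack_needed?(:reversioned, "epub")                 # => true
unpack_needed?(:featured_representative_set, "pdf")  # => false
unpack_needed?(:page_view, "epub")                   # => false
```

Note that an end-user page view is deliberately not a trigger, which matches the "not unpacked by end-user action" goal above.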
Part 3
- Backend processes (such as search) should have "unpack" code paths
- Front-end views should use the new "unpack" code path
- Create and run a rake task that will unpack all existing epub/webgls
Part 4
- Does this all work and make sense? Are there problems like race conditions that we need to worry about (and use something like redlock to solve maybe?)
- Replace and remove all old code.
- Remove old "cached" data.
- As webgls are now going through ./tmp/derivatives, see if we can remove the old X-Sendfile path from Apache for ./tmp/webgl.
- Since epubs are now going through ./tmp/derivatives and there's already an X-Sendfile path for that in Apache, consider using X-Sendfile to send epub files to epub.js instead of using render.
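On the race-condition question in Part 4, here is a rough sketch of the shape a lock around unpacking could take. It uses a local File#flock as a stand-in, not redlock, purely to show the "skip if someone else is already unpacking" behavior. The method name and lock path are hypothetical, and flock semantics over NFS may differ from a local disk, which is part of why something like redlock might be the better fit.

```ruby
require "fileutils"

# Hypothetical: run the block only if no one else holds the lock.
# Returns true if the block ran, false if another holder had the lock.
def with_unpack_lock(lock_path)
  FileUtils.mkdir_p(File.dirname(lock_path))
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    # LOCK_NB makes flock return false right away instead of waiting.
    return false unless f.flock(File::LOCK_EX | File::LOCK_NB)

    yield
    true
  end
end
```

A worker that gets false back can simply move on, since the holder of the lock is already doing the unpacking.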