Test and deploy NPLD Wayback with the new back-end configuration
We need to deploy the LD Wayback player with a new back-end configuration. I've removed the old custom configuration, so the build should now inherit the new configuration for the back end, i.e. the current set-up should work.
However, this needs to be tested thoroughly, including the LD-specific functionality. So, here's a suggested plan of work:
- Build this project locally, and check you can deploy wayback-qa to a local Tomcat. @GilHoggarth should be able to supply suitable back-end service endpoints to run like QA Wayback.
- Then try deploying wayback-ldwa locally. This should work exactly like wayback-qa, except that if you try to go to the same page in two separate browsers, you should be locked out of one of them.
- If this seems to be working, check you can visit the lock management UI. IIRC documentation for that is on our internal Wiki.
- You should also check the 7 day embargo works when configured.
Once this appears to be working, we'll need to merge these changes into the branch for #1.
As @min2ha noticed, the embargo appears to be in milliseconds. See DateEmbargoFilter.
There are many pre-defined values in org.archive.wayback.util.partition.PartitionSize, for example org.archive.wayback.util.partition.PartitionSize.MS_IN_YEAR:
http://iipc.github.io/openwayback/2.0.0/apidocs/org/archive/wayback/util/partition/PartitionSize.html
To save time, just use the value that matches the embargo period you need.
I've printed out the most important values:
MS_IN_DAY = 86400000
MS_IN_WEEK = 604800000
MS_IN_MONTH = 2592000000
MS_IN_TWO_MONTH = 5184000000
MS_IN_YEAR = 31536000000
MS_IN_TWO_YEAR = 63072000000
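If the embargo period you need is not in that list, the value can be derived instead of copied. Here is a minimal sketch using only the JDK; EmbargoMillis is a hypothetical helper for illustration, not an OpenWayback class. By the same arithmetic, the MS_IN_MONTH value listed above corresponds to 30 days.

```java
import java.time.Duration;

// Hypothetical helper (not part of OpenWayback): derives embargo values in milliseconds.
final class EmbargoMillis {
    private EmbargoMillis() {}

    /** Returns the number of milliseconds in the given number of whole (24-hour) days. */
    static long ofDays(long days) {
        return Duration.ofDays(days).toMillis();
    }

    public static void main(String[] args) {
        System.out.println("7 days   = " + ofDays(7));   // 604800000
        System.out.println("30 days  = " + ofDays(30));  // 2592000000
        System.out.println("365 days = " + ofDays(365)); // 31536000000
    }
}
```

For the 7 day embargo mentioned in the plan, this gives 604800000, the same as MS_IN_WEEK.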
Hm, okay, so I think I know what happened to the embargo. When we set it up, we were using plain CDX files as the back-end, which uses the LocalResourceIndex class. This bakes in a number of standard filters, some of which pick up configuration from the parent AccessGroup.
In particular, it's the AccessPointCaptureFilterGroupFactory which implements the embargo.
However, we've switched to RemoteResourceIndex, which expects that filtering to be done 'upstream' and does very little filtering itself.
So, we need to add the embargo support back in. Given we already use our own SURTFilteringRemoteResourceIndex, the simplest thing is probably just to add the embargo code...
    long embargoMS = accessPoint.getEmbargoMS();
    if (embargoMS > 0) {
        chain.addFilter(new DateEmbargoFilter(embargoMS));
    }
...to our own getSearchResultFilters method.
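To make the intended effect of that filter concrete, here is a self-contained sketch of a date embargo check. It is an illustrative stand-in, not the OpenWayback DateEmbargoFilter source: the class EmbargoSketch, its method names, and the exact boundary behaviour are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: NOT the OpenWayback DateEmbargoFilter implementation.
final class EmbargoSketch {
    private EmbargoSketch() {}

    /** True when a capture is newer than (now - embargo) and so should be hidden. */
    static boolean isEmbargoed(long captureEpochMs, long nowEpochMs, long embargoMs) {
        // An embargo of zero or less means "no embargo", mirroring the embargoMS > 0 guard.
        return embargoMs > 0 && captureEpochMs > nowEpochMs - embargoMs;
    }

    /** Returns only the capture timestamps that fall outside the embargo window. */
    static List<Long> visibleCaptures(List<Long> captures, long nowEpochMs, long embargoMs) {
        List<Long> visible = new ArrayList<>();
        for (long capture : captures) {
            if (!isEmbargoed(capture, nowEpochMs, embargoMs)) {
                visible.add(capture);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        long week = 604_800_000L;
        long now = 10 * week;
        // Only the two-week-old capture survives a one-week embargo.
        System.out.println(visibleCaptures(List.of(now - 2 * week, now - 1000L), now, week));
    }
}
```

The point of the sketch is that the check has to apply to every capture returned from the index, whether it is reached through the calendar or by requesting a date directly.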
Implemented in a90732f thanks @min2ha
@GilHoggarth the embargo should work now.
Okay, so there was some real clunky stuff in the locking code. For reasons that made sense at one point, the locking was hard-coded against the behaviour of a particular browser (the version of Firefox that runs in Ericom). This was done by looking for a very specific Accept header. Other browsers don't send that, and so the locking wasn't being applied.
This rule was always brittle, so I've taken it out (as of a7f52a0). We should be able to observe the locking working now.
I've also attempted to clean up the logging a bit.
In the current deployment of the NPLD Wayback service, each of the LDL versions is slightly tailored to its LDL. Examples of this spotted so far are:
- The locking page in the new wayback states "The British Library Legal Deposit Web Archive" effectively as the page footer. To be consistent with the /ukdomain Drupal page footer, this should say:
  - dls-{bsp,lon}-wb01 "The British Library Legal Deposit Web Archive"
  - dls-{bsp,lon}-wb02 "Cambridge University Library Legal Deposit Web Archive"
  - dls-{bsp,lon}-wb03 "Bodleian Library Legal Deposit Web Archive"
  - dls-{bsp,lon}-wb04 "Trinity College Dublin Library Legal Deposit Web Archive"
  - dls-nls-wb01 "The National Library of Scotland Legal Deposit Web Archive"
  - dls-nlw-wb01 "The National Library of Wales Legal Deposit Web Archive"
Plus, wayback-ldwa still has a big, wrong exclude.txt in WEB-INF/classes/.
Okay, I think I fixed the exclude.txt override, and created a new environment variable WEB_ARCHIVE_NAME that should be set appropriately for each deployment. Needs testing!
Okay, I think this should render the WEB_ARCHIVE_NAME in the right place now.
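For anyone testing a deployment, the per-host name selection can be sketched as below. This is only an illustration of reading WEB_ARCHIVE_NAME with a fallback: the ArchiveName class and the choice of default are assumptions, and the actual webapp may resolve the variable differently (for example through its configuration files).

```java
import java.util.Map;

// Hypothetical illustration: resolves the footer name from the environment.
final class ArchiveName {
    // Assumed fallback for this sketch; the real deployment may not define one.
    static final String DEFAULT_NAME = "The British Library Legal Deposit Web Archive";

    private ArchiveName() {}

    /** Returns WEB_ARCHIVE_NAME from the given environment, or the default if unset or blank. */
    static String resolve(Map<String, String> env) {
        String name = env.get("WEB_ARCHIVE_NAME");
        if (name == null || name.trim().isEmpty()) {
            return DEFAULT_NAME;
        }
        return name.trim();
    }

    public static void main(String[] args) {
        // Prints the name that this process's environment would select.
        System.out.println(resolve(System.getenv()));
    }
}
```

Passing the environment in as a map keeps the lookup easy to check for each of the six names listed above without changing the real environment.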
Footer and lock now working; embargo still not restricting access. Requesting any date shows that date's content; the embargo just seems to stop the content from being listed in the Wayback calendar.
Sorry, the year is hardcoded in BubbleCalendar.jsp:
waybacks/wayback-ukwa/src/main/webapp/WEB-INF/query/BubbleCalendar.jsp
Lines 257 to 261 in c8cd56d
Yeah, that's a really old version. The one on the newer branch does it right:
waybacks/wayback-ukwa/src/main/webapp/WEB-INF/query/BubbleCalendar.jsp
Lines 271 to 275 in b963711
That said, the BubbleCalendar on the 2017-style-reset branch doesn't really seem to be working, I think. Anyway, we're using BubbleCalendar as part of #1, so we should probably move this discussion there!
Oh lawks. The issue with the calendar view not remembering the year was extremely awkward/subtle. The logic that says which capture is closest to the requested date was disabled for Remote Resource Indexes (no idea why), and without that the calendar page couldn't tell which was the current year. Testing the fix now.
But that belongs in a different issue! I believe this is deployed and working.
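For whoever picks this up in the other issue, the closest-capture selection mentioned above can be sketched as follows. This is a simplified illustration, not the OpenWayback code: ClosestCapture and its methods are hypothetical, and captures are represented as plain epoch-millisecond timestamps.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.List;

// Hypothetical illustration of "closest capture to the requested date".
final class ClosestCapture {
    private ClosestCapture() {}

    /** Returns the capture timestamp nearest to the requested one, or -1 if there are none. */
    static long closestTo(long requestedEpochMs, List<Long> captures) {
        long best = -1L;
        long bestDistance = Long.MAX_VALUE;
        for (long capture : captures) {
            long distance = Math.abs(capture - requestedEpochMs);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = capture;
            }
        }
        return best;
    }

    /** Returns the UTC calendar year of an epoch-millisecond timestamp. */
    static int yearOf(long epochMs) {
        return Instant.ofEpochMilli(epochMs).atZone(ZoneOffset.UTC).getYear();
    }

    public static void main(String[] args) {
        long requested = Instant.parse("2017-01-01T00:00:00Z").toEpochMilli();
        List<Long> captures = List.of(
                Instant.parse("2016-06-01T00:00:00Z").toEpochMilli(),
                Instant.parse("2017-03-01T00:00:00Z").toEpochMilli());
        // The March 2017 capture is nearer, so the calendar year to show is 2017.
        System.out.println(yearOf(closestTo(requested, captures)));
    }
}
```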