miku/metha

`-format` not respected?

sdruskat opened this issue · 2 comments

Thanks for supplying this great tool!

When trying to harvest arXiv using `metha-sync http://export.arxiv.org/oai2 -format "arXiv" -rm`, I'm getting a log of `harvest: &{BaseURL:http://export.arxiv.org/oai2 Format:oai_dc Set: From: Until: Client:<RETRACTED> MaxRequests:1048576 DisableSelectiveHarvesting:false CleanBeforeDecode:true IgnoreHTTPErrors:false MaxEmptyResponses:10 SuppressFormatParameter:false HourlyInterval:false DailyInterval:false ExtraHeaders:map[] KeepTemporaryFiles:false Delay:0 Identify:<RETRACTED> Started:0001-01-01 00:00:00 +0000 UTC Mutex:{state:0 sema:0}}` (note the `Format:oai_dc` bit).

The tmp files also include the oai_dc-formatted XML: `<Response><responseDate>2024-01-08T11:29:07Z</responseDate><request verb="ListRecords" set="" metadataPrefix="oai_dc">`....

Am I calling metha-sync the wrong way, or is this a bug?

miku commented

Thanks for the bug report and glad you find metha useful.

What you encountered is a limitation of Go's standard `flag` package, which metha uses by default: all options must come before any positional argument. From the package documentation:

> Flag parsing stops just before the first non-flag argument ("-" is a non-flag argument) or after the terminator "--".

This should be better:

$ metha-sync -format "arXiv" -rm http://export.arxiv.org/oai2
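
To illustrate the ordering rule, here is a minimal sketch using Go's standard `flag` package. It is not metha's actual code: the `parseArgs` helper and the flag set are hypothetical, and the `oai_dc` default is only assumed from the log output above.

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs is a hypothetical helper that defines two flags resembling
// those used in this thread (-format and -rm) and parses args with them.
// The default "oai_dc" is an assumption based on the log shown above.
func parseArgs(args []string) (format string, rm bool, rest []string, err error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.StringVar(&format, "format", "oai_dc", "metadata format")
	fs.BoolVar(&rm, "rm", false, "remove previously cached files")
	err = fs.Parse(args)
	// fs.Args() holds everything from the first non-flag argument onward.
	return format, rm, fs.Args(), err
}

func main() {
	// URL first: parsing stops at the URL, so -format and -rm are never
	// seen as flags and the defaults stay in place.
	f, rm, rest, _ := parseArgs([]string{"http://export.arxiv.org/oai2", "-format", "arXiv", "-rm"})
	fmt.Printf("format=%s rm=%t rest=%q\n", f, rm, rest)

	// Flags first: both flags are parsed, and only the URL remains.
	f, rm, rest, _ = parseArgs([]string{"-format", "arXiv", "-rm", "http://export.arxiv.org/oai2"})
	fmt.Printf("format=%s rm=%t rest=%q\n", f, rm, rest)
}
```

In the first call the flags after the URL end up in the remaining arguments and are silently ignored, which matches the `Format:oai_dc` seen in the log.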

Thanks indeed for the clarification, @miku, works like a charm now.