rafguns/doaj-history

Extend to APC prices

eschares opened this issue · 6 comments

This is really cool. I've had a similar idea to track APC prices at the publisher site, and this looks like it would get 90% of the way there.

Elsevier: https://www.elsevier.com/books-and-journals/journal-pricing/apc-pricelist

Hi Eric, that's a cool idea! The DOAJ data include the fields 'APC', 'APC information URL', and 'APC amount'. Do you think that's sufficient to track APC prices? I haven't really done any analysis of this yet, so I don't know if the APC prices in DOAJ are correct and up-to-date. Getting them directly from the publisher would certainly be the best guarantee of that, but that might pose its own problems (changes to the location or format of where those prices can be consulted, the number of publishers, etc.).

Another dataset that may be of interest in this regard is https://github.com/openapc/openapc-de.

Thanks, Raf! DOAJ would only have fully OA journals, correct? I am interested in "hybrid" journals as well, especially since those APCs tend to be higher than those of fully OA journals.

OpenAPC is a laudable project, but it relies on people self-reporting what they paid. As such, I find it to have lots of gaps and it will always be a little behind and out-of-date. I do think getting the file directly from the publisher would be the best way.

Am I correct in reading that all the action happens in the YAML file? I don't have much experience working with this type of file. It would also be nice to display the diff results more cleanly, either by tracking the adds/deletes or by sending an automatic notification like "Journal XYZ just increased their APC by $100 to $3,500."
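
To make the notification idea concrete, here is a minimal sketch. It assumes each day's price list has already been parsed into a plain `{journal: price}` mapping in whole US dollars; the function name `describe_apc_changes` and the sample data are made up for illustration and are not part of this repo.

```python
def describe_apc_changes(old, new):
    """Compare two {journal: price} mappings and return readable messages.

    Both mappings are assumed to hold whole-dollar APC amounts.
    """
    messages = []
    # Walk every journal seen in either snapshot, in a stable order.
    for journal in sorted(set(old) | set(new)):
        if journal not in old:
            messages.append(f"{journal} was added with an APC of ${new[journal]:,}.")
        elif journal not in new:
            messages.append(f"{journal} was removed (last APC: ${old[journal]:,}).")
        elif new[journal] != old[journal]:
            delta = new[journal] - old[journal]
            verb = "increased" if delta > 0 else "decreased"
            messages.append(
                f"{journal} just {verb} their APC by ${abs(delta):,} "
                f"to ${new[journal]:,}."
            )
    return messages


# Hypothetical sample data, only to show the output format.
yesterday = {"Journal XYZ": 3400, "Journal ABC": 2000}
today = {"Journal XYZ": 3500, "Journal DEF": 1500}

for message in describe_apc_changes(yesterday, today):
    print(message)
```

Unchanged journals produce no message, so on most days the output would be empty or very short.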

I get that and I think it would indeed be very worthwhile. However, I'd suggest that this would be better as a separate project/repo.

The action happens in the YAML file, yes: GitHub Actions are pretty powerful for this kind of thing. In the DOAJ case I could just use curl, but here we would probably need some kind of script per publisher to fetch the data and process it a bit. Once you have that, though, automating it with GH Actions is not a very big step.
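
A rough sketch of what "a script per publisher" could look like: one small table that says where to fetch each list and how to parse it. Everything here is an assumption for illustration. The publisher name, the placeholder URL, and the CSV layout with `journal` and `apc_usd` columns are invented; real publisher lists will come in other formats and need their own parsers.

```python
import csv
import io
import urllib.request


def fetch_text(url):
    """Download a URL and return its body as text (the equivalent of the curl step)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_example_csv(text):
    """Hypothetical parser for a CSV with 'journal' and 'apc_usd' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["journal"]: int(row["apc_usd"]) for row in reader}


# One entry per publisher: where the list lives and how to read it.
PUBLISHERS = {
    "example-publisher": {
        "url": "https://example.org/apc.csv",  # placeholder, not a real price list
        "parse": parse_example_csv,
    },
}


def collect_prices(publisher, fetch=fetch_text):
    """Fetch and parse one publisher's list into a {journal: price} mapping."""
    config = PUBLISHERS[publisher]
    return config["parse"](fetch(config["url"]))
```

Passing `fetch` as a parameter means the parsing can be tried on a saved file without touching the network, which is handy when a publisher changes its format.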

You're right that the diff results for this are not super useful at the moment. I actually plan to write something to process the git commits and track certain changes. I just haven't gotten around to it... yet.

Anyway, I'll look into the APC scraping, taking the Elsevier list as a starting point. Would be cool if we could work on that together!

Yes, I agree it would be better to become a separate project. Would love to work on it with you!

Hi, I've started a very early attempt at this over at https://github.com/eschares/APC_tracker. So far I can download the file, append the date to the filename, move it to a files directory, and assign today's and yesterday's files to the appropriate variables. There is still much more to be done, but I thought I would check in to see if you have more straightforward ways of doing this.
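
For comparison, here is one possible way to do the file handling described above with only the Python standard library. It is a sketch, not what APC_tracker actually does: the function names, the `files` directory layout, and the `name_YYYY-MM-DD.ext` naming scheme are all assumptions.

```python
import datetime as dt
import shutil
from pathlib import Path


def archive_download(downloaded, archive_dir, day):
    """Move a freshly downloaded file into archive_dir with the date in its name."""
    downloaded = Path(downloaded)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # e.g. pricelist.csv -> files/pricelist_2021-05-02.csv
    target = archive_dir / f"{downloaded.stem}_{day.isoformat()}{downloaded.suffix}"
    shutil.move(str(downloaded), str(target))
    return target


def latest_two(archive_dir, stem, suffix):
    """Return (previous, latest) archived files, or None where one is missing.

    ISO dates sort chronologically as plain strings, so sorting the
    file names is enough to put them in date order.
    """
    files = sorted(Path(archive_dir).glob(f"{stem}_*{suffix}"))
    if len(files) < 2:
        return None, (files[-1] if files else None)
    return files[-2], files[-1]
```

Picking the two most recent files, rather than strictly "today" and "yesterday", keeps the comparison working even if a daily run is skipped.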

Oh, that's really cool! I'll take a look there and see if I have any ideas.