/pigshare

Python cli-client and wrapper library for v2 of the Figshare API

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

Pigshare

Python client library and commandline tool for Figshare.

The commandline options are created dynamically from the available api-method wrapper code, which is why some of them might feel a bit clumsy. Also, many of the commands only support values as json-formatted strings. I might change that in the future, but it’d require more complex cli-argparse creation code and I’m not sure whether it’s worth it.

Notes

So far, I’ve only tested this on Linux.

Requirements

  • python-dev package (for simplejson compilation I think)
  • argparse
  • setuptools
  • restkit
  • booby
  • simplejson
  • parinx
  • pyclist
  • argcomplete

Installation

Release

(sudo) pip install pigshare

Development version from Github

(sudo) pip install https://github.com/UoA-eResearch/pigshare/archive/master.zip

Usage (commandline client)

Connection details

Config file location

pigshare reads a config file $HOME/.pigshare.conf, in the format:

[default]
url = https://api.figsh.com/v2
token = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

(note, this example uses the staging environment)

Config file format

pigshare supports profiles, so you can have multiple profiles in your config file like for example:

[default]
url = https://api.figsh.com/v2
token = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[markus]
url = https://api.figshare.com/v2
token =  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[not_markus_but_somebody_else]
url = https://api.figshare.com/v2
token = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

In addition, if you want to use certain statistics features for the institutional version of figshare (documented here: stats_api_doc), you’ll need to add a base64 encoded username:password value under the stats_token key (for the right profile, of course), e.g.:

stats_token = dXNlcm5hbWU6cGFzc3dvcmQK

Profile switching

Now, when you call pigshare with the -p argument, you can switch between different backends/identities:

pigshare -p markus [command]

The command you chose will be using the selected connection parameters.

Features:

Supported

  • creation of articles, via a json string or interactively
  • listing of public and private articles
  • searching for public and private articles
    • updating of articles
  • creation of collections, via a json string or interactively
    • listing of public and private collections
  • searching for public and private collections
  • updating of collections
    • listing of categories and their ids
  • listing of licenses and their ids
    • publishing of articles and collections
    • initial support for getting statistics for articles/collections/authors (not 100% yet)

Not (yet) supported

  • queries with more than 1000 results, only the first 1000 results are displayed
  • automatically deal with the 10 item limits on some methods
  • everything else
  • caching of (article) data

Notes

  • I’ve written this mostly to get data out of figshare, not to put data in, so the functions related to that might need some more work
  • overall, this thing should be still considered somewhat alpha stage, so use with that in mind
  • error handling is pretty much non-existent yet
  • only tested on Linux, so might not work on Mac or Windows
  • I’ve used this only with the institutional version of figshare, I might do some tests with the personal one later on, but haven’t done so, so no idea whether it works.

General usage

Basic usage is displayed via:

pigshare -h

Command specific usage can be displayed via:

pigshare [command] -h

Interactive input

Some of the commands offer interactive input (e.g. create_article, edit_article, create_collection, …). If you choose to use that, you can get help on any particular field by typing ‘?’ as value. Some fields support a more advanced help functionality:

  • categories: ‘?’ lists all available categories along with their internal figshare id (which you need to provide as input), ‘? [search_term]’ lets you filter this list with the provided search term
  • authors: ‘?’ lists all authors and their internal ids (always use the latter if you know it) that pigshare knows about (authors that came up in past queries, so this is not a comprehensible list, if you can’t find the author you want, try to find it via the web-interface)
  • licenses: ‘?’ lists all licenses and their id, ‘? search_term’ filters the result
  • defined_type: ‘?’ lists the available and valid article types

Some fields support multiple values (list input). If that’s the case, pigshare will tell you about it, and let you input the single items one after another. Once you are finished, just press ‘enter’ on an empty field.

Filtering of output fields

(Sub-)commands that display one or more items can be called using an output filter (the -o argument before the sub-command). Depending on the sub-command called only certain fields of the items are available (e.g. list_articles has only a subset of fields compared to read_article).

I’d recommend trying out the command you want to run first, and checking which fields are available, then run the command again with the appropriate filter. A command to list all articles and only display the doi and title of each article would be:

pigshare -o doi,title list_articles

For more advanced filtering, consider piping in the ‘full’ output of pigshare into a tool like jq ( https://stedolan.github.io/jq/ ).

Commonly used commands

Articles

List articles

To list all articles and display the doi, (internal) id, title, url, and published_date for each, issue:

pigshare list_articles

To display a table with all articles, but only display doi and title, you can use:

pigshare -o doi,title list_articles
Read an article

To display the properties of an article, use:

pigshare read_article [article_id]

To display the doi and all tags of a number of articles, use (tags are not part of the ‘short’ article format that the list_articles command returns):

pigshare -o doi,tags read_article [article_id] [article_id] [article_id]
Search for articles

To list all articles matching a search string, issue:

pigshare search_articles --search_term [search_term]

To display all dois and titles of articles that match a search string:

pigshare -o doi,title search_articles --search_term [search_term]
List my articles

To list all of your own articles:

pigshare list_my_articles
To create a new article
pigshare create_article --article '{"title": "Markus test", "custom_fields": {"key1": "value"}}'

Or, if you want pigshare to ask your input for every one of the fields:

pigshare create_article
Upload one (or more files) for an article
pigshare upload_new_file --id [article_id] file1 [file2 ... ...]

Collections

Very similar to articles.

Statistics

Statistics can be queried as totals, timeline, or breakdown. Documentation can be found here: stats_api_doc

Pigshare follows the api methods pretty closely, so you should be able to figure out how it works yourself fairly easily.

An example call to get the total number of views for an article (that was published in an institutional figshare, omit the -i parameter if that was not the case):

pigshare -i auckland get_total_article_views 2075410
{
    "2075410": {
    "totals": 204
  }
}

Total views for an author:

pigshare get_total_author_views 117523
{
    "1175235": {
    "totals": 481
  }
}

Breakdown of downloads for an institutional article, by day:

pigshare -i auckland get_breakdown_article_downloads --granularity day 2075410
{
  "2075410": {
    "breakdown": {
      "2016-05-06": {
        "United States": {
          "Mountain View": 1,
          "total": 1
        }
      },
      "2016-05-10": {
        "New Zealand": {
          "Auckland": 1,
          "total": 1
        }
      },
      "2016-05-16": {
        "New Zealand": {
          "Auckland": 1,
          "total": 1
        }
      }
    }
  }
}

A pipeline to get the total downloads of all your own articles:

 pigshare -o id list_my_articles | xargs pigshare get_total_article_downloads

And so on.

Workflows

Reorder articles in collections

Because of how Figshare works (collections are sorted by the order they were added to the collection), the easiest way to change the order of articles within a collection is to remove all articles from a collection, then add them in the right order, and re-publish the collection again.

So, if you want to order the articles alphabetically for example, you could do it this way:

  • first, find the list of article ids
    $ pigshare -o title,id search_my_articles --search_term ISSP
    ISSP1991: Religion I	2000910
    ISSP1992: Social Inequality II	2000913
    ISSP1993: Environment I	2000916
    ISSP1994: Family and Changing Gender Roles II	2000919
    ISSP1995: National Identity I	2000922
    ISSP1996: Role of Government III	2000925
    ISSP1997: Work Orientations II	2000928
    ISSP1998: Religion II	2000934
    ISSP1999: Social Inequality III	2000937
    ISSP2000: Environment II	2000940
    ISSP2001: Social Networks II	2000943
    ISSP2002: Family and Changing Gender Roles III	2000946
    ISSP2003: National Identity II	2000949
    ISSP2004: Citizenship I	2000952
    ISSP2005: Work Orientations III	2000955
    ISSP2006: Role of Government IV	2000958
    ISSP2007: Leisure Time and Sports I	2000961
    ISSP2008: Religion III	2000964
    ISSP2009: Social Inequality IV	2000967
    ISSP2010: Environment III	2000970
        
  • then, remove and re-add all articles (at the moment, adding more than 10 elements doesn’t work, so you’ll have to do that in batches)
    pigshare remove_article --collection 2118 2000970 2000967 2000964 2000961 2000958 2000955 2000952 2000949 2000946 2000943 2000940 2000937 2000934 2000928 2000925 2000922 2000919 2000916 2000913 2000910
    for id in 2000910 2000913 2000916 2000919 2000922 2000925 2000928 2000934 2000937 2000940 2000943 2000946 2000949 2000952 2000955 2000958 2000961 2000964 2000967 2000970 2001483; do pigshare add_article --id 2118 "$id"; done
        
  • the, publish the collection
    pigshare publish_collection 2118
        
  • check the webfrontend whether it worked by refreshing the collections page

Be aware that if an article got a new version since it was added to a collection, the old version of the article is included in it. If you want the new version, you need to manually remove and re-add the article before you do anything else.

Other random example calls:

# create new collection
pigshare create_collection --collection '{"title": "Collection markus test", "articles": [2009074,2009075,2009084], "custom_fields": {"test1": "value1"}}'
# add articles to a collection
pigshare add_article --id 2761 --article_ids [2009103,2009106]
# search all my articles that contain a search_term, display only ids, separated by ',' (useful to copy and paste into 'add_article' command)
pigshare -o id -s ',' search_my_articles --search_term [search_term]
# list all of your personal articles, and add all of them to a collection
for id in `pigshare -o id list_my_articles`; do echo "$id"; pigshare add_article --collection_id 3222 --article_id "$id"; done
# update/overwrite the title and articles connected to a collection
pigshare update_collection --id 2761 --collection '{"title": "Collection markus test changed", "articles": [2009074,2009075]}'
# update/overwrite the categories field in a collection
pigshare update_article --id 2000077 --article '{"categories": [2]}'
# update/overwrite the custom_fields of a collection
pigshare update_article --id 2000077 --article '{"custom_fields": {"field1":"value1"}}'
# find all article ids of an institution
pigshare -o id list_instiaution_articles --inst_id 12

Usage (Library)

Create your python project, e.g. using cookiecutter:

cookiecutter https://github.com/audreyr/cookiecutter-pypackage

Create a virtualenv:

cd <project_dir>
mkvirtualenv pigshare_example

Edit setup.py to include pigshare requirement:

requirements = [
    "pigshare"
]

Setup dev environment:

python setup.py develop

Write your code (depending on which methods you intend to use you’ll have to include auth token or not), e.g.:

# -*- coding: utf-8 -*-
from pigshare.api import figshare_api
api = figshare_api(url="https://api.figshare.com/v2", token="xxx")
result = api.call_list_my_articles()
print result

Run your code

python pigshare_example/pigshare_example.py