princenyeche/jiraone

Jiraone delta extraction

juliariza opened this issue · 2 comments

Hello!

I have been using the jiraone python module to extract the historic information of issues.
Example code from docs:

from jiraone import LOGIN, PROJECT

user = "email"
password = "token"
link = "https://yourinstance.atlassian.net/"
LOGIN(user=user, password=password, url=link)

if __name__ == '__main__':
    jql = "project in (PYT) ORDER BY Rank DESC"
    PROJECT.change_log(jql=jql)

It's great, except every time I run it, it extracts everything from the beginning, which takes a long time. I was wondering if it is possible to extract just the delta/updates instead of all of the information.

Thanks!

Hi @juliariza
The change_log method doesn't do that, but you can narrow the JQL so the search only returns issues that have been updated recently. Although, to your point, it will still extract everything it finds. I think it is time for me to update that method to allow multiple runs to append to the same document.
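For example, something like this (the -7d window is just an illustration; adjust it to how often you run the extraction) would limit the search to recently updated issues:

jql = "project in (PYT) AND updated >= -7d ORDER BY Rank DESC"
PROJECT.change_log(jql=jql)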

Hey @juliariza

About your initial ask, I think it's doable, but there are 3 problems to solve. There would have to be some storage of each extraction that records (a rough sketch of the state involved follows this list):

  • The last time a specific issue key was updated, so it can be compared with what's currently on the Jira environment
  • Which history item was last written per issue key, and whether new items exist whenever it is checked (this is how you know the delta)
  • The filename that was used to store the history data, and at what point within that saved file the new rows should be inserted when a new history item is found
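Just to illustrate the kind of state tracking involved (this is not something jiraone does today; the state file and field names here are entirely hypothetical), a minimal sketch might look like:

import json
from pathlib import Path

STATE_FILE = Path("history_state.json")  # hypothetical per-extraction state store

def load_state() -> dict:
    # Map of issue key -> {"updated": ..., "last_history_id": ..., "file": ..., "row": ...}
    return json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

def needs_refresh(state: dict, key: str, remote_updated: str) -> bool:
    # Only re-extract an issue when Jira reports a newer "updated" timestamp
    return state.get(key, {}).get("updated") != remote_updated

def record(state: dict, key: str, remote_updated: str,
           last_history_id: str, file: str, row: int) -> None:
    # Remember where the last insertion ended so the next delta can append after it
    state[key] = {"updated": remote_updated, "last_history_id": last_history_id,
                  "file": file, "row": row}
    STATE_FILE.write_text(json.dumps(state, indent=2))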

While I like the challenge of creating such a feature, I don't think I will be doing that anytime soon. However, with the new version 0.7.9, you can make a very fast asynchronous request for history extraction, reducing the long waiting time.
For example: minimal style

from jiraone import LOGIN, PROJECT  # LOGIN authenticated as shown above

jql = "project in (PYT) ORDER BY Rank DESC"
PROJECT.async_change_log(jql, folder="TEST", file="sample.csv")

If you need the extraction to be faster, you can increase the number of workers running simultaneous extraction requests; the default is 4.
For example: comprehensive style

from jiraone import LOGIN, PROJECT  # LOGIN authenticated as shown above

jql = "project in (PYT) ORDER BY Rank DESC"
PROJECT.async_change_log(jql, folder="TEST", file="sample.csv", workers=20, flush=10)

How it works

Let's say your JQL returns 100 issues from the search. The code above takes 20 issue keys out of that list and runs their requests at the same time, then repeats this 4 more times, so the extraction happens in batches of 20 simultaneous requests rather than the single request at a time that the change_log method makes, which is what makes it faster. You can increase the number of workers to 50 or even 100, but it is recommended to keep it at a reasonable number that won't consume too much CPU or send too many requests to your Jira environment. The flush argument adds a delay in seconds to allow any final asynchronous request that might still be running to finish before the file is written to disk.
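For a rough picture of the batching idea (this is not jiraone's internal code, just an illustration of how 100 keys break into batches of 20 concurrent requests):

import asyncio

async def fetch_history(key: str) -> None:
    # Stand-in for one issue's changelog request
    await asyncio.sleep(0.1)
    print(f"fetched history for {key}")

async def run_in_batches(keys: list[str], workers: int = 20) -> None:
    # 100 keys with 20 workers -> 5 batches of 20 concurrent requests
    for start in range(0, len(keys), workers):
        batch = keys[start:start + workers]
        await asyncio.gather(*(fetch_history(k) for k in batch))

asyncio.run(run_in_batches([f"PYT-{i}" for i in range(1, 101)]))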

I believe this will help with performance if you're extracting data frequently.