[1.7.0] CircleCI API 404 response kills the job
davet1985 opened this issue · 16 comments
Orb version
1.7.0
What happened
The job exited unexpectedly upon receiving a 404 from the CircleCI API when making a call to get a workflow.
Full job logs
#!/bin/bash -eo pipefail
tag_pattern=""
# If a pattern is wrapped with slashes, remove them.
if [[ "$tag_pattern" == /*/ ]]; then
tag_pattern=${tag_pattern:1:-1}
fi
fetch(){
echo "DEBUG: Making API Call to ${1}"
url=$1
target=$2
http_response=$(curl -f -s -X GET -H "Circle-Token:${CIRCLECI_API_KEY}" -o "${target}" -w "%{http_code}" "${url}")
if [ $http_response != "200" ]; then
echo "ERROR: Server returned error code: $http_response"
cat ${target}
exit 1
else
echo "DEBUG: API Success"
fi
}
load_variables(){
# just confirm our required variables are present
: ${CIRCLE_BUILD_NUM:?"Required Env Variable not found!"}
: ${CIRCLE_PROJECT_USERNAME:?"Required Env Variable not found!"}
: ${CIRCLE_PROJECT_REPONAME:?"Required Env Variable not found!"}
: ${CIRCLE_REPOSITORY_URL:?"Required Env Variable not found!"}
: ${CIRCLE_JOB:?"Required Env Variable not found!"}
# Only needed for private projects
if [ -z "${CIRCLECI_API_KEY}" ]; then
echo "CIRCLECI_API_KEY not set. Private projects will be inaccessible."
else
fetch "https://circleci.com/api/v2/me" "/tmp/me.cci"
me=$(jq -e '.id' /tmp/me.cci)
echo "Using API key for user: ${me}"
fi
VCS_TYPE="github"
}
fetch_filtered_active_builds(){
if [ "false" != "true" ];then
echo "Orb parameter 'consider-branch' is false, will block previous builds on any branch."
jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}?filter=running"
elif [ -n "${CIRCLE_TAG:x}" ] && [ "$tag_pattern" != "" ]; then
# I'm not sure why this is here, seems identical to above?
echo "CIRCLE_TAG and orb parameter tag-pattern is set, fetch active builds"
jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}?filter=running"
else
: ${CIRCLE_BRANCH:?"Required Env Variable not found!"}
echo "Only blocking execution if running previous jobs on branch: ${CIRCLE_BRANCH}"
jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}/tree/${CIRCLE_BRANCH}?filter=running"
fi
if [ ! -z $TESTING_MOCK_RESPONSE ] && [ -f $TESTING_MOCK_RESPONSE ];then
echo "Using test mock response"
cat $TESTING_MOCK_RESPONSE > /tmp/jobstatus.json
else
echo "Attempting to access CircleCI api. If the build process fails after this step, ensure your CIRCLECI_API_KEY is set."
fetch "$jobs_api_url_template" "/tmp/jobstatus.json"
if [ -n "${CIRCLE_TAG:x}" ] && [ "$tag_pattern" != "" ]; then
jq "[ .[] | select((.build_num | . == \"${CIRCLE_BUILD_NUM}\") or (.vcs_tag | (. != null and test(\"${tag_pattern}\"))) ) ]" /tmp/jobstatus.json >/tmp/jobstatus_tag.json
mv /tmp/jobstatus_tag.json /tmp/jobstatus.json
fi
echo "API access successful"
fi
}
fetch_active_workflows(){
cp /tmp/jobstatus.json /tmp/augmented_jobstatus.json
for workflow in `jq -r ".[] | .workflows.workflow_id //empty" /tmp/augmented_jobstatus.json | uniq`
do
echo "Checking time of workflow: ${workflow}"
workflow_file=/tmp/workflow-${workflow}.json
if [ ! -z $TESTING_MOCK_WORKFLOW_RESPONSES ] && [ -f $TESTING_MOCK_WORKFLOW_RESPONSES/${workflow}.json ]; then
echo "Using test mock workflow response"
cat $TESTING_MOCK_WORKFLOW_RESPONSES/${workflow}.json > ${workflow_file}
else
fetch "https://circleci.com/api/v2/workflow/${workflow}" "${workflow_file}"
fi
created_at=`jq -r '.created_at' ${workflow_file}`
echo "Workflow was created at: ${created_at}"
cat /tmp/augmented_jobstatus.json | jq --arg created_at "${created_at}" --arg workflow "${workflow}" '(.[] | select(.workflows.workflow_id == $workflow) | .workflows) |= . + {created_at:$created_at}' > /tmp/augmented_jobstatus-${workflow}.json
#DEBUG echo "new augmented_jobstatus:"
#DEBUG cat /tmp/augmented_jobstatus-${workflow}.json
mv /tmp/augmented_jobstatus-${workflow}.json /tmp/augmented_jobstatus.json
done
}
update_comparables(){
fetch_filtered_active_builds
fetch_active_workflows
load_current_workflow_values
JOB_NAME="${CIRCLE_JOB}"
if [ "^validate-controllers$" ] ;then
JOB_NAME="^validate-controllers$"
fi
# falsey parameters are empty strings, so always compare against 'true'
if [ "false" = "true" ] ;then
echo "Orb parameter block-workflow is true."
echo "This job will block until no previous workflows have *any* jobs running."
oldest_running_build_num=`jq 'sort_by(.workflows.created_at)| .[0].build_num' /tmp/augmented_jobstatus.json`
oldest_commit_time=`jq 'sort_by(.workflows.created_at)| .[0].workflows.created_at' /tmp/augmented_jobstatus.json`
else
echo "Orb parameter block-workflow is false."
echo "Only blocking execution if running previous jobs matching this job: ${JOB_NAME}"
oldest_running_build_num=`jq ". | map(select(.workflows.job_name | test(\"${JOB_NAME}\";\"sx\"))) | sort_by(.workflows.created_at)| .[0].build_num" /tmp/augmented_jobstatus.json`
oldest_commit_time=`jq ". | map(select(.workflows.job_name | test(\"${JOB_NAME}\";\"sx\"))) | sort_by(.workflows.created_at)| .[0].workflows.created_at" /tmp/augmented_jobstatus.json`
fi
if [ -z "$oldest_commit_time" ]; then
echo "API Error - unable to load previous job timings. Report to developer."
exit 1
fi
echo "Oldest job: $oldest_running_build_num"
if [ -z $oldest_commit_time ];then
echo "API Call for existing jobs failed, failing this build. Please check API token"
echo "All running jobs:"
cat /tmp/jobstatus.json || exit 0
echo "All running jobs with created_at:"
cat /tmp/augmented_jobstatus.json || exit 0
echo "All worfklow details."
cat /tmp/workflow-*.json
exit 1
fi
}
load_current_workflow_values(){
my_commit_time=`jq '.[] | select( .build_num == '"${CIRCLE_BUILD_NUM}"').workflows.created_at' /tmp/augmented_jobstatus.json`
}
cancel_current_build(){
echo "Cancelleing build ${CIRCLE_BUILD_NUM}"
cancel_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}/${CIRCLE_BUILD_NUM}/cancel?circle-token=${CIRCLECI_API_KEY}"
curl -s -X POST $cancel_api_url_template > /dev/null
}
#
# We can skip a few use cases without calling API
#
if [ ! -z "$CIRCLE_PR_REPONAME" ]; then
echo "Queueing on forks is not supported. Skipping queue..."
# It's important that we not fail here because it could cause issues on the main repo's branch
exit 0
fi
if [ "*" = "*" ] || [ "*" = "${CIRCLE_BRANCH}" ]; then
echo "${CIRCLE_BRANCH} queueable"
else
echo "Queueing only happens on * branch, skipping queue"
exit 0
fi
#
# Set values that wont change while we wait
#
load_variables
max_time=20
echo "This build will block until all previous builds complete."
echo "Max Queue Time: ${max_time} minutes."
wait_time=0
loop_time=11
max_time_seconds=$((max_time * 60))
#
# Queue Loop
#
confidence=0
while true; do
update_comparables
echo "This Workflow Timestamp: $my_commit_time"
echo "Oldest Workflow Timestamp: $oldest_commit_time"
if [[ ! -z "$my_commit_time" ]] && [[ "$oldest_commit_time" > "$my_commit_time" || "$oldest_commit_time" = "$my_commit_time" ]] ; then
# API returns Y-M-D HH:MM (with 24 hour clock) so alphabetical string compare is accurate to timestamp compare as well
# recent-jobs API does not include pending, so it is posisble we queried in between a workfow transition, and we;re NOT really front of line.
if [ $confidence -lt 1 ];then
# To grow confidence, we check again with a delay.
confidence=$((confidence+1))
echo "API shows no previous jobs/workflows, but it is possible a previous workflow has pending jobs not yet visible in API."
echo "Rerunning check ${confidence}/1"
else
echo "Front of the line, WooHoo!, Build continuing"
break
fi
else
# If we fail, reset confidence
confidence=0
echo "This build (${CIRCLE_BUILD_NUM}) is queued, waiting for build number (${oldest_running_build_num}) to complete."
echo "Total Queue time: ${wait_time} seconds."
fi
if [ $wait_time -ge $max_time_seconds ]; then
echo "Max wait time exceeded, considering response."
if [ "false" == "true" ];then
echo "Orb parameter dont-quit is set to true, letting this job proceed!"
exit 0
else
cancel_current_build
sleep 10 # wait for API to cancel this job, rather than showing as failure
exit 1 # but just in case, fail job
fi
fi
sleep $loop_time
wait_time=$(( loop_time + wait_time ))
done
wf-963-cluster-autoscaler-tag-fix queueable
DEBUG: Making API Call to https://circleci.com/api/v2/me
DEBUG: API Success
Using API key for user: "2eddcc82-ce3a-478e-bf5c-a2f9fe456784"
This build will block until all previous builds complete.
Max Queue Time: 20 minutes.
Orb parameter 'consider-branch' is false, will block previous builds on any branch.
Attempting to access CircleCI api. If the build process fails after this step, ensure your CIRCLECI_API_KEY is set.
DEBUG: Making API Call to https://circleci.com/api/v1.1/project/github/appvia/wayfinder?filter=running
DEBUG: API Success
API access successful
Checking time of workflow: 6fe3a27b-8e02-4cdb-845d-1de09c2cced1
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/6fe3a27b-8e02-4cdb-845d-1de09c2cced1
DEBUG: API Success
Workflow was created at: 2022-04-27T11:39:52Z
Checking time of workflow: 5e49bff4-b61c-48f4-8fd2-8ec7da55769d
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/5e49bff4-b61c-48f4-8fd2-8ec7da55769d
DEBUG: API Success
Workflow was created at: 2022-04-27T11:40:24Z
Checking time of workflow: 6fe3a27b-8e02-4cdb-845d-1de09c2cced1
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/6fe3a27b-8e02-4cdb-845d-1de09c2cced1
DEBUG: API Success
Workflow was created at: 2022-04-27T11:39:52Z
Checking time of workflow: bf767bb5-f023-45c9-9432-29e8ac9088c6
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/bf767bb5-f023-45c9-9432-29e8ac9088c6
DEBUG: API Success
Workflow was created at: 2022-04-27T11:33:35Z
Checking time of workflow: 63242a50-754b-4d33-9c80-c7cc979aa6d3
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/63242a50-754b-4d33-9c80-c7cc979aa6d3
DEBUG: API Success
Workflow was created at: 2022-04-27T11:10:46Z
Checking time of workflow: 013b74f9-6aea-4c77-a0c9-5142b78514c4
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/013b74f9-6aea-4c77-a0c9-5142b78514c4
DEBUG: API Success
Workflow was created at: 2022-04-21T19:30:12Z
Checking time of workflow: f7cc0a81-2c84-46a2-8a0c-3d3fbd6cd87b
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/f7cc0a81-2c84-46a2-8a0c-3d3fbd6cd87b
Exited with code exit status 22
CircleCI received exit code 22
Expected behavior
If the workflow is not found, it should be ignored and the process should continue onto the next workflow to check.
I have the same problem
Same problem here...
If you're looking for a workaround, you can pull the script directly into your CircleCI config.yml, with the updates made in #80.
For example:
job-requiring-queue:
  # add all the parameters and set the defaults as required
  parameters:
    consider-branch:
      type: boolean
      default: false
      description: "Should we only consider jobs running on the same branch?"
    block-workflow:
      type: boolean
      # this is false at COMMAND level as intention is to only block CURRENT job.
      default: false
      description: "If true, this job will block until no other workflows with an earlier timestamp are running. Typically used as first job."
    time:
      type: string
      default: "20"
      description: "How many minutes to wait before giving up."
    dont-quit:
      type: boolean
      default: false
      description: "Quitting is for losers. Force job through once time expires instead of failing."
    only-on-branch:
      type: string
      default: "*"
      description: "Only queue on specified branch"
    vcs-type:
      type: string
      default: "github"
      description: "Override VCS to 'bitbucket' if needed."
    confidence:
      type: string
      default: "1"
      description: "Due to scarce API, we need to requery the recent jobs list to ensure we're not just in a pending state for previous jobs. This number indicates the threhold for API returning no previous pending jobs. Default is a single confirmation."
    circleci-api-key:
      type: env_var_name
      default: CIRCLECI_API_KEY
      description: "In case you use a different Environment Variable Name than CIRCLECI_API_KEY, supply it here."
    tag-pattern:
      type: string
      default: ""
      description: "Set to queue jobs using a regex pattern f.ex '^v[0-9]+\\.[0-9]+\\.[0-9]+$' to filter CIRCLECI_TAG"
    job-regex:
      type: string
      default: ""
      description: "Allow multiple job names to be blocked until front of line f.ex '^runTests*'"
  steps:
    - checkout
    # run block including the modified queue script
    - run:
        name: Queue Until Front of Line
        command: |
          tag_pattern="<<parameters.tag-pattern>>"
          # If a pattern is wrapped with slashes, remove them.
          if [[ "$tag_pattern" == /*/ ]]; then
            tag_pattern=${tag_pattern:1:-1}
          fi
          fetch(){
            echo "DEBUG: Making API Call to ${1}"
            url=$1
            target=$2
            http_response=$(curl -s -X GET -H "Circle-Token:${<< parameters.circleci-api-key >>}" -o "${target}" -w "%{http_code}" "${url}")
            if [ $http_response == "404" ]; then
              echo "DEBUG: API Not found"
            else
              if [ $http_response != "200" ]; then
                echo "ERROR: Server returned error code: $http_response"
                cat ${target}
                exit 1
              else
                echo "DEBUG: API Success"
              fi
            fi
          }
          load_variables(){
            # just confirm our required variables are present
            : ${CIRCLE_BUILD_NUM:?"Required Env Variable not found!"}
            : ${CIRCLE_PROJECT_USERNAME:?"Required Env Variable not found!"}
            : ${CIRCLE_PROJECT_REPONAME:?"Required Env Variable not found!"}
            : ${CIRCLE_REPOSITORY_URL:?"Required Env Variable not found!"}
            : ${CIRCLE_JOB:?"Required Env Variable not found!"}
            # Only needed for private projects
            if [ -z "${<< parameters.circleci-api-key >>}" ]; then
              echo "<< parameters.circleci-api-key >> not set. Private projects will be inaccessible."
            else
              fetch "https://circleci.com/api/v2/me" "/tmp/me.cci"
              me=$(jq -e '.id' /tmp/me.cci)
              echo "Using API key for user: ${me}"
            fi
            VCS_TYPE="<<parameters.vcs-type>>"
          }
          fetch_filtered_active_builds(){
            if [ "<<parameters.consider-branch>>" != "true" ];then
              echo "Orb parameter 'consider-branch' is false, will block previous builds on any branch."
              jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}?filter=running"
            elif [ -n "${CIRCLE_TAG:x}" ] && [ "$tag_pattern" != "" ]; then
              # I'm not sure why this is here, seems identical to above?
              echo "CIRCLE_TAG and orb parameter tag-pattern is set, fetch active builds"
              jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}?filter=running"
            else
              : ${CIRCLE_BRANCH:?"Required Env Variable not found!"}
              echo "Only blocking execution if running previous jobs on branch: ${CIRCLE_BRANCH}"
              jobs_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}/tree/${CIRCLE_BRANCH}?filter=running"
            fi
            if [ ! -z $TESTING_MOCK_RESPONSE ] && [ -f $TESTING_MOCK_RESPONSE ];then
              echo "Using test mock response"
              cat $TESTING_MOCK_RESPONSE > /tmp/jobstatus.json
            else
              echo "Attempting to access CircleCI api. If the build process fails after this step, ensure your << parameters.circleci-api-key >> is set."
              fetch "$jobs_api_url_template" "/tmp/jobstatus.json"
              if [ -n "${CIRCLE_TAG:x}" ] && [ "$tag_pattern" != "" ]; then
                jq "[ .[] | select((.build_num | . == \"${CIRCLE_BUILD_NUM}\") or (.vcs_tag | (. != null and test(\"${tag_pattern}\"))) ) ]" /tmp/jobstatus.json >/tmp/jobstatus_tag.json
                mv /tmp/jobstatus_tag.json /tmp/jobstatus.json
              fi
              echo "API access successful"
            fi
          }
          fetch_active_workflows(){
            cp /tmp/jobstatus.json /tmp/augmented_jobstatus.json
            for workflow in `jq -r ".[] | .workflows.workflow_id //empty" /tmp/augmented_jobstatus.json | uniq`
            do
              echo "Checking time of workflow: ${workflow}"
              workflow_file=/tmp/workflow-${workflow}.json
              if [ ! -z $TESTING_MOCK_WORKFLOW_RESPONSES ] && [ -f $TESTING_MOCK_WORKFLOW_RESPONSES/${workflow}.json ]; then
                echo "Using test mock workflow response"
                cat $TESTING_MOCK_WORKFLOW_RESPONSES/${workflow}.json > ${workflow_file}
              else
                fetch "https://circleci.com/api/v2/workflow/${workflow}" "${workflow_file}"
              fi
              created_at=`jq -r '.created_at' ${workflow_file}`
              if [ $created_at != "null" ]; then
                echo "Workflow was created at: ${created_at}"
                cat /tmp/augmented_jobstatus.json | jq --arg created_at "${created_at}" --arg workflow "${workflow}" '(.[] | select(.workflows.workflow_id == $workflow) | .workflows) |= . + {created_at:$created_at}' > /tmp/augmented_jobstatus-${workflow}.json
                #DEBUG echo "new augmented_jobstatus:"
                #DEBUG cat /tmp/augmented_jobstatus-${workflow}.json
                mv /tmp/augmented_jobstatus-${workflow}.json /tmp/augmented_jobstatus.json
              else
                echo "Workflow not found: ${workflow}"
              fi
            done
          }
          update_comparables(){
            fetch_filtered_active_builds
            fetch_active_workflows
            load_current_workflow_values
            JOB_NAME="${CIRCLE_JOB}"
            if [ "<<parameters.job-regex>>" ] ;then
              JOB_NAME="<<parameters.job-regex>>"
            fi
            # falsey parameters are empty strings, so always compare against 'true'
            if [ "<<parameters.block-workflow>>" = "true" ] ;then
              echo "Orb parameter block-workflow is true."
              echo "This job will block until no previous workflows have *any* jobs running."
              oldest_running_build_num=`jq 'sort_by(.workflows.created_at)| .[0].build_num' /tmp/augmented_jobstatus.json`
              oldest_commit_time=`jq 'sort_by(.workflows.created_at)| .[0].workflows.created_at' /tmp/augmented_jobstatus.json`
            else
              echo "Orb parameter block-workflow is false."
              echo "Only blocking execution if running previous jobs matching this job: ${JOB_NAME}"
              oldest_running_build_num=`jq ". | map(select(.workflows.job_name | test(\"${JOB_NAME}\";\"sx\"))) | sort_by(.workflows.created_at)| .[0].build_num" /tmp/augmented_jobstatus.json`
              oldest_commit_time=`jq ". | map(select(.workflows.job_name | test(\"${JOB_NAME}\";\"sx\"))) | sort_by(.workflows.created_at)| .[0].workflows.created_at" /tmp/augmented_jobstatus.json`
            fi
            if [ -z "$oldest_commit_time" ]; then
              echo "API Error - unable to load previous job timings. Report to developer."
              exit 1
            fi
            echo "Oldest job: $oldest_running_build_num"
            if [ -z $oldest_commit_time ];then
              echo "API Call for existing jobs failed, failing this build. Please check API token"
              echo "All running jobs:"
              cat /tmp/jobstatus.json || exit 0
              echo "All running jobs with created_at:"
              cat /tmp/augmented_jobstatus.json || exit 0
              echo "All worfklow details."
              cat /tmp/workflow-*.json
              exit 1
            fi
          }
          load_current_workflow_values(){
            my_commit_time=`jq '.[] | select( .build_num == '"${CIRCLE_BUILD_NUM}"').workflows.created_at' /tmp/augmented_jobstatus.json`
          }
          cancel_current_build(){
            echo "Cancelleing build ${CIRCLE_BUILD_NUM}"
            cancel_api_url_template="https://circleci.com/api/v1.1/project/${VCS_TYPE}/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}/${CIRCLE_BUILD_NUM}/cancel?circle-token=${<< parameters.circleci-api-key >>}"
            curl -s -X POST $cancel_api_url_template > /dev/null
          }
          #
          # We can skip a few use cases without calling API
          #
          if [ ! -z "$CIRCLE_PR_REPONAME" ]; then
            echo "Queueing on forks is not supported. Skipping queue..."
            # It's important that we not fail here because it could cause issues on the main repo's branch
            exit 0
          fi
          if [ "<<parameters.only-on-branch>>" = "*" ] || [ "<<parameters.only-on-branch>>" = "${CIRCLE_BRANCH}" ]; then
            echo "${CIRCLE_BRANCH} queueable"
          else
            echo "Queueing only happens on <<parameters.only-on-branch>> branch, skipping queue"
            exit 0
          fi
          #
          # Set values that wont change while we wait
          #
          load_variables
          max_time=<<parameters.time>>
          echo "This build will block until all previous builds complete."
          echo "Max Queue Time: ${max_time} minutes."
          wait_time=0
          loop_time=11
          max_time_seconds=$((max_time * 60))
          #
          # Queue Loop
          #
          confidence=0
          while true; do
            update_comparables
            echo "This Workflow Timestamp: $my_commit_time"
            echo "Oldest Workflow Timestamp: $oldest_commit_time"
            if [[ ! -z "$my_commit_time" ]] && [[ "$oldest_commit_time" > "$my_commit_time" || "$oldest_commit_time" = "$my_commit_time" ]] ; then
              # API returns Y-M-D HH:MM (with 24 hour clock) so alphabetical string compare is accurate to timestamp compare as well
              # recent-jobs API does not include pending, so it is posisble we queried in between a workfow transition, and we;re NOT really front of line.
              if [ $confidence -lt <<parameters.confidence>> ];then
                # To grow confidence, we check again with a delay.
                confidence=$((confidence+1))
                echo "API shows no previous jobs/workflows, but it is possible a previous workflow has pending jobs not yet visible in API."
                echo "Rerunning check ${confidence}/<<parameters.confidence>>"
              else
                echo "Front of the line, WooHoo!, Build continuing"
                break
              fi
            else
              # If we fail, reset confidence
              confidence=0
              echo "This build (${CIRCLE_BUILD_NUM}) is queued, waiting for build number (${oldest_running_build_num}) to complete."
              echo "Total Queue time: ${wait_time} seconds."
            fi
            if [ $wait_time -ge $max_time_seconds ]; then
              echo "Max wait time exceeded, considering response."
              if [ "<<parameters.dont-quit>>" == "true" ];then
                echo "Orb parameter dont-quit is set to true, letting this job proceed!"
                exit 0
              else
                cancel_current_build
                sleep 10 # wait for API to cancel this job, rather than showing as failure
                exit 1 # but just in case, fail job
              fi
            fi
            sleep $loop_time
            wait_time=$(( loop_time + wait_time ))
          done
    - run:
        name: The job to do next
@davet1985 thanks for report and PR.
I'm not clear though why a workflow declared by a running job would not exist when queried.
And if it doesn't, it indicates faulty data to make a decision on.
Do you have any insight on the underlying cause?
Hey @eddiewebb
I'm not clear though why a workflow declared by a running job would not exist when queried.
This is what I've been trying to figure out since two days ago.
From my side, that problem started on 04/26, when CircleCI apparently released a new update: https://circleci.com/changelog/#updated-cli-commands-for-private-orbs
Don't know if it can help.
@davet1985 thanks for report and PR.
I'm not clear though why a workflow declared by a running job would not exist when queried.
And if it doesn't, it indicates faulty data to make a decision on.
Do you have any insight on the underlying cause?
Hi @eddiewebb, I totally agree, it's very strange behaviour from CircleCI and no I don't have any insight on why it's happening, but it certainly seems to be affecting multiple people. Possibly a bug in a new release of CircleCI's API?! I am seeing the same issue when I call the API manually.
The change to the code makes it slightly more defensive when a 404 response is found.
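For anyone wondering why the job dies with exit code 22 instead of printing the "Server returned error code" message: the original `fetch` calls `curl -f`, which makes curl itself exit with status 22 on an HTTP error, and because the script runs under `bash -eo pipefail` the shell aborts at the assignment before the status check is reached. The workaround above drops `-f` for that reason. A minimal sketch of the same idea follows; `classify_status` and `fetch_tolerant` are hypothetical names for illustration, not functions from the orb.

```shell
#!/bin/bash
# Hypothetical helper: map an HTTP status code to an action for the queue script.
classify_status() {
  case "$1" in
    200) echo "ok" ;;
    404) echo "skip" ;;
    *)   echo "fail" ;;
  esac
}

# Sketch of a tolerant fetch. There is no `-f`, so curl exits 0 on a 404 and
# `bash -e` does not abort before the status code is inspected.
fetch_tolerant() {
  local url=$1 target=$2 code
  # If curl itself fails (DNS, timeout, ...), fall back to a code that maps to "fail".
  code=$(curl -s -H "Circle-Token: ${CIRCLECI_API_KEY:-}" -o "${target}" -w "%{http_code}" "${url}") || code="000"
  case "$(classify_status "${code}")" in
    ok)   return 0 ;;
    skip) echo "DEBUG: ${url} returned 404, skipping"; return 2 ;;
    *)    echo "ERROR: Server returned error code: ${code}"; return 1 ;;
  esac
}
```

Under `set -e` the caller would need to invoke this inside a condition (for example `if fetch_tolerant "$url" "$file"; then ...`), otherwise the non-zero return for a 404 would still stop the script.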
We've also experienced this issue, and in my research I found a very old job from 2019 stuck in the running state. It was not visible in the UI, and any attempt to directly access its build_url or workflow resulted in a 404. I guess something has changed on the CircleCI side, maybe they just cleared up old data.
Anyway, I was able to cancel this old job by calling the API directly and unblock our deployment pipeline.
First, find out if there are any old jobs stuck in the running state on your deploy branch and make a note of their build_num:
curl -H 'Circle-Token: <your token>' 'https://circleci.com/api/v1.1/project/github/<org>/<repo>/tree/<branch>?filter=running' | jq -c 'map(.queued_at, .build_num)'
and cancel those builds via Cancel a build API:
curl -H 'Circle-Token: <your token>' -X POST 'https://circleci.com/api/v1.1/project/github/<org>/<repo>/<build_num>/cancel'
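If several builds are stuck, the cancel call above can be looped. This is only a sketch: `cancel_builds` and the `DRY_RUN` switch are hypothetical additions for illustration, the URL is the same v1.1 cancel endpoint used in the command above, and the token is assumed to be in `CIRCLECI_API_KEY`. It defaults to a dry run so nothing is cancelled by accident.

```shell
#!/bin/bash
# Hypothetical helper: cancel a list of builds for one project.
# Usage: cancel_builds <org>/<repo> <build_num>...
cancel_builds() {
  local project=$1 build_num url
  shift
  for build_num in "$@"; do
    url="https://circleci.com/api/v1.1/project/github/${project}/${build_num}/cancel"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run (the default): only print what would be sent.
      echo "would POST ${url}"
    else
      curl -s -H "Circle-Token: ${CIRCLECI_API_KEY:-}" -X POST "${url}" > /dev/null
    fi
  done
}
```

Running `cancel_builds <org>/<repo> <build_num> <build_num>` prints the requests; setting `DRY_RUN=0` in front of the call sends them.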
@nebolsin same here, had a stuck job from November 2019, appreciate the command
@nebolsin thanks very much for sharing this, I found two workflows stuck which I have been able to cancel.
@nebolsin thank you! I was just about to point out that the missing workflow seems to be much older; it was AT LEAST more than a week old:
Checking time of workflow: 013b74f9-6aea-4c77-a0c9-5142b78514c4
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/013b74f9-6aea-4c77-a0c9-5142b78514c4
DEBUG: API Success
Workflow was created at: 2022-04-21T19:30:12Z
And so I suspect we must have a retention period issue on the workflow side. I will raise this internally but it seems like the right "fix" is to address hanging builds.
Would it make sense to still allow the orb to be tolerant of 404, I think only for workflow and perhaps even a trivial date check...
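A rough sketch of what such a check might look like, assuming the job's own timestamp (for example `queued_at` from the recent-jobs response) and the cutoff are both ISO 8601 UTC strings in the same format, so that plain string comparison orders them the way the queue script already relies on. The function name and arguments are hypothetical, not part of the orb.

```shell
#!/bin/bash
# Hypothetical check: ignore a missing workflow only when the API said 404
# AND the job that references it is older than the cutoff.
# Arguments: <http_code> <job_time> <cutoff>, timestamps as ISO 8601 UTC strings.
should_ignore_missing_workflow() {
  local http_code=$1 job_time=$2 cutoff=$3
  [ "${http_code}" = "404" ] && [[ "${job_time}" < "${cutoff}" ]]
}
```

With GNU date, a cutoff could be computed with something like `date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%SZ`; the 90 days is an arbitrary placeholder, and any other 404 would still fail the job as before.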
@davet1985 - is your org using the new retention policy controls?
https://circleci.com/docs/2.0/persist-data/#custom-storage-usage
@nebolsin, @asselinpaul, @davet1985 - do any of you happen to know whether the workflow was in fact missing for those stuck jobs? I would love a JOB id to debug.
@davet1985 - is your org using the new retention policy controls?
@eddiewebb those all seem to be set to the defaults.
@eddiewebb I agree that the root cause here is CircleCI still reporting running jobs, even though their workflows are no longer accessible. Unfortunately, in my case it's a private project, so I cannot help with a job id.
I checked, and our last successful deploy before this issue manifested itself was on 2022-04-25T19:23:00Z. The logs indicate that the stuck job was definitely there at that point, but its workflow was still accessible. It didn't block the deploy at that time because the job name was different:
DEBUG: Making API Call to https://circleci.com/api/v1.1/project/github/<org>/<repo>/tree/master?filter=running
DEBUG: API Success
API access successful
Checking time of workflow: ca97a6cf-86da-4d81-9de4-1cdf13bd2f8b
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/ca97a6cf-86da-4d81-9de4-1cdf13bd2f8b
DEBUG: API Success
Workflow was created at: 2022-04-25T19:23:00Z
Checking time of workflow: 6a5834ae-e451-498b-8d17-e3376893ea5c
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/6a5834ae-e451-498b-8d17-e3376893ea5c
DEBUG: API Success
Workflow was created at: 2019-12-05T23:53:42Z
Orb parameter block-workflow is false.
Only blocking execution if running previous jobs matching this job: web_deploy
Oldest job: 320754
This Workflow Timestamp: "2022-04-25T19:23:00Z"
Oldest Workflow Timestamp: "2022-04-25T19:23:00Z
And the first problematic deploy was on 2022-04-26T18:05:19Z:
DEBUG: Making API Call to https://circleci.com/api/v1.1/project/github/<org>/<repo>/tree/master?filter=running
DEBUG: API Success
API access successful
Checking time of workflow: dba1bb0f-e0db-4511-87c7-1f7035d6b0fe
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/dba1bb0f-e0db-4511-87c7-1f7035d6b0fe
DEBUG: API Success
Workflow was created at: 2022-04-26T18:05:19Z
Checking time of workflow: 6a5834ae-e451-498b-8d17-e3376893ea5c
DEBUG: Making API Call to https://circleci.com/api/v2/workflow/6a5834ae-e451-498b-8d17-e3376893ea5c
Exited with code exit status 22
CircleCI received exit code 22
You can see that the API call for the exact same workflow now results in an error, and I checked it manually via curl: it definitely was a 404. (In fact, I can still get this old job's details via both the API v1 "Single job" and API v2 "Get job details" endpoints, and I still get a 404 when trying to access its workflow.)
Ok, got some results that bring clarity.
- this is related to CircleCI's recently released retention policies
- we are temporarily blocking access to data older than 3 months, and will eventually delete it permanently
- there is currently an inconsistency in which APIs enforce this (workflows does, recent jobs does not)
- in the future state, those pending/old/stale jobs will also be deleted, so the data will be consistent
I'm not on the team that controls those decisions, so I am happy to convey feedback, but I encourage anyone impacted to raise a support ticket or vote on ideas.circleci.com.
I believe changes recently went live restricting data in the recent-builds API that will prevent this specific scenario from occurring.
I am going to close this issue for now, unless anybody still sees it or @nebolsin's fix (#79 (comment)) does not address the stale data from hung jobs.
Thank you all for the contribution and discussion.
In my case I still had to cancel a workflow that was "stuck in running state" from 2019. After that, all seems fine (for now)