Support request on scaling from zero
t83714 opened this issue · 15 comments
If you invoke a function (using of-watchdog HTTP mode) after it has been scaled to zero, a new function pod is created successfully, but the invocation returns a Server returned unexpected status code: 500 -
result.
If you look into the function pod log below:
Forking - node [bootstrap.js]
2020/03/10 11:23:19 Started logging stderr from function.
2020/03/10 11:23:19 Started logging stdout from function.
2020/03/10 11:23:19 OperationalMode: http
2020/03/10 11:23:19 Timeouts: read: 1m5s, write: 1m5s hard: 1m0s.
2020/03/10 11:23:19 Listening on port: 8080
2020/03/10 11:23:19 Writing lock-file to: /tmp/.lock
2020/03/10 11:23:19 Metrics listening on port: 8081
2020/03/10 11:23:21 Upstream HTTP request error: Post http://127.0.0.1:3000/: dial tcp 127.0.0.1:3000: connect: connection refused
2020/03/10 11:23:27 stdout: OpenFaaS Node.js listening on port: 3000
you can tell that of-watchdog threw an Upstream HTTP request error about 6 seconds before the Express Node.js server started listening on port 3000.
Expected Behaviour
When invoking a function that has been scaled to zero, the function should be executed successfully and return the correct response. Extra response time due to pod initialisation is acceptable.
Current Behaviour
When invoking a function that has been scaled to zero, a Server returned unexpected status code: 500 - response is returned.
Possible Solution
Possible Solution 1:
- Make of-watchdog retry the connection to the internal HTTP server (in HTTP mode) for a certain timeframe. During that timeframe, of-watchdog should not throw an Upstream HTTP request error.
Possible Solution 2:
- Implement a readiness probe in addition to the current liveness probe (which is served at /_/health). Before the internal HTTP server is up, the readiness probe should not return a 200 status code (an executor-side sketch of this idea follows this list).
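To illustrate the readiness idea in Possible Solution 2 from the executor's side, here is a minimal sketch; the /_/ready path and the ready flag are assumptions for illustration only, not existing of-watchdog behaviour:

const express = require("express");
const app = express();

// Assumption: flipped to true only once the server is actually accepting connections
let ready = false;

// Readiness endpoint: 503 while starting up, 200 once the server is listening
app.get("/_/ready", (req, res) => {
  res.status(ready ? 200 : 503).send(ready ? "OK" : "starting");
});

// Function handler
app.post("/", (req, res) => {
  res.json({ ok: true });
});

app.listen(3000, () => {
  ready = true;
  console.log("OpenFaaS Node.js listening on port: 3000");
});

A probe pointed at such an endpoint would only pass once the function process is actually able to serve requests.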
Steps to Reproduce (for bugs)
- Turn on the scale to zero feature of faas-idler and deploy the function with the com.openfaas.scale.zero=true label
- Wait 30 mins and make sure the function pod is terminated by faas-idler
- Invoke the function and watch the result
Context
This issue makes the scale to zero feature unusable, as the first request after scaling to zero will always fail.
Your Environment
CLI:
commit: ea687659ecf14931a29be46c4d2866899d36c282
version: 0.11.8
Gateway
uri: http://127.0.0.1:8080
version: 0.18.10
sha: 80b6976c106370a7081b2f8e9099a6ea9638e1f3
commit: Update Golang versions to 1.12
Provider
name: openfaas-operator
orchestration: kubernetes
version: 0.14.1
sha: e747b6ace86bc54184d899fa10cf46dada331af1
Hi in order to look at this we will need to know your complete set-up and how we can reproduce the issue. I can't actually reproduce this so I would think you may have missed some key information?
/set title: Support request on scaling from zero
/add label: support
@alexellis Thanks for your reply.
Here is the repo of our project (Branch openfaas):
https://github.com/magda-io/magda-ckan-connector/tree/openfaas
I tried to create a customised docker image to suit our project.
You don't have to build the project as the docker image is available from docker hub.
We deploy it like this (the Helm template is here):
apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: ckan-data-url-processor
  namespace: "openfaas-fn"
  labels:
    magdaCategory: connectors
    magdaSubCategory: ckan-connector
    magdaType: data-url-processor
spec:
  name: ckan-data-url-processor
  handler: node bootstrap.js
  image: "docker.io/data61/magda-ckan-connector:0.0.57-0-openfaas"
  labels:
    magdaCategory: connectors
    magdaSubCategory: ckan-connector
    magdaType: data-url-processor
    com.openfaas.scale.zero: "true"
  environment:
    handler: remoteDataUrl
    read_timeout: 65s
    write_timeout: 65s
    exec_timeout: 60s
    userId: 00000000-0000-4000-8000-000000000000
If you do need to build the docker image locally for testing, you can run the following in the cloned repo:
yarn install
yarn build
# assume you use minikube --- setup docker env vars
eval $(minikube docker-env)
yarn docker-build-local
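With the image available (either the published one on Docker Hub or a local build), the Function manifest above can be applied directly, since the openfaas-operator watches the Function custom resource. A minimal sketch; the filename is just an example:

kubectl apply -f ckan-data-url-processor-function.yaml
kubectl get functions,pods -n openfaas-fn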
To invoke the function, POST the following:
{
  "handler": "remoteDataUrl",
  "remoteDataUrl": "https://demo.ckan.org/dataset/databaza-ziadosti-o-dotacie-mk-sr-za-rok-2019_test2020_02_18_14_50"
}
to http://localhost:8080/function/ckan-data-url-processor (assuming you have port-forwarded the OpenFaaS gateway to localhost:8080).
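For example, the same request can be made with curl:

curl -s http://localhost:8080/function/ckan-data-url-processor \
  -H "Content-Type: application/json" \
  -d '{"handler": "remoteDataUrl", "remoteDataUrl": "https://demo.ckan.org/dataset/databaza-ziadosti-o-dotacie-mk-sr-za-rok-2019_test2020_02_18_14_50"}'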
The function simply pulls data from a publicly available API, converts it to JSON and returns it.
Please let me know if you need more information to reproduce the issue. Thanks again!
@alexellis
Not familiar with Go, but after quickly browsing the code it looks like the _/health endpoint will return 200 immediately once of-watchdog's HTTP server starts to listen?
See the line below. acceptingConnections has been set to 1 here.
After that, the _/health endpoint will always return a 200 code.
At this stage, the executor HTTP server may not be ready yet (although buildRequestHandler is called earlier).
Could we perhaps add some logic to forward an HTTP request to the executor HTTP server's _health endpoint here, once acceptingConnections==1 and lockFilePresent() are both true?
This would give us a chance to make sure the executor HTTP server is online before serving any function invocation request.
How long does it take until your function server is ready?
Would you be open to trying out a custom health check path? https://docs.openfaas.com/reference/workloads/
It should be proxied to your function code.
4 to 6 seconds.
From the logs, I can tell that of-watchdog's HTTP server is online and starts serving _health requests within 1 second.
Would you be open to trying out a custom health check path
Thanks a lot! I will give the custom health check path a try and report back.
I've added a /_/health
endpoint to my express executor:
app.get("/_/health", (req, res) => {
res.status(200).send("OK");
});
Then I deleted the function and re-deployed it with the com.openfaas.health.http.path: "/_/health" annotation:
apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: ckan-data-url-processor
  namespace: openfaas-fn
  labels:
    magdaCategory: connectors
    magdaSubCategory: ckan-connector
    magdaType: data-url-processor
spec:
  name: ckan-data-url-processor
  handler: node bootstrap.js
  image: "localhost:5000/data61/magda-ckan-connector:latest"
  labels:
    magdaCategory: connectors
    magdaSubCategory: ckan-connector
    magdaType: data-url-processor
    com.openfaas.scale.zero: "true"
  annotations:
    com.openfaas.health.http.path: "/_/health"
  environment:
    handler: remoteDataUrl
    read_timeout: 65s
    write_timeout: 65s
    exec_timeout: 60s
    userId: 00000000-0000-4000-8000-000000000000
  limits:
    cpu: 100m
  requests:
    cpu: 50m
    memory: 30Mi
However, it seems I still got the same error.
The invocation request was received after 5s, but the express server was only up after 10s:
Forking - node [bootstrap.js]
2020/03/12 12:49:51 Started logging stderr from function.
2020/03/12 12:49:51 Started logging stdout from function.
2020/03/12 12:49:51 OperationalMode: http
2020/03/12 12:49:51 Timeouts: read: 1m5s, write: 1m5s hard: 1m0s.
2020/03/12 12:49:51 Listening on port: 8080
2020/03/12 12:49:51 Writing lock-file to: /tmp/.lock
2020/03/12 12:49:51 Metrics listening on port: 8081
2020/03/12 12:49:55 Upstream HTTP request error: Post http://127.0.0.1:3000/: dial tcp 127.0.0.1:3000: connect: connection refused
2020/03/12 12:50:00 stdout: OpenFaaS Node.js listening on port: 3000
It seems the custom health check path didn't work.
Is it something to do with how I set up the custom health check path?
You need to use a different path otherwise you'll conflict with the watchdog
Setting com.openfaas.health.http.path to a different path works!
I also need to set com.openfaas.health.http.initialDelay: "10s" to avoid the container being restarted (sketch below).
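For reference, a minimal sketch of the working combination; the /ready path is just an example, any path that does not collide with the watchdog's own /_/health should do.

In the Express executor:

// Health route on a path that does not clash with the watchdog's /_/health
app.get("/ready", (req, res) => {
  res.status(200).send("OK");
});

And in the Function spec:

annotations:
  com.openfaas.health.http.path: "/ready"
  com.openfaas.health.http.initialDelay: "10s"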
Thanks a lot! @alexellis
What was your typical start up time for your container?
Given your usage, you may benefit from joining the Insiders Program, from 5USD/mo. https://openfaas.com/support
I think the typical container start-up time is around 10s.
A closer look suggests the long startup time may be caused by require() calls in Node.js: we use require() to load a lot of JSON files from another NPM package, which might be the culprit behind the delay.
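If require() is indeed the culprit, one option we may try is deferring the heavy loading until the first invocation so the server starts listening sooner. A minimal sketch, assuming the same Express app as above; the package name is just a placeholder:

let schemas; // loaded lazily on the first request instead of at module load time

function getSchemas() {
  if (!schemas) {
    // Assumption: "some-large-json-package" stands in for the real NPM package
    schemas = require("some-large-json-package");
  }
  return schemas;
}

app.post("/", (req, res) => {
  const data = getSchemas();
  // ... process req.body using data ...
  res.json({ ok: true });
});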
Regarding the support service, we have just started looking at FaaS solutions and evaluating the suitability of OpenFaaS for our project.
Our project is an open-source project (i.e. we won't make any money from it and probably won't have the budget for the support service).
But I think if we do adopt OpenFaaS as part of our solution, many of our users will be interested in the support service.
I'm also quite happy to contribute to the project via feedback / bug reports or even some coding (I can do Node.js & Scala and am keen to pick up some Go).
Thanks again @alexellis for your help & well done on the OpenFaaS project
Our project is an open-source project (i.e. we won't make any money from it and probably won't have the budget for the support service ).
Just so that you're aware, openfaas is also an open source project which has no external funding. I didn't link you to a support program, but a landing page which has the newsletter subscription - since you are using OpenFaaS (it seems heavily), the least you can do is check it out and read the sample. If you benefited from me giving you free technical support here (which you did), then you will benefit from the weekly email updates on the project and ecosystem.