Bad Performance using Python
raeudigerRaeffi opened this issue · 1 comment
Hi, we are using a self-hosted version of Piston and we encountered some major limitations with regard to runtime for Python. Given that Piston advertises itself as efficient and fast, I assume the issue is with us and not with the software.
Our setup is the following:
We use the Piston Docker image with the CLI to install Python. Then we run sudo /piston/packages/python/3.12.0/bin/pip3 install statsmodels plotly plotly-express scikit-learn
to install custom libraries.
The following environment variables are set:
- PISTON_RUN_TIMEOUT=80000
- PISTON_STDERR_LENGTH=800000
- PISTON_MAX_PROCESS_COUNT=124
- PISTON_MAX_FILE_SIZE=100000
- PISTON_OUTPUT_MAX_SIZE=250000
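For reference, a minimal sketch of how these variables might be passed to the container; the image name and port mapping here are assumptions, adjust them to your actual deployment:

```shell
# Hypothetical invocation -- substitute your image tag and port mapping.
docker run \
  -e PISTON_RUN_TIMEOUT=80000 \
  -e PISTON_STDERR_LENGTH=800000 \
  -e PISTON_MAX_PROCESS_COUNT=124 \
  -e PISTON_MAX_FILE_SIZE=100000 \
  -e PISTON_OUTPUT_MAX_SIZE=250000 \
  -p 2000:2000 \
  ghcr.io/engineer-man/piston
```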
Using this setup, the code displayed below takes around 20 secs to execute for 50 data points in os.environ["data"] (on my machine it takes less than a second).
import os
import json
import pandas as pd
import plotly
import numpy as np
import plotly.express as px
data = json.loads(os.environ["data"])
df = pd.DataFrame(data)
df['order_date'] = pd.to_datetime(df['order_date'], format='%d/%m/%Y %H:%M')
fig = px.scatter(df, x='order_date', y='sales', trendline='ols')
graph_json = plotly.io.to_json(fig)
print({"type": "plot", "variable": graph_json})
Are you running Piston on the same system as your local test? That could be one factor in the slow performance, though it shouldn't have too large an impact.
I'm thinking this might have to do with Python not caching .pyc files for these libraries.
This is by design to ensure complete isolation of code with no persistent files across runs.
I would try seeing which lines of code are causing the performance bottleneck. My bet would be on one of the imports.
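To narrow it down, here is a small sketch that times each import individually; the module names in the list are stdlib placeholders, so substitute the heavy ones from the script (pandas, plotly.express, etc.) when running it inside Piston:

```python
import importlib
import time


def time_import(module_name):
    """Return the wall-clock seconds spent importing a module.

    If the module (or its .pyc cache) is already loaded, this measures
    the warm path; in a fresh Piston run every import is cold.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# Placeholders -- swap in "pandas", "plotly", "plotly.express", etc.
for name in ["json", "os"]:
    print(f"{name}: {time_import(name):.3f}s")
```

CPython also ships a built-in import profiler: running `python -X importtime script.py` prints a per-module import timing breakdown to stderr, which would show directly whether the imports dominate the 20 seconds.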