Installation Issues
tfurmston opened this issue · 4 comments
Hi,
I am trying to do a local installation of the project so that I can play around with it, but am having some issues with the installation.
There is a fair amount going on with the install, so I have decided to do it through docker. I couldn't find the image referenced in the documentation, so I am writing my own. Here it is so far:
FROM python:3.7-slim-stretch
ENV PROJECT_LOCATION /srv/reagent
RUN mkdir -p $PROJECT_LOCATION
WORKDIR $PROJECT_LOCATION
RUN apt-get update -qq \
&& apt-get install --no-install-recommends -y \
build-essential \
openssh-client \
git \
software-properties-common \
libblas-dev \
libffi-dev \
liblapack-dev \
libopenblas-base \
libsasl2-dev \
libssl-dev \
libsasl2-modules \
python3-dev \
libpq-dev \
ffmpeg \
libsm6 \
libxext6 \
curl \
unzip \
zip \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/facebookresearch/ReAgent.git $PROJECT_LOCATION
RUN python -m pip install ".[gym]"
RUN python -m pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
RUN curl -s "https://get.sdkman.io" | bash
SHELL ["/bin/bash", "-c", "source $HOME/.sdkman/bin/sdkman-init.sh"]
RUN sdk version
RUN sdk install java 8.0.272.hs-adpt
RUN sdk install scala
RUN sdk install maven
RUN sdk install spark 2.4.6
RUN apt-get update
RUN apt-get install bc
This builds successfully. (I took some of the configuration from the CI, as the documentation seemed a bit out of date.)
However, when I try to work through the offline RL training (batch) introduction, here, I run into some issues.
In particular, when I get to the line:
./reagent/workflow/cli.py run reagent.workflow.gym_batch_rl.timeline_operator $CONFIG
I get the following error:
Building with config:
{'spark.app.name': 'ReAgent',
'spark.driver.extraClassPath': '/usr/local/lib/python3.7/site-packages/reagent/../preprocessing/target/rl-preprocessing-1.1.jar',
'spark.driver.host': '127.0.0.1',
'spark.master': 'local[*]',
'spark.sql.catalogImplementation': 'hive',
'spark.sql.execution.arrow.enabled': 'true',
'spark.sql.session.timeZone': 'UTC',
'spark.sql.shuffle.partitions': '12',
'spark.sql.warehouse.dir': '/srv/reagent/spark-warehouse'}
JAVA_HOME is not set
Traceback (most recent call last):
File "./reagent/workflow/cli.py", line 89, in <module>
reagent()
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "./reagent/workflow/cli.py", line 77, in run
func(**config.asdict())
File "/usr/local/lib/python3.7/site-packages/reagent/workflow/gym_batch_rl.py", line 75, in timeline_operator
spark = get_spark_session()
File "/usr/local/lib/python3.7/site-packages/reagent/workflow/spark_utils.py", line 62, in get_spark_session
spark = spark.getOrCreate()
File "/usr/local/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 367, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 133, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 316, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py", line 46, in launch_gateway
return _launch_gateway(conf)
File "/usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py", line 108, in _launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
> /usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py(108)_launch_gateway()
-> raise Exception("Java gateway process exited before sending its port number")
Am I missing something in my install? Any help would be much appreciated.
Also, more generally, I think it would be helpful if you provided a Dockerfile for people. Happy to contribute mine once I have it working, if that helps.
I am having the same problem. Also, the CI seems to be failing on the master branch.
Is there any update on adding a Dockerfile? I agree it would be very helpful, and including it in the CI would also help keep things up to date.
I was also looking for a docker install option but couldn't find the image anywhere in the documentation.
We don't use docker anymore since the installation is all done with pip. But you can use a stock Ubuntu image and pip install it in there.
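Something like this should be close (an untested sketch; the Ubuntu tag and apt packages are my guesses, and the pip command mirrors the one in the Dockerfile above):

```dockerfile
# Untested sketch: stock Ubuntu + pip install of ReAgent.
# Note: this covers the Python side only; running the Spark timeline
# operator would still need a JDK and Spark on top of this.
FROM ubuntu:20.04

RUN apt-get update -qq \
    && apt-get install --no-install-recommends -y \
        python3 python3-pip git \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN git clone https://github.com/facebookresearch/ReAgent.git /srv/reagent
WORKDIR /srv/reagent
RUN python3 -m pip install ".[gym]"
```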
Sorry, maybe I am missing something, but don't we also have non-Python dependencies? For example, I thought part of the project uses Spark.
From the error message above, my impression was that the error was coming from the Spark pipeline that pre-processes the data. Did I misunderstand something?
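As far as I can tell, the "JAVA_HOME is not set" line comes from Spark's launcher, which wants either JAVA_HOME set or a `java` on the PATH. A quick check inside the container (the sdkman path below is an assumption based on its default install location; each Dockerfile RUN uses a fresh shell, so anything sdkman exported during the build would not persist into the final image):

```shell
# Check whether a JVM is visible where the Spark launcher will look for it.
echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
command -v java >/dev/null 2>&1 && java -version || echo "no java on PATH"

# If sdkman did install a JDK, pointing JAVA_HOME at it (e.g. via ENV in
# the Dockerfile) should get past this error:
#   ENV JAVA_HOME=/root/.sdkman/candidates/java/current
#   ENV PATH="$JAVA_HOME/bin:$PATH"
```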