Beginning of job logs is lost after update of project template
Closed this issue · 4 comments
After I updated my project Makefile
with cookiecutter-neuro-project v1.3
, the beginning part of the logs started disappearing. Here is my command to run a train job on the platform:
TRAIN_CMD="bash -c 'cd $(PROJECT_PATH_ENV) && python -u $(CODE_DIR)/train.py -c $(CODE_DIR)/configs/$(CONFIG_NAME)'"
.PHONY: train
train: _check_setup upload-code upload-config ### Run a training job (set up env var 'RUN' to specify the training job),
$(NEURO) run \
--name $(TRAIN_JOB)-$(RUN) \
--description "$(PROJECT_ID):train" \
--preset $(PRESET) \
--detach \
$(TRAIN_WAIT_START_OPTION) \
--volume $(DATA_DIR_STORAGE):$(PROJECT_PATH_ENV)/$(DATA_DIR):ro \
--volume $(PROJECT_PATH_STORAGE)/$(CODE_DIR):$(PROJECT_PATH_ENV)/$(CODE_DIR):ro \
--volume $(PROJECT_PATH_STORAGE)/$(CONFIG_DIR):$(PROJECT_PATH_ENV)/$(CONFIG_DIR):ro \
--volume $(RESULTS_DIR_STORAGE):$(PROJECT_PATH_ENV)/$(RESULTS_DIR):rw \
--env PYTHONPATH=$(PROJECT_PATH_ENV) \
--env EXPOSE_SSH=yes \
--env JOB_TIMEOUT=0 \
${OPTION_GCP_CREDENTIALS} ${OPTION_AWS_CREDENTIALS} ${OPTION_WANDB_CREDENTIALS} \
$(CUSTOM_ENV_NAME) \
$(TRAIN_CMD)
ifeq (${TRAIN_STREAM_LOGS}, yes)
@echo "Streaming logs of the job $(TRAIN_JOB)-$(RUN)"
$(NEURO) exec --no-key-check -T $(TRAIN_JOB)-$(RUN) "tail -f /output" || echo -e "Stopped streaming logs.\nUse 'neuro logs <job>' to see full logs."
endif
After a run of this command I typically get the following output:
where part of the console outputs of my script is missed. The supposed script output can be viewed in the attached log file:
15-01-2020_15-34-51.log
Before the update, there was not any problem like this.
I know why this happens. This is because of the tqdm
support. The process goes as follows:
- The job starts and immediately detaches, all the output goes to a dedicated pipe '/output'.
neuro exec
runs right after the job detach withtail -f /output
.
There is a tiny delay when no one reads the logs. I assume that your output is pretty intensive at the beginning, thus the very start of them is lost.
May I ask you to make an experiment: update the line
$(NEURO) exec --no-key-check -T $(TRAIN_JOB)-$(RUN) "tail -f /output" || echo -e "Stopped streaming logs.\nUse 'neuro logs <job>' to see full logs."
to
$(NEURO) exec --no-key-check -T $(TRAIN_JOB)-$(RUN) "tail -f -n 1000000 /output" || echo -e "Stopped streaming logs.\nUse 'neuro logs <job>' to see full logs."
and run your experiments again?
So, I can see my logs right now, but there is no sign of error output if it appears at script execution beginning (e.g. incorrect parser key, import error ...).
outdated, we are not writing job logs into the /output
file anymore