alan-turing-institute/AIrsenal

sqlite3 error database disk image is malformed (conda on Ubuntu)

fol-debug opened this issue · 21 comments

Hi,

I'm getting a lot of database errors while running run_airsenal_predictions. It currently reports "Filling history dataframe for xxx: 218/218", so it may not be getting all of the players. Running the optimization also produces errors because of this when it runs the strategies, so right now it only runs on one thread and everything else drops out.

The prediction table seems to be filled out correctly, so I'm not really sure where to start looking for the fault.

One of the errors also says "file is not a database".

EDIT: I now see that player_id is empty in the fixture table; maybe this is where the error stems from?

EDIT: Running PRAGMA integrity_check comes back OK.

EDIT: This is on the fix/unique_violation_fix branch.
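For anyone who wants to repeat the integrity check mentioned above, here is a minimal sketch using only Python's standard-library sqlite3 module. It runs PRAGMA integrity_check and also counts the rows in every table, which is a quick way to see whether a table was actually populated. The helper name check_database is hypothetical (not part of AIrsenal), and /tmp/data.db is simply the path used elsewhere in this thread. The file is opened read-only through an SQLite URI so that a wrong path fails instead of silently creating an empty database; a file that isn't an SQLite database raises sqlite3.DatabaseError ("file is not a database").

```python
import sqlite3
import sys
from pathlib import Path


def check_database(path):
    """Return (integrity_check rows, {table: row count}) for an SQLite file.

    Hypothetical helper for this thread; not part of AIrsenal.
    """
    # mode=ro: open read-only and fail if the file doesn't exist (no empty DB is created)
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # A healthy database returns a single row containing "ok".
        integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for table in tables:
            # Names come from sqlite_master; escape embedded double quotes before quoting.
            quoted = '"' + table.replace('"', '""') + '"'
            counts[table] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return integrity, counts
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = sys.argv[1] if len(sys.argv) > 1 else "/tmp/data.db"
    if not Path(db_path).exists():
        print(f"{db_path} does not exist")
    else:
        integrity, counts = check_database(db_path)
        print("integrity_check:", integrity)
        for table, n in counts.items():
            print(f"{table}: {n} rows")
```

An "ok" result alongside runtime errors may hint that the file on disk is fine and the problem lies in how it is being accessed, though that is a guess rather than a diagnosis.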

With the fix/unique_violation_fix branch installed, and ensuring you have FPL_TEAM_ID set with your team ID either as an environment variable or in the file airsenal/data/FPL_TEAM_ID, could you run each of the following steps in order and copy any errors you get into a comment here?

> rm /tmp/data.db
> setup_airsenal_database
> update_airsenal_database
> run_airsenal_predictions --weeks_ahead 3
> run_airsenal_optimization --weeks_ahead 3
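Since the team ID can come from either an environment variable or the file airsenal/data/FPL_TEAM_ID, a quick way to confirm which value will be picked up is sketched below. This is a hypothetical helper, not AIrsenal's own code: in particular, the precedence (environment variable first, then the file) is an assumption.

```python
import os
from pathlib import Path


def resolve_fpl_team_id(data_dir):
    """Hypothetical sketch: read FPL_TEAM_ID from the environment, else from data_dir/FPL_TEAM_ID.

    The precedence (environment first) is an assumption, not AIrsenal's documented behaviour.
    """
    value = os.environ.get("FPL_TEAM_ID", "").strip()
    if not value:
        id_file = Path(data_dir) / "FPL_TEAM_ID"
        if id_file.is_file():
            value = id_file.read_text().strip()
    if not value:
        raise RuntimeError(
            "Set FPL_TEAM_ID as an environment variable or in the FPL_TEAM_ID file"
        )
    try:
        return int(value)
    except ValueError:
        raise ValueError(f"FPL_TEAM_ID should be a number, got {value!r}") from None


if __name__ == "__main__":
    # Assumes this is run from the root of the AIrsenal checkout.
    try:
        print("FPL_TEAM_ID:", resolve_fpl_team_id("airsenal/data"))
    except (RuntimeError, ValueError) as exc:
        print("Problem:", exc)
```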

Here's the beginning of the output from run_airsenal_predictions --weeks_ahead 3:

https://pastebin.com/5d5TS7sW

And here's run_airsenal_optimization --weeks_ahead 3:
https://pastebin.com/p1Gyxy4U

OK, thanks, I'll have a look at some point. Just to confirm: the first three steps (rm /tmp/data.db, setup_airsenal_database, update_airsenal_database) don't give you any errors?

No, those are all good. I did, however, get an error while setting up the environment (building the bpl wheel), but it continued and showed no further errors.

Here's the output of the full optimization run:
https://pastebin.com/m4ReUdFQ

The bpl build error might be the problem. There are currently differences between how bpl is installed on master and on the fix/unique_violation_fix branch; let's see whether you still have problems once the two have been merged (hopefully later this week).

Hmm. In the meantime, could I set up the database with the fix branch and then use that same database with master?

If you're keen to get something working, you can try the following (install from master but run off the fix branch):

Clean up previous environment:

> rm /tmp/data.db
> rm -r <PATH_TO_AIRSENAL_DIRECTORY>
> conda deactivate
> conda env remove -n airsenalenv 

Install AIrsenal to new environment (note -e argument to pip install):

> conda create -n airsenalenv python=3.7
> conda activate airsenalenv
> conda install -c psi4 gcc-5
> git clone https://github.com/alan-turing-institute/AIrsenal.git
> cd AIrsenal
> pip install -e .
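Because a failed bpl build can scroll past while pip keeps going (as happened earlier in this thread), it may be worth confirming that the compiled dependencies actually import in the active environment before running the database steps. This is a minimal sketch: the import names bpl and pystan are assumptions about how those packages are imported, and sys.executable is printed to show which interpreter (conda env or system Python) is in use.

```python
import importlib
import sys


def check_imports(names):
    """Try to import each module; map name -> None on success, or the error message."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Covers missing packages as well as compiled extensions that fail to load.
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    print("Python:", sys.executable)
    # Assumed import names for the packages discussed in this thread.
    for name, error in check_imports(["airsenal", "bpl", "pystan"]).items():
        print(f"{name}: ok" if error is None else f"{name}: FAILED ({error})")
```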

If that doesn't give you any bpl build errors, then run:

> git checkout remotes/origin/fix/unique_playerid_violation
> setup_airsenal_database
> update_airsenal_database
> run_airsenal_predictions --weeks_ahead 3
> run_airsenal_optimization --weeks_ahead 3

It still gives me an error when running run_airsenal_predictions --weeks_ahead 3:
http://pastebin.com/XvT2ikKA

EDIT: Thanks a lot for helping out, by the way. I really appreciate it.

Did bpl install successfully with no build errors with those instructions?

Yes.

I'm starting to think maybe my setup is wrong.

I've tried both Ubuntu and Debian on both VirtualBox and Hyper-V. Do you have any other suggestions on what I can try?

I've also had no success on Windows; I can't seem to find a compatible compiler.

It's mostly been developed on Macs, and we have already seen some issues on Linux relating to the pystan package, e.g. #66.

Ahhh. I see. That would explain the issues I have. I'll see if I can find a workaround :)

It worked immediately when trying to install it on a MacBook. Thanks!

Reopening this issue, as I had the same problem on an Ubuntu 18.04 Azure VM. I also tried an Ubuntu Docker container on my Mac, which ran fine. I'm not sure what causes it; maybe it's related to accessing the database from multiple processes.

setup_airsenal_database and update_airsenal_database run fine (and produce a database that looks correct), but run_airsenal_predictions gives the error.
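On the multiple-processes idea: a commonly recommended pattern with SQLite is to pass only the database path to each worker and have every process open (and close) its own connection, rather than sharing a connection created before the workers start. The sketch below shows that pattern with the standard library; it does not claim to show how AIrsenal's prediction code opens its connections, and the player table is a made-up example schema.

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile


def init_db(path):
    """Create a small example table (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits the transaction on success
            conn.execute("CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT)")
            conn.executemany(
                "INSERT INTO player (name) VALUES (?)", [("a",), ("b",), ("c",)]
            )
    finally:
        conn.close()


def count_players(path):
    """Worker: open a fresh connection in this process instead of inheriting one."""
    # timeout makes SQLite wait for locks held by other processes instead of failing at once
    conn = sqlite3.connect(path, timeout=30)
    try:
        return conn.execute("SELECT COUNT(*) FROM player").fetchone()[0]
    finally:
        conn.close()


def main(n_workers=4):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.db")
        init_db(path)
        # Only the file path crosses the process boundary, never a connection object.
        with mp.Pool(n_workers) as pool:
            return pool.map(count_players, [path] * n_workers)


if __name__ == "__main__":
    print(main())  # [3, 3, 3, 3]
```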

On the same Azure VM, doing the following led to a working installation:

  • Install gcc, sqlite3 and pip3 with apt.
  • Install airsenal to the system python - not in a conda environment.

So it might be a conda issue, possibly something to do with conda and sqlite and/or gcc.
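To check whether conda and the system disagree about SQLite, one could compare the SQLite library version that Python's sqlite3 module is linked against with the sqlite3 command-line tool found on PATH. Inside a conda env, Python typically links against conda's own SQLite build, so running this once inside the env and once with the system Python shows which library each uses. A minimal sketch; a version difference on its own doesn't prove it is the cause.

```python
import shutil
import sqlite3
import subprocess
import sys


def python_sqlite_info():
    """Describe the SQLite library linked into the running Python's sqlite3 module."""
    import _sqlite3  # CPython's compiled extension behind the sqlite3 package

    return {
        "python": sys.executable,
        "sqlite_library_version": sqlite3.sqlite_version,
        "extension_module": getattr(_sqlite3, "__file__", "built-in"),
    }


def cli_sqlite_version():
    """Return the version reported by the first sqlite3 CLI on PATH, or None."""
    exe = shutil.which("sqlite3")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # The output starts with the version number, followed by a date and source ID.
    return result.stdout.split()[0]


if __name__ == "__main__":
    for key, value in python_sqlite_info().items():
        print(f"{key}: {value}")
    print("sqlite3 CLI on PATH:", cli_sqlite_version())
```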

Installing gcc, sqlite3 and pip3 with apt and installing AIrsenal into the system Python (not a conda environment) also fixed this on Ubuntu 18.04 in Windows Subsystem for Linux.

Thanks for letting us know, @Tdarnell. We should try to track down what's causing this.

A bit of searching suggests this could be caused by a mismatch between sqlite3 versions, e.g. between the system version and the version used by Python in the conda env. Or it could be related to managing connections across threads with multiprocessing/sqlalchemy/sqlite3. That may make more sense, as the single-threaded database setup and update work, but the multithreaded prediction fails.
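The threading hypothesis can be illustrated with plain sqlite3: by default (check_same_thread=True) a connection created in one thread refuses to be used from another, while giving each thread its own connection to the same file works. SQLAlchemy manages connections through its own pool, and how AIrsenal configures that isn't shown in this thread, so this sketch only demonstrates the underlying rule; the player table is a made-up example.

```python
import os
import sqlite3
import tempfile
import threading


def shared_connection_error():
    """Use a connection created in this thread from another thread; return the error text."""
    conn = sqlite3.connect(":memory:")  # check_same_thread defaults to True
    errors = []

    def worker():
        try:
            conn.execute("SELECT 1").fetchone()
        except sqlite3.ProgrammingError as exc:
            errors.append(str(exc))

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    conn.close()
    return errors[0] if errors else None


def per_thread_counts(path, n_threads=4):
    """Each thread opens (and closes) its own connection to the same database file."""
    counts = []
    lock = threading.Lock()

    def worker():
        conn = sqlite3.connect(path, timeout=30)
        try:
            n = conn.execute("SELECT COUNT(*) FROM player").fetchone()[0]
        finally:
            conn.close()
        with lock:
            counts.append(n)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    print(shared_connection_error())
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "example.db")
        setup = sqlite3.connect(db)
        with setup:  # commits on success
            setup.execute("CREATE TABLE player (id INTEGER PRIMARY KEY)")
            setup.executemany("INSERT INTO player (id) VALUES (?)", [(1,), (2,)])
        setup.close()
        print(per_thread_counts(db))  # [2, 2, 2, 2]
```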