chop-dbhi/harvest

serrano migration failure when using mySQL database residing on another server

Closed this issue · 34 comments

Hi all,

Big picture: i'm setting up a frontend to a database i created that catalogues a series of experiments done to investigate the properties of the basilar membrane in gerbils. I wrote a front-end for it last year, but it's brittle, so I'm now investigating getting harvest to pick up the heavy lifting.

So far, i have harvest up and running on a x64 install of Ubuntu Server 12.04.4 LTS -- i can create a dummy project and it happily serves up a rather boring webpage. (ie, i can complete http://harvest.research.chop.edu/download/ ). I have been creating project-envs nested under /var/harvests/ for convenience. This one is named gawa.

I'm moving on to the instructions in http://harvest.research.chop.edu/articles/2013/11/20/using-your-data/ now, and now i've started to run into some questions.

Right now, the mysql database of experimental results (named GAWA) lives on a NAS box (at $NAS_IP ) in the lab. Consequently, I'm lead to believe that i should edit $envroot/gawa/gawa/conf/local_settings.py and add that database as the default DB, and then run python bin/manage.py syncdb --migrate.

My current local_settings.py:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'GAWA',
        'USER': 'harvest_gawa',
        'PASSWORD':'*****',
        'HOST': '$NAS_IP',
        'PORT': '3306'
    }
}

where harvest_gawa is a user i created just for this on the NAS.

However, doing so has run into two issues:

  1. My GAWA experimental results database now has many django tables in it, which seems suboptimal.
  2. attempting to migrate results in the error below:
(gawa-env)gvoysey@eng2-dhcp140:/var/harvest/gawa-env$ python gawa/bin/manage.py
syncdb --migrate
Syncing...
Creating tables ...
Creating table south_migrationhistory
Creating table django_admin_log
Creating table auth_permission
Creating table auth_group_permissions
Creating table auth_group
Creating table auth_user_groups
Creating table auth_user_user_permissions
Creating table auth_user
Creating table django_content_type
Creating table django_session
Creating table django_site

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
Username (leave blank to use 'gvoysey'):
Email address: *****
Password:
Password (again):
Superuser created successfully.
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)
Migrating...
Running migrations for serrano:
 - Migrating forwards to 0001_initial.
 > serrano:0001_initial
TransactionManagementError: Transaction managed block ended with pending COMMIT/ROLLBACK

Thoughts on where to go next? It seems, knowing not very much about django, that i ought to be using the sqlite DB in $PROJECT_PATH/gawa.db for django stuff, and configuring the remote GAWA payload database as a second one in local_settings.py ? Is that correct, or does django need to tie into the payload database?

I am going to look into why the migration is failing for Serrano, but in the mean time we can take advantage of Django's multi-database support (which is ideal anyway since you don't want to cruft up your existing database). We can use a database router to ensure queries are being executed on the correct database without having to tell Django to do so.

Update DATABASES

Rename the default entry in DATABASES to something different such as data and add a new entry called default that points to a SQLite database path (such as your gawa.db suggestion).

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(PROJECT_PATH, 'gawa_app.db'),
    },
    'data': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'GAWA',
        'USER': 'harvest_gawa',
        'PASSWORD':'*****',
        'HOST': '$NAS_IP',
        'PORT': '3306'
    }
}

Database Router

Create a file $envroot/gawa/gawa/conf/routers.py with the following contents.

class DataRouter(object):
    using = 'data'
    app_labels = ['gawa']

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.app_labels:
            return self.using

    def db_for_write(self, model, **hints):
        if model._meta.app_label in self.app_labels:
            return False

    def allow_syncdb(self, db, model):
        if model._meta.app_label in self.app_labels:
            return db == self.using
        return False

This router will only allow Django to perform reads from your experiments database (no writes, no alterations). I am making the assumption the models for your GAWA database are defined in $envroot/gawa/gawa/models.py? Otherwise, change the app_labels to app names your experiment data models live in.

Add the following setting to $envroot/gawa/gawa/conf/global_settings.py

DATABASE_ROUTERS = ['gawa.conf.routers.DataRouter']

Sync

Re-run the python bin/manage.py syncdb --migrate command and all the tables should be created under the SQLite database and your experiments database should not be touched.

Right, I've gotten most of this done. Thanks for your very detailed reply!

I have not generated $envroot/gawa/gawa/models.py . Is it fair to follow the advice given in https://docs.djangoproject.com/en/dev/howto/legacy-databases/#auto-generate-the-models and do python manage.py inspectdb > models.py ? If so, how do i specify which db i wish to inspect from the DATABASES list?

Yes that is the correct command, python manage.py inspectdb --database=data > models.py

( whoops, accidentally hit the wrong reply button; i didn't mean to close this issue)

@gvoysey we opened a ticket (chop-dbhi/avocado#180) for your MySQL issue, however the above setup I described should have alleviated this problem since Django should not be acting on the MySQL database at all due to the router. Migrations would only be applied to the SQLite database defined by the default database settings.

On a side note, MySQL tables using the MyISAM engine do not support transactions (which is what I presume what your database is using), so that is most likely why South is panicking about what to do.

Great information. Thank you for the detailed response and I agree with @gvoysey that the routing strategy is the best, so no alteration to our database.

I'm receiving this Warning: Can't read dir of './credentialing_test/'
when running this command python manage.py inspectdb --database=data > models.py

@bruth -- agreed, they were being created as MyISAM. (The existing tables are all InnoDB, so i guess myISAM is the django default).

@protactinium -- i'm getting django.core.exceptions.ImproperlyConfigured: Module "<module 'gawa.conf' from 'gawa/conf/__init__.pyc'>" does not define a database router name "routers" , which i'm debugging now -- certainly progress.

@gvoysey Sorry about that, I updated the comment above, but is not reflected in the email notification. The DATABASE_ROUTERS settings should be:

DATABASE_ROUTERS = ['gawa.conf.routers.DataRouter']

@protactinium Regarding the ./credentialing_test/ directory, I am going to suspect that is an issue on your end with the database or MySQL (such as a permission issue). Nothing in Harvest nor a search for credentialing_test with Django or MySQL mention that directory. Is that directory something related to your GAWA database? Does it depend on it in some way? If not, would your DBA know?

@bruth -- setting DATABASE_ROUTERS correctly now lets me generate a valid models.py. Also, @protactinium isn't associated with my project, so any GAWA-specific configurations are probably moot relative to him.

Oops sorry, I guess I presumed you worked together. Glad the router worked.

On Feb 25, 2014, at 5:36 PM, Graham Voysey notifications@github.com wrote:

@bruth -- setting DATABASE_ROUTERS correctly now lets me generate a valid models.py. Also, @protactinium isn't associated with my project, so any GAWA-specific configurations are probably moot relative to him.


Reply to this email directly or view it on GitHub.

no problem -- we're both in boston, but that's it.

I'm now getting DatabaseError: no such table: south_migrationhistory after running manage.py syncdb --migrate, though -- having now generated a models.py and models.pyc that correctly describe the remote MySQL GAWA (but not the local sqlite3 db).

Try the syncdb command first followed by the syncdb --migrate.

On Feb 25, 2014, at 5:57 PM, Graham Voysey notifications@github.com wrote:

no problem -- we're both in boston, but that's it.

I'm now getting DatabaseError: no such table: south_migrationhistory after running manage.py syncdb --migrate, though -- having now generated a models.py and models.pyc that correctly describe the remote MySQL GAWA (but not the local sqlite3 db).


Reply to this email directly or view it on GitHub.

@bruth -- I get the following.

(gawa-env)gvoysey@eng2-dhcp140:/var/harvest/gawa-env/gawa$ sudo ../bin/python bin/manage.py syncdb
Syncing...
Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)

Synced:
 > gawa
 > south
 > modeltree
 > django.contrib.admin
 > django.contrib.auth
 > django.contrib.contenttypes
 > django.contrib.humanize
 > django.contrib.markup
 > django.contrib.messages
 > django.contrib.sessions
 > django.contrib.sites
 > django.contrib.staticfiles

Not synced (use migrations):
 - serrano
 - avocado
(use ./manage.py migrate to migrate these)
(gawa-env)gvoysey@eng2-dhcp140:/var/harvest/gawa-env/gawa$ sudo ../bin/python bin/manage.py migrate
DatabaseError: no such table: south_migrationhistory

@gvoysey Can you confirm that the sqlite database was created (for the default database connection) and south_migrationhistory table was created in there? You can do sqlite3 gawa_app.db (assuming in the main project directory) followed by .schema to print out all the tables that have been created.

@bruth That's almost certainly what's in error now. I just looked; gawa_app.db is a 0-byte file. There's an existing gawa.db which does have a bunch of tables in it - but the creation date is 21 Feb, whereas gawa_app.db was created on 25 Feb.

I am pretty sure that there's a missing line in a config file somewhere now, but i'm not sure where to look.

I'm kind of tempted to wipe out this project and try to do it over cleanly, applying all the changes to local_settings.py etc. as we've been working on before the first source ../bin/activate. If it's more useful for harvest that we keep debugging this, that's certainly okay by me as well.

@gvoysey We should be able to recover this setup. The DATABASES setting should be in local_settings.py, the DATABASE_ROUTERS should be in global_settings.py. If you open the shell: python bin/manage.py shell

from django.db import connections
print(connections['default'].settings_dict)

You should see NAME as the correct path to gawa_app.db

@bruth

>>> print(connections['default'].settings_dict)
{'ENGINE': 'django.db.backends.sqlite3', 'TEST_MIRROR': None, 'NAME': '/var/harvest/gawa-env/gawa/gawa/conf/../../gawa_app.db', 'TEST_CHARSET': None, 'TIME_ZONE': None, 'TEST_COLLATION': None, 'PORT': '', 'HOST': '', 'USER': '', 'TEST_NAME': None, 'PASSWORD': '', 'OPTIONS': {}}

I have no idea why the path has the extra ../ bits, but it does ultimately resolve to '/var/harvest/gawa-env/gawa/gawa_app.db' , which is correct, but points to that 0-byte file.

Sometimes symlinked directories can screw up the final destination of the real file. Try replacing the NAME with the absolute path to the database file for the time being and try the syncdb and migrate.

okay. How do i do that in a way that will stick? I set the value running the python shell, but it didn't persist.

It should be set in global_settings.py file.

Hmm. Interesting. A snippet of global_settings.py :

# Import the project module to calculate directories relative to the module
# location.
PROJECT_PATH = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../..')

this is probably where that ../../ comes from -- but that's auto-generated code.

A little later, DATABASES is defined in global_settings.py too -- i don't know if this takes precedence over local_settings.py, but changing it to gawa_app.db doesn't seem to make a difference.

#
# DATABASES
# Each database can be specified here, but passwords should be in a separate
# file that is not versioned. Use ``local_settings.py``.
#

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(PROJECT_PATH, 'gawa.db')
    }
}

I'm sorry the absolute path of the database should be set in the local settings not global. Local takes precedent over the global settings.

On Feb 26, 2014, at 2:38 PM, Graham Voysey notifications@github.com wrote:

Hmm. Interesting. A snippet of global_settings.py :

Import the project module to calculate directories relative to the module

location.

PROJECT_PATH = os.path.join(os.path.dirname(os.path.realpath(file)), '../..')

this is probably where that ../../ comes from -- but that's auto-generated code.

A little later, DATABASES is defined in global_settings.py too -- i don't know if this takes precedence over local_settings.py, but changing it to gawa_app.db doesn't seem to make a difference.

DATABASES

Each database can be specified here, but passwords should be in a separate

file that is not versioned. Use local_settings.py.

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3',
'NAME': os.path.join(PROJECT_PATH, 'gawa.db')
}
}

Reply to this email directly or view it on GitHub.

ah, that makes more sense. I'll replace os.path.join(PROJECT_PATH, 'gawa_app.db') with the literal path and report back.

No dice. I moved gawa_app.db to gawa_app.db.bak to see if running manage.py syncdb --migrate would re-create it . It does, but it's still 0 bytes.

local_settings.py currently defines the databases as follows:

DATABASES = {
     'default': {
         'ENGINE': 'django.db.backends.sqlite3',
#         'NAME': os.path.join(PROJECT_PATH, 'gawa_app.db')
         'NAME': '/var/harvest/gawa-env/gawa/gawa_app.db'
     },
    'data': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'gawa',
        'USER': '***',
        'PASSWORD':'****',
        'HOST': '****',
        'PORT': '3306'
    }
}

@gvoysey Not sure if you meant to post the data DB settings but you might want to remove them or use ****** for the login credentials. I didn't try to login but I can ping corti.bu.edu so I'm worried that your DB might be vulnerable if there are not firewall rules in place since the login info is now on this public github page. Either way, I would anonymize or just remove the USER and PASSWORD fields so your credentials aren't publicized.

well, that was embarrassing. Thanks for the catch, @naegelyd .

I am, at least, clever enough to have long ago restricted network access to that now-unnamed server to IPs originating inside my institution only, but still.

@gvoysey I am a bit perplexed by what is happening. To rule out that it's not something in your environment, can you create a plain Django project. In your activated environment, you can do:

django-admin.py startproject testproject

Update the DATABASES setting in the testproject/testproject/settings.py file to:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'app.db',
    }
}

Then run: python manage.py syncdb (within the first testproject directory).

@bruth : as you suspected, it must have been some weird environment thing. I'm not particularly keen on tracking down which typo i have in which configuration file .. but i recreated a new project from scratch, and ta-da! http://128.197.51.140:8000 should be serving up a very unconfigured harvest installation right now.

This particular issue being (as far as i can tell) solved, i'm going to close. Thank you for all your help! I'll certainly be checking in soon, i'm sure (just not on this thread)

I am glad you figured it out. I took a look at your instance and I noticed the distribution were failing to render. They are failing to find the gerbilDB.Model root model, i.e. a model named Model in the app named gerbilDB. I realize you may be in the the middle of changing things, but I figured I would mention it.

yeah, i noticed that as well. I've uncommented the relevant region in global_settings.py and i'm reading up on http://modeltree.harvest.io/ref/settings.html to figure out how best to generate or create a sane model.

The root model should be whichever model is the primary "focal point" of your questions. For example, we deal a lot with patient data, but sometimes have "sample" or "specimen" data, or even "variant" for genomic data. I noticed you have a "SubjectId" field on the query page, whichever model that field exists on is likely to be the root model.

@bruth for us it's pretty linear. Animal -> Experiment -> (many) Images -> (many, many) ROIs.