Will this template force users to use Postgres? Or can it be designed so that users can work from a TSV dump of the sentences table?

Question

Will this template force users to use Postgres? Or can it be designed so that users can work from a TSV dump of the sentences table?

Closed this issue 8 years ago · 4 comments

Answer 1 · 2016-03-02T15:48:45.000Z

The cobalt guys don't do any postgres stuff with the TSV dumps we've been sending them. They just read/parse the TSV directly. So no, we won't force users to use postgres. But apps that DO use postgres should assume that there's a table of data called [appname]_sentences magically there and waiting for them. For these apps, there should be a setup.py that reads the sample TSV files into the [appname]_sentences table, which run.py will read.
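Reading the TSV directly, without Postgres, could look something like the sketch below. The function name and the file path in the usage note are hypothetical, and the column layout of the dump is not specified in this thread, so each row is returned as a plain list of strings.

```python
import csv


def read_sentences(path):
    """Yield each line of a sentences TSV dump as a list of column strings."""
    with open(path, newline="", encoding="utf-8") as fid:
        # QUOTE_NONE: treat quote characters as ordinary text, since a
        # tab-separated database dump does not normally use CSV quoting.
        reader = csv.reader(fid, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

An app's run.py could then loop over `read_sentences("./input/sentences.tsv")`, where that file name is only a placeholder for whatever the dump is actually called.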

(edit) Obviously this is far from set in stone. The reason I'm thinking of it this way is that my gut feeling is that it'd be wasteful to do a (SELECT subset FROM postgres master table) -> (tsv dump) -> (postgres ingest), since the dumping+reading could be pretty slow and looks unnecessary to me.
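For apps that do use Postgres, the setup.py step might be sketched as follows. This is only an illustration under several assumptions: the helper names are hypothetical, `conn` is assumed to be an already-open psycopg2 connection, the [appname]_sentences table is assumed to exist already with columns matching the dump, and the TSV is assumed to be in Postgres's default COPY text format.

```python
def sentences_table(appname):
    """Build the [appname]_sentences table name, rejecting odd app names."""
    # The name is interpolated into SQL below, so accept only names that
    # look like plain identifiers (an illustrative guard, not a full check).
    if not appname.isidentifier():
        raise ValueError("unexpected app name: %r" % appname)
    return appname + "_sentences"


def load_sentences(conn, appname, tsv_path):
    """Bulk-load a sample TSV file into the app's sentences table."""
    table = sentences_table(appname)
    with open(tsv_path, encoding="utf-8") as fid, conn.cursor() as cur:
        # psycopg2's copy_expert streams the file through COPY ... FROM STDIN;
        # COPY's default text format is tab-delimited.
        cur.copy_expert("COPY %s FROM STDIN" % table, fid)
    conn.commit()
```

run.py would then query the [appname]_sentences table as usual.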

Answer 2 · 2016-03-02T16:01:39.000Z

Ah, I think I see now. So will all apps also have a directory called ./input that holds the same data products you have been sending folks (but at full scale, in cases where the test set is smaller than the full set)? In other words, would this code work: with open('./input/bibjson') as fid

Answer 3 · 2016-03-02T16:47:27.000Z

Yup, that'll work. The bibjson will always be in ./input. And we can certainly dump all the data products there too, if that's what the user (you) wants.
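A minimal sketch of reading that file, assuming ./input/bibjson holds a single JSON document (the thread does not spell out the file's layout, and the function name is hypothetical):

```python
import json


def load_bibjson(path="./input/bibjson"):
    """Load the bibjson file and return the parsed JSON content."""
    with open(path, encoding="utf-8") as fid:
        return json.load(fid)
```

Because the path is the same in development and in the full deployment, the app code should not need to change between the two.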

Answer 4 · 2016-03-02T17:36:10.000Z

Yes, I think that makes a lot of sense, if it's not too much trouble. In other words, the environment that we (i.e. you) will create for the full app deployment will be as similar to the dev environment as possible.
