PetrGlad/python-prevayler

Don't break on 10 billion'th transaction

Closed this issue · 7 comments

Code assumes log files and snapshot files sort in alpha order. Format char is %010u. So on 10 billion'th transactions, library will break.

There is no need.

If you have a very active website (30mm impressions/week) you can hit this in a year.

        def mostrecent_snapshot_id(self, allfiles):
                '''
                >>> f = lambda i: "%d.%s" % (i, Log.SNAPSHOT_SUFFIX)
                >>> o = Log("datadir")

                >>> o.mostrecent_snapshot_id(map(f, (1, 10, 100, 2, 5)))
                100

                >>> id = o.mostrecent_snapshot_id(map(f, ()))
                >>> id is None or id
                True

                >>> fns = map(f, (2,))
                >>> fns
                ['2.snapshot']
                >>> o.mostrecent_snapshot_id(fns)
                2
                '''

                maxid = None
                for fn in allfiles:
                        m = self.reSplitFileName.match(fn)
                        if m and m.group(2) == self.SNAPSHOT_SUFFIX:
                                maxid = max(maxid, int(m.group(1)))
                return maxid

        def findPieces(self):
                allFiles = os.listdir(self.dataDir)
                snapshot_id = self.mostrecent_snapshot_id(allFiles)
                loglist = []
                serialid = 0

                a = []
                for fn in allFiles:
                        m = self.reSplitFileName.match(fn)
                        if m:
                                thisId = int(m.group(1))
                                a.append((thisId, fn))
                a.sort()
                log_list = []
                for log_id, log_fn in a:
                        if snapshot_id and log_id <= snapshot_id:
                                continue
                        loglist.append(os.path.join(self.dataDir, log_fn))
                        serialid = log_id

                self.logFileName = self.makeLogFileName(serialid + 1) 
                self.logFile = open(self.logFileName, 'ab')
                return (serialid, snapshot_id, loglist)

I think it breaks b/c when figuring out which log's to reply, you assume file names sort in alpha order.

If transaction # > 10,000,000 i think that assumption no longer holds.

I like sorting files by alpha order too. So maybe just extend to more digits.

Code snipped I posted tried to get right log file order independently of how you format the file names.

Damn, git hub ui sucks. Second time I have clicked "comment and close" when I meant "comment."

I like 0 padding since this helps sorting files in directory in right order. It is not required of course.

Hurry, hurry :)

With a quick glance, code looks good to me.

And I agree, I alpha sort in directory is nice.

If you keep committing changes, I'm going to have to fork so I can sync up what I'm actually running with your stuff. :)

Closing.

Wow, did not expected that anyone would use it.
If that is not a secret, where do you use it?

No secret: www.crosscutmedia.com is the company. I've been using App Engine but want to use this for the next release.

So far, I've been very happy with how fast this runs and how well the replay log works. The prevayler idea is a great architecture.