BurntSushi/nfldb

How to offset query?

NgoKnows opened this issue · 4 comments

Is it possible to offset the query using the API, I see there is an method for limiting, but not for offset?

Nothing native in the API that I'm aware of, but I believe many of the items that get generated are index-able...

import nfldb

db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2017, season_type='Regular')
for rank, pp in enumerate(q.sort('passing_yds').limit(5).as_aggregate(), 1):
    print rank, pp.player, pp.passing_yds
print '-'*25
for rank, pp in enumerate(q.sort('passing_yds').limit(8).as_aggregate()[3:], 4):
    print rank, pp.player, pp.passing_yds
1 Tom Brady (NE, QB) 714
2 Aaron Rodgers (GB, QB) 654
3 Drew Brees (NO, QB) 647
4 Carson Wentz (PHI, QB) 640
5 Alex Smith (KC, QB) 619
-------------------------
4 Carson Wentz (PHI, QB) 640
5 Alex Smith (KC, QB) 619
6 Carson Palmer (ARI, QB) 601
7 Matt Ryan (ATL, QB) 573
8 Jared Goff (LA, QB) 530

Would this be less performant though, let's say I wanted to offset by 100 for example. Would I then have to do .limit(100 + numberOfRowsIWant) and then load all of those rows into memory rather than doing it at the db level?

¯_(ツ)_/¯...probably?

If my approximately 10 seconds of testing I didn't see a huge difference.

import nfldb
import time

db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2017, season_type='Regular')

start = time.time()
for rank, pp in enumerate(q.sort('passing_yds').limit(5).as_aggregate(), 1):
    print rank, pp.player, pp.passing_yds
print time.time() - start

start = time.time()
for rank, pp in enumerate(q.sort('passing_yds').limit(508).as_aggregate()[503:], 504):
    print rank, pp.player, pp.passing_yds
print time.time() - start
1 Tom Brady (NE, QB) 714
2 Aaron Rodgers (GB, QB) 654
3 Drew Brees (NO, QB) 647
4 Carson Wentz (PHI, QB) 640
5 Alex Smith (KC, QB) 619
0.909999847412
504 Jeff Locke (DET, P) 0
505 Logan Ryan (TEN, CB) 0
506 David Amerson (OAK, CB) 0
507 Terrell McClain (WAS, DT) 0
508 Randall Cobb (GB, WR) 0
1.05400013924

But yeah, if an offset is something you want, it's probably more efficient to do it on the SQL side of things. As limit appears to be passed through in many places to ultimately find it's way into that SQL expression, it may be an easy addition to the library since the SQL syntax changes very little to add an offset to the query

Out of curiosity, what's your scenario for desiring to offset by 100 (or any x)? Something like displaying the 5th page of results where you display 20 per page or something else entirely? Just being nosy...

Hmm yeah performance doesn't seem like it would be too bad at all, especially if you're not getting a ridiculous number of rows. Yeah it doesn't look like it would be too hard to add offset to the library, I might give it a shot.

What I'm thinking about doing is showing a graph of some players statistics, and then progressively loading more players as the user reaches the end of the graph, rather than fetch a huge list of stats right away.

I appreciate your responsiveness :)