mpope9/nba-sql

Duplicate key name when populating MySQL DB

pb-new-username opened this issue · 10 comments

I am currently running this on a macOS with python 3.9.10. Here are the steps I run in.

  1. docker-compose -f docker/docker-compose-mysql.yml up -d
  2. DB_NAME="nba" DB_HOST="localhost" DB_USER=nba_sql DB_PASSWORD=nba_sql python stats/nba_sql.py --default-mode --database="mysql" --create-schema

I am getting this when running this command on a brand new docker container.

Loading seasons: ['2022-23'].
Connecting to mysql database.
Loading the database in the default mode.
Initializing schema.
Traceback (most recent call last):
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3160, in execute_sql
    cursor.execute(sql, params or ())
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 544, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 771, in _read_query_result
    result.read()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 1152, in read
    first_packet = self.connection._read_packet()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 721, in _read_packet
    packet.raise_for_error()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1061, "Duplicate key name 'playergamelogtemp_player_id'")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 336, in <module>
    main(args)
  File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 291, in main
    default_mode(settings, create_schema, request_gap, seasons, skip_tables)
  File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 74, in default_mode
    do_create_schema(object_list)
  File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 204, in do_create_schema
    obj.create_ddl()
  File "/Users/atiqa/github-non-amd/nba-sql/stats/player_game_log.py", line 40, in create_ddl
    self.settings.db.create_tables([PlayerGameLogTemp], safe=True)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3348, in create_tables
    model.create_table(**options)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 6721, in create_table
    cls._schema.create_all(safe, **options)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 5829, in create_all
    self.create_indexes(safe=safe)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 5743, in create_indexes
    self.database.execute(query)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3173, in execute
    return self.execute_sql(sql, params, commit=commit)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3167, in execute_sql
    self.commit()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 2933, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 191, in reraise
    raise value.with_traceback(tb)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3160, in execute_sql
    cursor.execute(sql, params or ())
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 544, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 771, in _read_query_result
    result.read()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 1152, in read
    first_packet = self.connection._read_packet()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 721, in _read_packet
    packet.raise_for_error()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
peewee.OperationalError: (1061, "Duplicate key name 'playergamelogtemp_player_id'")

The error doesn't make much sense to be because the table 'player_game_log_temp' has a composite primary key that consists of player_id and game_id.

This is odd. I'm curious are you creating a fresh database or trying to update an existing one? The current support for updating existing databases doesn't extend to all tables just quite yet.

Hmm but it is a temp table. I haven't tried this season quite yet, I wonder if the NBA data has a flaw or changed their format.

I'll take a look this evening.

I am creating a fresh one.

In the file 'player_game_log.py' there's this comment on line 30:

# TODO: this conflicts with a fresh db.

Perhaps this is a known issue then?

Possibly, it's been a few months since I've tried to dive into the code. I do remember that I try to avoid populating player_game_log, if I can

I think the code needs to do an insert_many_on_conflict_ignore. However at the time I didn't fully understand if that would silently lose data for some reason. That is why I left the TODO.

Ahh, i see. I commented out player_game_log_requester from object_listfrom the function default_mode. I then got the error Duplicate key name 'shotchartdetailtemp_game_id'" below. I think I may have stumbled upon a recipe to get initial tables populated while avoiding the duplicate key errors.

Traceback (most recent call last):
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3160, in execute_sql
    cursor.execute(sql, params or ())
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 544, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 771, in _read_query_result
    result.read()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 1152, in read
    first_packet = self.connection._read_packet()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 721, in _read_packet
    packet.raise_for_error()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1061, "Duplicate key name 'shotchartdetailtemp_game_id'")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 336, in <module>
    main(args)
  File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 291, in main
    default_mode(settings, create_schema, request_gap, seasons, skip_tables)
  File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 74, in default_mode
    do_create_schema(object_list)
  File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 204, in do_create_schema
    obj.create_ddl()
  File "/Users/atiqa/github-non-amd/nba-sql/stats/shot_chart_detail.py", line 28, in create_ddl
    self.settings.db.create_tables([ShotChartDetailTemp], safe=True)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3348, in create_tables
    model.create_table(**options)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 6721, in create_table
    cls._schema.create_all(safe, **options)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 5829, in create_all
    self.create_indexes(safe=safe)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 5743, in create_indexes
    self.database.execute(query)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3173, in execute
    return self.execute_sql(sql, params, commit=commit)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3167, in execute_sql
    self.commit()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 2933, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 191, in reraise
    raise value.with_traceback(tb)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3160, in execute_sql
    cursor.execute(sql, params or ())
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 544, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 771, in _read_query_result
    result.read()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 1152, in read
    first_packet = self.connection._read_packet()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 721, in _read_packet
    packet.raise_for_error()
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
peewee.OperationalError: (1061, "Duplicate key name 'shotchartdetailtemp_game_id'")

Here's a recipe you can follow if you want to populate a fresh DB with the table player_game_log populated.

  1. Run the command: DB_NAME="nba" DB_HOST="localhost" DB_USER=nba_sql DB_PASSWORD=nba_sql python stats/nba_sql.py --default-mode --database="mysql" --create-schema
  2. You’ll hit an error that says Duplicate key name 'playergamelogtemp_player_id'. Comment out the variable player_game_log_requester from variable object_list inside function default_mode in the file stats/nba_sql.py
  3. Run the command from step 1 again
  4. You’ll hit an error saying Duplicate key name 'playergamelogtemp_player_id'. Comment out the variable shot_chart_requester from variable object_list inside function default_mode in the file stats/nba_sql.py
  5. Comment out the variable shot_chart_requester from variable object_list inside function default_mode in the file stats/nba_sql.py
  6. Run the command: DB_NAME="nba" DB_HOST="localhost" DB_USER=nba_sql DB_PASSWORD=nba_sql python stats/nba_sql.py --default-mode --database="mysql" --seasons 2021-22 --skip-tables play_by_play shot_chart_detail

I simply skipped play_by_play and shot_chart_detail as I'm not interested in that information.

Thank you for the repro steps. Have you tried the --skip-tables option? Sounds like it might fit your use-case if you don't want to juggle comments.

It seems that we encounter the errors when it's creating the tables player_game_log and shot_chart_detail in a fresh DB. So I think if you don't want the two tables populated, you can simply use the --skip-tables command line argument. But if you do want the two tables, then I think you have to use the comment/uncomment method here. But the good news is that you only have to do it once as this error only occurs when it's creating the tables player_game_log and shot_chart_detail.