Duplicate key name when populating MySQL DB
pb-new-username opened this issue · 10 comments
I am currently running this on a macOS with python 3.9.10. Here are the steps I run in.
docker-compose -f docker/docker-compose-mysql.yml up -d
DB_NAME="nba" DB_HOST="localhost" DB_USER=nba_sql DB_PASSWORD=nba_sql python stats/nba_sql.py --default-mode --database="mysql" --create-schema
I am getting this when running this command on a brand new docker container.
Loading seasons: ['2022-23'].
Connecting to mysql database.
Loading the database in the default mode.
Initializing schema.
Traceback (most recent call last):
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3160, in execute_sql
cursor.execute(sql, params or ())
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
result = self._query(query)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
conn.query(q)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 544, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 771, in _read_query_result
result.read()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 1152, in read
first_packet = self.connection._read_packet()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 721, in _read_packet
packet.raise_for_error()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.OperationalError: (1061, "Duplicate key name 'playergamelogtemp_player_id'")
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 336, in <module>
main(args)
File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 291, in main
default_mode(settings, create_schema, request_gap, seasons, skip_tables)
File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 74, in default_mode
do_create_schema(object_list)
File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 204, in do_create_schema
obj.create_ddl()
File "/Users/atiqa/github-non-amd/nba-sql/stats/player_game_log.py", line 40, in create_ddl
self.settings.db.create_tables([PlayerGameLogTemp], safe=True)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3348, in create_tables
model.create_table(**options)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 6721, in create_table
cls._schema.create_all(safe, **options)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 5829, in create_all
self.create_indexes(safe=safe)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 5743, in create_indexes
self.database.execute(query)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3173, in execute
return self.execute_sql(sql, params, commit=commit)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3167, in execute_sql
self.commit()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 2933, in __exit__
reraise(new_type, new_type(exc_value, *exc_args), traceback)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 191, in reraise
raise value.with_traceback(tb)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3160, in execute_sql
cursor.execute(sql, params or ())
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
result = self._query(query)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
conn.query(q)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 544, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 771, in _read_query_result
result.read()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 1152, in read
first_packet = self.connection._read_packet()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 721, in _read_packet
packet.raise_for_error()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
raise errorclass(errno, errval)
peewee.OperationalError: (1061, "Duplicate key name 'playergamelogtemp_player_id'")
The error doesn't make much sense to be because the table 'player_game_log_temp' has a composite primary key that consists of player_id and game_id.
This is odd. I'm curious are you creating a fresh database or trying to update an existing one? The current support for updating existing databases doesn't extend to all tables just quite yet.
Hmm but it is a temp table. I haven't tried this season quite yet, I wonder if the NBA data has a flaw or changed their format.
I'll take a look this evening.
I am creating a fresh one.
In the file 'player_game_log.py' there's this comment on line 30:
# TODO: this conflicts with a fresh db.
Perhaps this is a known issue then?
Possibly, it's been a few months since I've tried to dive into the code. I do remember that I try to avoid populating player_game_log, if I can
I think the code needs to do an insert_many_on_conflict_ignore
. However at the time I didn't fully understand if that would silently lose data for some reason. That is why I left the TODO.
Ahh, i see. I commented out player_game_log_requester
from object_list
from the function default_mode
. I then got the error Duplicate key name 'shotchartdetailtemp_game_id'"
below. I think I may have stumbled upon a recipe to get initial tables populated while avoiding the duplicate key errors.
Traceback (most recent call last):
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3160, in execute_sql
cursor.execute(sql, params or ())
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
result = self._query(query)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
conn.query(q)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 544, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 771, in _read_query_result
result.read()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 1152, in read
first_packet = self.connection._read_packet()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 721, in _read_packet
packet.raise_for_error()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.OperationalError: (1061, "Duplicate key name 'shotchartdetailtemp_game_id'")
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 336, in <module>
main(args)
File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 291, in main
default_mode(settings, create_schema, request_gap, seasons, skip_tables)
File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 74, in default_mode
do_create_schema(object_list)
File "/Users/atiqa/github-non-amd/nba-sql/stats/nba_sql.py", line 204, in do_create_schema
obj.create_ddl()
File "/Users/atiqa/github-non-amd/nba-sql/stats/shot_chart_detail.py", line 28, in create_ddl
self.settings.db.create_tables([ShotChartDetailTemp], safe=True)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3348, in create_tables
model.create_table(**options)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 6721, in create_table
cls._schema.create_all(safe, **options)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 5829, in create_all
self.create_indexes(safe=safe)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 5743, in create_indexes
self.database.execute(query)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3173, in execute
return self.execute_sql(sql, params, commit=commit)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3167, in execute_sql
self.commit()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 2933, in __exit__
reraise(new_type, new_type(exc_value, *exc_args), traceback)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 191, in reraise
raise value.with_traceback(tb)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/peewee.py", line 3160, in execute_sql
cursor.execute(sql, params or ())
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in execute
result = self._query(query)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in _query
conn.query(q)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 544, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 771, in _read_query_result
result.read()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 1152, in read
first_packet = self.connection._read_packet()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/connections.py", line 721, in _read_packet
packet.raise_for_error()
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File "/Users/atiqa/github-non-amd/nba-sql/venv/lib/python3.9/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
raise errorclass(errno, errval)
peewee.OperationalError: (1061, "Duplicate key name 'shotchartdetailtemp_game_id'")
Here's a recipe you can follow if you want to populate a fresh DB with the table player_game_log
populated.
- Run the command:
DB_NAME="nba" DB_HOST="localhost" DB_USER=nba_sql DB_PASSWORD=nba_sql python stats/nba_sql.py --default-mode --database="mysql" --create-schema
- You’ll hit an error that says
Duplicate key name 'playergamelogtemp_player_id'
. Comment out the variableplayer_game_log_requester
from variableobject_list
inside functiondefault_mode
in the filestats/nba_sql.py
- Run the command from step 1 again
- You’ll hit an error saying
Duplicate key name 'playergamelogtemp_player_id'
. Comment out the variableshot_chart_requester
from variableobject_list
inside functiondefault_mode
in the filestats/nba_sql.py
- Comment out the variable
shot_chart_requester
from variableobject_list
inside functiondefault_mode
in the filestats/nba_sql.py
- Run the command:
DB_NAME="nba" DB_HOST="localhost" DB_USER=nba_sql DB_PASSWORD=nba_sql python stats/nba_sql.py --default-mode --database="mysql" --seasons 2021-22 --skip-tables play_by_play shot_chart_detail
I simply skipped play_by_play
and shot_chart_detail
as I'm not interested in that information.
Thank you for the repro steps. Have you tried the --skip-tables
option? Sounds like it might fit your use-case if you don't want to juggle comments.
It seems that we encounter the errors when it's creating the tables player_game_log
and shot_chart_detail
in a fresh DB. So I think if you don't want the two tables populated, you can simply use the --skip-tables command line argument. But if you do want the two tables, then I think you have to use the comment/uncomment method here. But the good news is that you only have to do it once as this error only occurs when it's creating the tables player_game_log
and shot_chart_detail
.