DIRACGrid/diracx

JobLoggingDB incompatible with MySQL

chrisburr opened this issue · 0 comments

Inserting a job with swagger when running with MySQL results in:

sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1093, "You can't specify target table 'LoggingInfo' for update in FROM clause")
[SQL: INSERT INTO `LoggingInfo` (`JobID`, `SeqNum`, `Status`, `MinorStatus`, `ApplicationStatus`, `StatusTime`, `StatusTimeOrder`, `StatusSource`) VALUES (%s, (SELECT coalesce(max(`LoggingInfo`.`SeqNum`) + %s, %s) AS coalesce_1
FROM `LoggingInfo`
WHERE `LoggingInfo`.`JobID` = %s), %s, %s, %s, %s, %s, %s)]
[parameters: (1, 1, 1, 1, <JobStatus.RECEIVED: 'Received'>, 'Job accepted', 'Unknown', datetime.datetime(2023, 9, 12, 9, 28, 25, 33384, tzinfo=datetime.timezone.utc), 424510905.0333841, 'JobManager')]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

I guess we need to separate the queries to stay MySQL compatible.

Explanation of problem

The error you're seeing is due to a MySQL limitation. In MySQL, you can't modify a table (in this case INSERT into LoggingInfo) and select from the same table in a subquery. Your seqnum_stmt is trying to do exactly that.

Here's the problematic subquery:

seqnum_stmt = (
    select(func.coalesce(func.max(LoggingInfo.SeqNum) + 1, 1))
    .where(LoggingInfo.JobID == job_id)
    .scalar_subquery()
)

This attempts to get the next sequence number for a given job_id by finding the maximum SeqNum for that job_id and adding 1 to it. If there is no record for the job_id, it defaults to 1.

There are a few potential ways to work around this limitation:

  1. Use a Temporary Table: One common way to work around this is to first select the needed data into a temporary table and then use that temporary table in the main query. However, this might be a bit of an overkill for your use case.

  2. Separate the Queries: Another approach is to separate the queries. First, run a query to fetch the maximum SeqNum for the given job_id and then run the INSERT query using the result.

Here's an example of how you can separate the queries:

# First, fetch the maximum SeqNum for the given job_id
seqnum_result = await self.conn.scalar(
    select(func.coalesce(func.max(LoggingInfo.SeqNum) + 1, 1)).where(LoggingInfo.JobID == job_id)
)
# Now, use the result in the INSERT statement
stmt = insert(LoggingInfo).values(
    JobID=int(job_id),
    SeqNum=seqnum_result,
    Status=status,
    MinorStatus=minor_status,
    ApplicationStatus=application_status[:255],
    StatusTime=date,
    StatusTimeOrder=epoc,
    StatusSource=source[:32],
)
await self.conn.execute(stmt)

By separating the queries, you avoid the MySQL limitation, and the operation should complete successfully.