PrincetonUniversity/athena

Command-line output#/dt and output#/next_time are handled incorrectly

EdwinChan opened this issue · 4 comments

Prerequisite checklist

Place an X in between the brackets on each line as you complete these checks:

  • Did you check that the issue hasn't already been reported?
  • Did you check the documentation in the Wiki for an answer?
  • Are you running the latest version of Athena++?

Summary of issue

When output#/dt and output#/next_time are specified during a restart, fix-ups applied to output times (see #310) are omitted when these two parameters are on the command line rather than in an input file.

Steps to reproduce

We use two input files athinput-time1 and athinput-time2 containing, among others, the following lines:

<job>
problem_id = time
<time>
tlim = 10
<output1>
file_type = rst
dt = 1
<job>
problem_id = time
<time>
tlim = 10
<output1>
file_type = rst
dt = 0.5
next_time = 1.8

Expected and actual outcomes

If we run Athena++ with

mkdir time{1,2}
bin/athena -d time1 -i athinput-time1
bin/athena -d time2 -i athinput-time2 -r time1/time.00005.rst

and we read the binary timestamps off the restart files, we get

time1/time.00000.rst 0.0
time1/time.00001.rst 1.0000092775033713
time1/time.00002.rst 2.0003549602099246
time1/time.00003.rst 3.0008885306036044
time1/time.00004.rst 4.000559354421319
time1/time.00005.rst 5.000100787631505
time1/time.00006.rst 6.000578975742493
time1/time.00007.rst 7.001832959788023
time1/time.00008.rst 8.000395785266205
time1/time.00009.rst 9.001636920101932
time1/time.final.rst 10.0
time2/time.00006.rst 5.300735126408735
time2/time.00007.rst 5.8014517407979715
time2/time.00008.rst 6.300726294886482
time2/time.00009.rst 6.801709105815053
time2/time.00010.rst 7.301515963883339
time2/time.00011.rst 7.800602370859581
time2/time.00012.rst 8.30158045147817
time2/time.00013.rst 8.801768346195535
time2/time.00014.rst 9.30166771908827
time2/time.00015.rst 9.800298307581457
time2/time.final.rst 10.0

That is to say, the next output after the restart happens at time next_time + n * dt that is later than the restart time for some non-negative integer n. I would argue the output should happen as soon as possible if next_time is earlier than the restart time, because that would give us an easy way to make a debug dump right at the point of restart, which would be useful if the restart file has been modified externally, or if the restart process is customized to do more than just reading in the data, but that parameter interpretation is up for debate.

A more serious problem arises if we then run

rm time2/*
bin/athena -d time2 -r time1/time.00005.rst output1/dt=0.5 output1/next_time=1.8

The timestamps are now

time2/time.00006.rst 5.002184280517581
time2/time.00007.rst 5.004198237047655
time2/time.00008.rst 5.006251402013058
time2/time.00009.rst 5.008233401549724
time2/time.00010.rst 5.010171844156113
time2/time.00011.rst 5.012137908737955
time2/time.00012.rst 5.0140426303780625
time2/time.00013.rst 5.300735126408735
time2/time.00014.rst 5.8014517407979715
time2/time.00015.rst 6.300726294886482
time2/time.00016.rst 6.801709105815053
time2/time.00017.rst 7.301515963883339
time2/time.00018.rst 7.800602370859581
time2/time.00019.rst 8.30158045147817
time2/time.00020.rst 8.801768346195535
time2/time.00021.rst 9.30166771908827
time2/time.00022.rst 9.800298307581457
time2/time.final.rst 10.0

Output happens and next_time += dt every cycle until next_time exceeds the simulation time. The utility of this catch-up is almost zilch, unless the user explicitly wants next_time == file_number * dt. By the way, output#/file_number actually refers to the next output, in the same way next_time does; that is not documented in the wiki.

However, if we run:

rm time2/*
bin/athena -d time2 -i athinput-time2 -r time1/time.00005.rst output1/dt=0.7 output1/next_time=1.9

The timestamps are normal:

time2/time.00006.rst 5.40105071040028
time2/time.00007.rst 6.100257532253666
time2/time.00008.rst 6.801709105815053
time2/time.00009.rst 7.500868906510636
time2/time.00010.rst 8.201493737794491
time2/time.00011.rst 8.901415861600112
time2/time.00012.rst 9.601331998722616
time2/time.final.rst 10.0

Note that the parameters on the command line are changed to demonstrate that they do override the ones in the input file. The order of command-line arguments is immaterial.

The examples here use restart files for simplicity, but the same bug should apply to other forms of output as well.

Additional comments and/or proposed solution

There have been multiple issues addressing time-step issues: #36, #62, #116, #310. #62 mentions the function ParameterInput::ForwardNextTime(); it seems that this function and its friend ParameterInput::RollbackNextTime() are called only when an input file is supplied:

athena/src/main.cpp

Lines 229 to 231 in 5871884

// If both -r and -i are specified, make sure next_time gets corrected.
// This needs to be corrected on the restart file because we need the old dt.
if (iarg_flag == 1) pinput->RollbackNextTime();

athena/src/main.cpp

Lines 297 to 301 in 5871884

// With current mesh time possibly read from restart file, correct next_time for outputs
if (iarg_flag == 1 && res_flag == 1) {
// if both -r and -i are specified, ensure that next_time >= mesh_time - dt
pinput->ForwardNextTime(pmesh->time);
}

Version info

  • Athena++ version: v21.0-dev-121-gae02dd19
  • Compiler and version: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
  • Operating system: Ubuntu 20.04.4 LTS
  • Hardware and cluster name (if applicable):
  • External library versions (if applicable): openmpi-bin:amd64 (4.0.3-0ubuntu1), libhdf5-openmpi-dev:amd64 (1.10.4+repack-11ubuntu1)

By the way, output#/file_number actually refers to the next output, in the same way next_time does; that is not documented in the wiki.

This isn't exposed as a user-controlled option for input files or the command line, I believe. What do you suggest we document about it in the Wiki?

By the way, how are you reading the binary timestamps of the .rst files? I cant remember if there is a convenient command line option to do so.

If output#/next_time is documented, why not output#/file_number? We can simply say it overrides the numbering of the next output, i.e., the description can mirror output#/next_time. A user carrying out a two-part simulation may want to restart the count for the second part.

The time is saved in a simple binary format in the restart files, so it can be easily read with, e.g., the Python package struct. It'd be great if the user could override the time as well, for the same reason as above, but that'd be another issue.

By the way, how are you reading the binary timestamps of the .rst files? I cant remember if there is a convenient command line option to do so.

This was never robust enough (endianness, padding bytes, etc.) to make it to the codebase, but an example of using struct to read restart metadata is here in case anyone wants it: https://www.physics.unlv.edu/astro/athena2019/scripts/restart_inspect.py

FYI for future readers, the script https://www.physics.unlv.edu/astro/athena2019/scripts/restart_inspect.py is only compatible with Python 2, not Python 3