BayAreaMetro/travel-model-two

Memory allocation and Java crashes

saisirandas opened this issue · 2 comments

My apologies if this isn't the correct place to ask this question. I couldn't find a user forum for TM2. Is there one?

I have a question about frequent "Java stopped working" crashes during TM2 model runs. They seem to occur during the second or third iteration of CTRAMP. Below are details of the model run and computing environment:

  • Select county 3 (Santa Clara)

  • Sample rate 1.0 for all TAZs

  • Windows Server 2016 Virtual Machine

  • Intel Xeon Gold 6134 CPU @ 3.20 GHz with 32 (virtual) cores

  • 512 GB RAM

Below is the event log message at the time of crash:

02-Apr-2021 09:29:34:640, ERROR, Exception exception making RMI method call: //10.1.0.80:1191/com.pb.mtctm2.abm.ctramp.MatrixDataServer.writeMatrixFile().
java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is:
java.net.SocketException: Connection reset
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:254)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:163)
at gnu.cajo.invoke.Remote_Stub.invoke(Unknown Source)
at gnu.cajo.invoke.Remote.invoke(Unknown Source)
at com.pb.mtctm2.abm.ctramp.UtilRmi.method(UtilRmi.java:123)
at com.pb.mtctm2.abm.ctramp.MatrixDataServerRmi.writeMatrixFile(MatrixDataServerRmi.java:41)
at com.pb.mtctm2.abm.application.MTCTM2TripTables.writeMatricesToFile(MTCTM2TripTables.java:514)
at com.pb.mtctm2.abm.application.MTCTM2TripTables.writeTrips(MTCTM2TripTables.java:495)
at com.pb.mtctm2.abm.application.MTCTM2TripTables.createTripTables(MTCTM2TripTables.java:275)
at com.pb.mtctm2.abm.application.MTCTM2TripTables.main(MTCTM2TripTables.java:687)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:240)
... 9 more
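One way to read the trace above: the top-level java.rmi.UnmarshalException is only a wrapper, and the innermost cause is the java.net.SocketException "Connection reset" raised while MTCTM2TripTables was waiting for the reply to writeMatrixFile(). A reset on an open RMI connection often means the process on the other end (here the matrix server at 10.1.0.80:1191) closed the socket or exited, so the Matrix Manager's own console window and log may be more informative than this client-side error. The sketch below is a hypothetical helper, not part of CTRAMP, that walks an exception's cause chain to its root; the exceptions are built by hand to mirror the logged nesting.

```java
import java.net.SocketException;
import java.rmi.UnmarshalException;

public class RootCause {

    // Hypothetical helper (not part of CTRAMP): follow getCause() to the innermost throwable.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Rebuild the nesting seen in the log: UnmarshalException wrapping SocketException.
        SocketException reset = new SocketException("Connection reset");
        UnmarshalException top =
                new UnmarshalException("Error unmarshaling return header", reset);

        Throwable root = rootCause(top);
        // Prints the class and message of the innermost cause.
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Running it prints `java.net.SocketException: Connection reset`, i.e. the client saw the connection drop rather than an error returned by the server.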

Is this some sort of memory/resource allocation issue with Java? The CTRAMP parameters are configured as follows; I've listed the ones I thought were relevant.

runDriver.cmd
java -server -Xmx256m -cp "%CLASSPATH%" -Dlog4j.configuration=log4j-driver.properties -Djppf.config=jppf-driver.properties org.jppf.server.DriverLauncher

runMTCTM2ABM.cmd
java -server -Xmx130g -cp "%CLASSPATH%" -Dlog4j.configuration=log4j.xml -Dproject.folder=%PROJECT_DIRECTORY% -Djppf.config=jppf-clientLocal.properties com.pb.mtctm2.abm.application.MTCTM2TourBasedModel mtctm2 -iteration %iteration% -sampleRate %sampleRate% -sampleSeed 0

java -Xmx480g -cp "%CLASSPATH%" -Dproject.folder=%PROJECT_DIRECTORY% com.pb.mtctm2.abm.application.MTCTM2TripTables mtctm2 -iteration %iteration% -sampleRate %sampleRate%

runMtxMgr.cmd
START "Matrix Manager" %JAVA_PATH%\bin\java -Dname=p%HOST_MATRIX_PORT% -Xmx480g -cp "%CLASSPATH%" -Dlog4j.configuration=log4j_mtx.xml com.pb.mtctm2.abm.ctramp.MatrixDataServer -hostname %HOST_IP_ADDRESS% -port %HOST_MATRIX_PORT% -label "MTCTM2 Matrix Server"

runHhMgr.cmd
START "Household Manager" %JAVA_PATH%\bin\java -server -Xmx32g -cp "%CLASSPATH%" -Dlog4j.configuration=log4j_hh.xml com.pb.mtctm2.abm.application.SandagHouseholdDataManager2 -hostname %HOST_IP_ADDRESS% -port %HOST_PORT%

jppf-clientLocal.properties
jppf.local.execution.threads = 26

mtctm2.properties
distributed.task.packet.size = 500
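On the memory question, it may help to add up the -Xmx values of the JVMs that can be alive at the same time. -Xmx is only a ceiling on each JVM's heap, and thread stacks, metaspace and other off-heap memory come on top of it. Assuming the JPPF driver, Matrix Manager and Household Manager stay running for the whole model run (the two managers are launched with START), and that the two commands in runMTCTM2ABM.cmd run one after the other, the ceilings come to about 642 GiB during the tour-based step and about 992 GiB during the trip table step, against 512 GB of RAM. A JVM will not necessarily grow to its limit, but with the heap over-committed like this, the machine could run out of memory while trip tables are being built and written, which would fit the connection reset during writeMatrixFile(). The sketch below only does that arithmetic. The process list and 512 GB figure come from this issue, and the helper names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HeapBudget {

    // Hypothetical helper: convert an -Xmx value such as "480g" or "256m" to MiB.
    // Only k/m/g suffixes are handled; a bare byte count is rejected.
    static long xmxToMiB(String xmx) {
        String v = xmx.trim().toLowerCase(Locale.ROOT);
        long n = Long.parseLong(v.substring(0, v.length() - 1));
        switch (v.charAt(v.length() - 1)) {
            case 'g': return n * 1024;
            case 'm': return n;
            case 'k': return n / 1024;
            default:  throw new IllegalArgumentException("Unsupported -Xmx value: " + xmx);
        }
    }

    // Sum the heap ceilings (MiB) of JVMs assumed to be alive at the same time.
    static long totalMiB(Map<String, String> jvms) {
        long total = 0;
        for (String xmx : jvms.values()) {
            total += xmxToMiB(xmx);
        }
        return total;
    }

    // Print the summed ceilings for one step of the run and any over-commitment.
    static void report(String phase, Map<String, String> jvms, long physicalMiB) {
        long total = totalMiB(jvms);
        System.out.printf("%s: heap ceilings %.2f GiB vs %.2f GiB physical%n",
                phase, total / 1024.0, physicalMiB / 1024.0);
        if (total > physicalMiB) {
            System.out.printf("  over-committed by %.2f GiB%n", (total - physicalMiB) / 1024.0);
        }
    }

    public static void main(String[] args) {
        long physicalMiB = 512L * 1024; // the VM's 512 GB, treated as 512 GiB

        // Assumption: these three JVMs stay up for the whole model run.
        Map<String, String> servers = new LinkedHashMap<>();
        servers.put("JPPF driver", "256m");
        servers.put("Matrix Manager", "480g");
        servers.put("Household Manager", "32g");

        // Assumption: the two commands in runMTCTM2ABM.cmd run sequentially.
        Map<String, String> tourStep = new LinkedHashMap<>(servers);
        tourStep.put("MTCTM2TourBasedModel", "130g");
        Map<String, String> tripStep = new LinkedHashMap<>(servers);
        tripStep.put("MTCTM2TripTables", "480g");

        report("Tour-based model step", tourStep, physicalMiB);
        report("Trip table step", tripStep, physicalMiB);
    }
}
```

With the values above it reports 642.25 GiB (over by 130.25 GiB) for the tour-based step and 992.25 GiB (over by 480.25 GiB) for the trip table step. Rerunning it with candidate -Xmx values is a quick way to check whether a combination fits in physical RAM before launching a full run.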

lmz commented

Hi @saisirandas - There isn't a user forum for TM2, no, so this is a good place to post this. I'm not familiar with this error myself, and I haven't been running TM2 recently, but MTC staff will start getting back into TM2 development shortly, so we could try to take a look.

Would you mind giving more context about what you're trying to do? Who is your client for this project?

Hi @lmz! Good to hear there is a place to ask questions.
For additional context, we are trying to run a version of the model for a private client in the South Bay. We are pivoting off the previously developed TM2 for Marin County (TAMDM). In this current run, the only changes we've made from that version are the select county (from Marin to Santa Clara) and the sampling rate.

I'm including a screenshot of the ctramp_output folder to show where the error occurs. It seems to happen when the matrices start being written; in this case, in iteration 2.

After some additional testing with different combinations of memory allocations (-Xmx) and numbers of threads/cores in jppf-clientLocal.properties, my best attempt so far gets past iteration 2 but crashes in iteration 3 (again around the point when matrices start being written).
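When comparing runs with different -Xmx and thread settings, it may help to confirm what each JVM actually received. The sketch below is a hypothetical standalone class, not part of CTRAMP, that reports the heap ceiling, committed heap, used heap and processor count its own JVM sees. Launching it with the same flags as a model process (for example `java -Xmx480g JvmReport`) shows the effective values for that setting. It only describes its own JVM, so it cannot show how much memory the other model processes are using at the same moment.

```java
public class JvmReport {

    // Hypothetical diagnostic (not part of CTRAMP): describe the memory and CPUs this JVM sees.
    static String report(Runtime rt) {
        long mib = 1024L * 1024L;
        StringBuilder sb = new StringBuilder();
        // maxMemory() is the most heap this JVM will attempt to use, roughly the -Xmx value.
        sb.append("Max heap (MiB):       ").append(rt.maxMemory() / mib).append('\n');
        // totalMemory() is the heap currently committed; freeMemory() is the unused part of it.
        sb.append("Committed heap (MiB): ").append(rt.totalMemory() / mib).append('\n');
        sb.append("Used heap (MiB):      ")
          .append((rt.totalMemory() - rt.freeMemory()) / mib).append('\n');
        // availableProcessors() is the number of processors available to this JVM.
        sb.append("Processors:           ").append(rt.availableProcessors()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report(Runtime.getRuntime()));
    }
}
```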

Let me know if you have any advice for us or if there is any additional information I can provide. In the meantime, I'll look into whether we can add more computing resources to the virtual machine to see if that helps.

Thanks!

[Screenshot of the ctramp_output folder]