Environment for evaluating C#
Hi there,
I'm trying to reproduce the results on C# (c-sharp, c#, cs). I'm running it on an Ubuntu virtual machine.
- I tried the recommended podman way, but I don't think it ran correctly (finished instantly, no score output, no other informative output).
- Then I chose to run it without a container, and it failed with:
File "/export/home/project/codeai/MultiPL-E/evaluation/src/eval_cs.py", line 24, in eval_script
build = subprocess.run(["csc", "/d:DEBUG", "-r:System.Numerics.dll", path, f"/out:{binaryname}"], capture_output=True)
File "/export/share/ruimeng/env/anaconda/envs/codegen/lib/python3.8/subprocess.py", line 493, in run
with Popen(*popenargs, **kwargs) as process:
File "/export/share/ruimeng/env/anaconda/envs/codegen/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/export/share/ruimeng/env/anaconda/envs/codegen/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'csc'
AFAIK, csc won't work on Linux machines. Can you share more details about how to configure the C# runtime environment? Also, the Toolchains page says Conda is required. Does Conda provide anything for executing C#?
Thank you!
Rui
(Sorry, the Conda directions are out of date. Conda is not needed, since the container now works.)
Would you try to run our tests on your system:
cd evaluation
mkdir test_outputs
make test
That will run podman by default and produce a bunch of files in the test_outputs/ directory. If you look at the C# test:
cat test_outputs/cs-davinci-0.2-keep-HumanEval53.results.json
You should see:
{
"name": "HumanEval_53_add",
"language": "cs",
"prompt": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n // Add two numbers x and y\n // >>> add(2, 3)\n // 5\n // >>> add(5, 7)\n // 12\n public static long Add(long x, long y) {\n",
"tests": " }\n public static void Main(string[] args) {\n Debug.Assert(Add((0L), (1L)) == (1L));\n Debug.Assert(Add((1L), (0L)) == (1L));\n Debug.Assert(Add((2L), (3L)) == (5L));\n Debug.Assert(Add((5L), (7L)) == (12L));\n Debug.Assert(Add((7L), (5L)) == (12L));\n }\n\n}\n",
"stop_tokens": [
"\n }\n"
],
"results": [
{
"program": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n // Add two numbers x and y\n // >>> add(2, 3)\n // 5\n // >>> add(5, 7)\n // 12\n public static long Add(long x, long y) {\n return x + y;\n }\n public static void Main(string[] args) {\n Debug.Assert(Add((0L), (1L)) == (1L));\n Debug.Assert(Add((1L), (0L)) == (1L));\n Debug.Assert(Add((2L), (3L)) == (5L));\n Debug.Assert(Add((5L), (7L)) == (12L));\n Debug.Assert(Add((7L), (5L)) == (12L));\n }\n\n}\n",
"timestamp": 1675463781,
"stdout": "",
"stderr": "None",
"exit_code": 0,
"status": "OK"
},
{
"program": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n // Add two numbers x and y\n // >>> add(2, 3)\n // 5\n // >>> add(5, 7)\n // 12\n public static long Add(long x, long y) {\n return x + ;\n }\n public static void Main(string[] args) {\n Debug.Assert(Add((0L), (1L)) == (1L));\n Debug.Assert(Add((1L), (0L)) == (1L));\n Debug.Assert(Add((2L), (3L)) == (5L));\n Debug.Assert(Add((5L), (7L)) == (12L));\n Debug.Assert(Add((7L), (5L)) == (12L));\n }\n\n}\n",
"timestamp": 1675463780,
"stdout": "Microsoft (R) Visual C# Compiler version 3.9.0-6.21124.20 (db94f4cc)\nCopyright (C) Microsoft Corporation. All rights reserved.\n\n/tmp/tmpnv49tr82.cs(15,20): error CS1525: Invalid expression term ';'\n",
"stderr": "",
"exit_code": 1,
"status": "SyntaxError"
}
]
}
So, the first test will trivially pass, and the second is a deliberate syntax error in C#.
If you get anything else, there is some fundamental problem we should debug.
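To compare a results file against the expected output above without reading it by eye, the statuses can be tallied with a few lines of Python. This is only a sketch: it assumes the file has the "results" list and the per-entry "status" field shown above, and summarize_statuses is a hypothetical helper name, not part of MultiPL-E.

```python
import json
from collections import Counter


def summarize_statuses(results_text: str) -> Counter:
    """Count how many entries in a results JSON document have each status.

    Assumes the layout shown in this thread: a top-level "results" list
    whose entries each carry a "status" string.
    """
    data = json.loads(results_text)
    return Counter(entry["status"] for entry in data.get("results", []))
```

Passing it the contents of test_outputs/cs-davinci-0.2-keep-HumanEval53.results.json should report one OK and one SyntaxError if the file matches the expected output.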
Thanks for the information!
Regarding the csc issue, I found that using mono-csc makes it work on Ubuntu (I'm new to C#...), but I had to edit the command used in evaluation/src/eval_cs.py.
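For anyone running outside the container, one way to avoid hard-coding the compiler name is to look it up on PATH first. The following is a rough sketch, not the code in eval_cs.py: the candidate list and both helper names are assumptions for illustration, and whether every candidate accepts exactly the same flags is not verified here.

```python
import shutil
from typing import List, Optional, Sequence

# Candidate compiler names to try, in order. "csc" is what the traceback above
# shows being invoked; "mono-csc" is the name that worked on Ubuntu in this
# thread. The list itself is an assumption for illustration.
CANDIDATE_COMPILERS = ("csc", "mono-csc")


def find_compiler(candidates: Sequence[str] = CANDIDATE_COMPILERS) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


def build_command(compiler: str, path: str, binaryname: str) -> List[str]:
    """Build the argument list from the traceback with the compiler swapped in."""
    return [compiler, "/d:DEBUG", "-r:System.Numerics.dll", path, f"/out:{binaryname}"]
```

If find_compiler() returns None, failing early with a clear message is easier to diagnose than the FileNotFoundError raised from inside subprocess.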
As for the test you mentioned, it failed at STEP 21/29 due to Error 125.
WARN[0625] ignoring metacopy option from storage.conf, not supported with booted kernel
--> ba7bb2b4d18
STEP 21/29: RUN curl https://julialang-s3.julialang.org/bin/linux/x64/1.8/julia-1.8.2-linux-x86_64.tar.gz | tar xz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Error: error building at STEP "RUN curl https://julialang-s3.julialang.org/bin/linux/x64/1.8/julia-1.8.2-linux-x86_64.tar.gz | tar xz": error while running runtime: exit status 2
make: *** [Makefile:4: build] Error 125
Thank you!
Rui
Ah, I see the test command is set up to force the container to build. Would you edit the Makefile so that test: build is instead just test:?
Then do this:
podman pull ghcr.io/nuprl/multipl-e-evaluation
podman tag ghcr.io/nuprl/multipl-e-evaluation multipl-e-eval
make test
That should address the problem.
The new command works!
But when evaluating C# results, I find that it keeps printing the error below in output.stderr. It seems to be resolved if I remove "MONO_TRACE_LISTENER":"Console.Error" from eval_cs.py L36. Do you see any problem here?
Fail:
at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceInternal.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.Debug.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at Problem.Main (System.String[] args) [0x00000] in <54b5ef6a134b4acbaecf85396ecb47c9>:0
Fail:
at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceInternal.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.Debug.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at Problem.Main (System.String[] args) [0x00000] in <54b5ef6a134b4acbaecf85396ecb47c9>:0
Fail:
at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceInternal.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.Debug.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at Problem.Main (System.String[] args) [0x00000] in <54b5ef6a134b4acbaecf85396ecb47c9>:0
Fail:
at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.TraceInternal.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at System.Diagnostics.Debug.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0
at Problem.Main (System.String[] args) [0x00000] in <54b5ef6a134b4acbaecf85396ecb47c9>:0
@abhijangda any ideas what this is about? IIRC you wrote this bit.
@memray I still don't think this is working as expected. Would you attach the file evaluation/test_outputs/cs-davinci-0.2-keep-HumanEval53.results.json that you got after running make test?
The current result looks normal since I have removed "MONO_TRACE_LISTENER":"Console.Error" and rebuilt the container. Will keep an eye on it.
{
"name": "HumanEval_53_add",
"language": "cs",
"prompt": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n // Add two numbers x and y\n // >>> add(2, 3)\n // 5\n // >>> add(5, 7)\n // 12\n public static long Add(long x, long y) {\n",
"tests": " }\n public static void Main(string[] args) {\n Debug.Assert(Add((0L), (1L)) == (1L));\n Debug.Assert(Add((1L), (0L)) == (1L));\n Debug.Assert(Add((2L), (3L)) == (5L));\n Debug.Assert(Add((5L), (7L)) == (12L));\n Debug.Assert(Add((7L), (5L)) == (12L));\n }\n\n}\n",
"stop_tokens": [
"\n }\n"
],
"results": [
{
"program": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n // Add two numbers x and y\n // >>> add(2, 3)\n // 5\n // >>> add(5, 7)\n // 12\n public static long Add(long x, long y) {\n return x + y;\n }\n public static void Main(string[] args) {\n Debug.Assert(Add((0L), (1L)) == (1L));\n Debug.Assert(Add((1L), (0L)) == (1L));\n Debug.Assert(Add((2L), (3L)) == (5L));\n Debug.Assert(Add((5L), (7L)) == (12L));\n Debug.Assert(Add((7L), (5L)) == (12L));\n }\n\n}\n",
"timestamp": 1675561391,
"stdout": "",
"stderr": "None",
"exit_code": 0,
"status": "OK"
},
{
"program": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n // Add two numbers x and y\n // >>> add(2, 3)\n // 5\n // >>> add(5, 7)\n // 12\n public static long Add(long x, long y) {\n return x + ;\n }\n public static void Main(string[] args) {\n Debug.Assert(Add((0L), (1L)) == (1L));\n Debug.Assert(Add((1L), (0L)) == (1L));\n Debug.Assert(Add((2L), (3L)) == (5L));\n Debug.Assert(Add((5L), (7L)) == (12L));\n Debug.Assert(Add((7L), (5L)) == (12L));\n }\n\n}\n",
"timestamp": 1675561390,
"stdout": "Microsoft (R) Visual C# Compiler version 3.9.0-6.21124.20 (db94f4cc)\nCopyright (C) Microsoft Corporation. All rights reserved.\n\n/tmp/tmpoukx5fl9.cs(15,20): error CS1525: Invalid expression term ';'\n",
"stderr": "",
"exit_code": 1,
"status": "SyntaxError"
}
]
}
The env variable MONO_TRACE_LISTENER is needed to make sure that the Mono runtime prints a message when an assertion fails. Which Mono and Ubuntu versions are you using?
Ubuntu 20.04.4 LTS and Mono JIT compiler version 6.12.0.182
Really not sure what causes this...
output.stderr is filled when assertions in the program fail, i.e., when the model generates code that fails one or more test cases. Mono, unlike Java, Python, or C++, continues executing even if an assertion fails. For example, compile the following hello.cs:
using System;
using System.Diagnostics;
public class HelloWorld
{
public static void Main(string[] args) {
Debug.Assert(false, "Bleh");
Console.WriteLine ("Executing after assertion");
}
}
Compile and run using:
csc -d:DEBUG hello.cs
MONO_TRACE_LISTENER=Console.Error mono hello.exe
You will see the output is:
Fail: Bleh
at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0
at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0
at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0
at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0
at System.Diagnostics.TraceInternal.Assert (System.Boolean condition, System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0
at System.Diagnostics.Debug.Assert (System.Boolean condition, System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0
at HelloWorld.Main (System.String[] args) [0x00000] in <d407b623a45a40d6920f37a2e64a2366>:0
Executing after assertion
Similarly, Mono executes all test cases even if one of them fails.
I hope that clears the situation.
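Because Mono keeps running after a failed Debug.Assert, the exit code alone may not reveal a failing test, so a harness has to inspect stderr. Here is a minimal sketch of that idea, assuming the "Fail:" marker seen in the traces above is what identifies an assertion failure. The status names "AssertionError" and "Exception" and all three function names are hypothetical, and this is not necessarily how eval_cs.py decides.

```python
import os
import subprocess
from typing import Dict, List


def mono_env() -> Dict[str, str]:
    """Copy the current environment and route Mono trace output to stderr."""
    env = dict(os.environ)
    env["MONO_TRACE_LISTENER"] = "Console.Error"
    return env


def classify(exit_code: int, stderr: str) -> str:
    """Hypothetical rule: a line starting with 'Fail:' means an assertion failed."""
    if any(line.strip().startswith("Fail:") for line in stderr.splitlines()):
        return "AssertionError"
    return "OK" if exit_code == 0 else "Exception"


def run_and_classify(command: List[str], timeout: float = 5.0) -> str:
    """Run a command such as ["mono", "hello.exe"] and classify the outcome."""
    proc = subprocess.run(
        command, env=mono_env(), capture_output=True, text=True, timeout=timeout
    )
    return classify(proc.returncode, proc.stderr)
```

With this rule, the repeated "Fail:" blocks in output.stderr are the expected signal of failing test cases rather than a sign that the environment variable is misbehaving.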
Can you look at the model-generated code and check whether Mono reports an assertion failure for every test case that the generated code does not pass?
I am genuinely surprised the container behaves differently. @memray if this doesn't get sorted out, I'd be happy to get on a video call and debug.
@arjunguha my apologies! The cs eval works well if running with podman.
I believe my issue was related to the environment: I was trying to run eval_cs.py in my own environment, but it didn't work out. Sorry about that.