nuprl/MultiPL-E

Environment for evaluating C#

Closed this issue · 13 comments

memray commented

Hi there,

I'm trying to reproduce the results on C# (c-sharp, c#, cs). I'm running it on an Ubuntu virtual machine.

  1. I tried the recommended podman way, but I don't think it ran correctly (finished instantly, no score output, no other informative output).
  2. Then I chose to run it without a container, and it fails with:
  File "/export/home/project/codeai/MultiPL-E/evaluation/src/eval_cs.py", line 24, in eval_script
    build = subprocess.run(["csc", "/d:DEBUG", "-r:System.Numerics.dll", path, f"/out:{binaryname}"], capture_output=True)
  File "/export/share/ruimeng/env/anaconda/envs/codegen/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/export/share/ruimeng/env/anaconda/envs/codegen/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/export/share/ruimeng/env/anaconda/envs/codegen/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'csc'

AFAIK, csc won't work on Linux machines. Can you share more details about how to configure the C# runtime environment? Also, the Toolchains page says Conda is required. Does Conda provide anything for executing C#?

Thank you!
Rui

(Sorry, the Conda directions are out of date. Conda is not needed, since the container now works.)

Would you try to run our tests on your system:

cd evaluation
mkdir test_outputs
make test

That will run podman by default and produce a bunch of files in the test_outputs/ directory. If you look at the C# test:

cat test_outputs/cs-davinci-0.2-keep-HumanEval53.results.json

You should see:

{
  "name": "HumanEval_53_add",
  "language": "cs",
  "prompt": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n    // Add two numbers x and y\n    // >>> add(2, 3)\n    // 5\n    // >>> add(5, 7)\n    // 12\n    public static long Add(long x, long y) {\n",
  "tests": "    }\n    public static void Main(string[] args) {\n    Debug.Assert(Add((0L), (1L)) == (1L));\n    Debug.Assert(Add((1L), (0L)) == (1L));\n    Debug.Assert(Add((2L), (3L)) == (5L));\n    Debug.Assert(Add((5L), (7L)) == (12L));\n    Debug.Assert(Add((7L), (5L)) == (12L));\n    }\n\n}\n",
  "stop_tokens": [
    "\n    }\n"
  ],
  "results": [
    {
      "program": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n    // Add two numbers x and y\n    // >>> add(2, 3)\n    // 5\n    // >>> add(5, 7)\n    // 12\n    public static long Add(long x, long y) {\n        return x + y;\n    }\n    public static void Main(string[] args) {\n    Debug.Assert(Add((0L), (1L)) == (1L));\n    Debug.Assert(Add((1L), (0L)) == (1L));\n    Debug.Assert(Add((2L), (3L)) == (5L));\n    Debug.Assert(Add((5L), (7L)) == (12L));\n    Debug.Assert(Add((7L), (5L)) == (12L));\n    }\n\n}\n",
      "timestamp": 1675463781,
      "stdout": "",
      "stderr": "None",
      "exit_code": 0,
      "status": "OK"
    },
    {
      "program": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n    // Add two numbers x and y\n    // >>> add(2, 3)\n    // 5\n    // >>> add(5, 7)\n    // 12\n    public static long Add(long x, long y) {\n        return x + ;\n    }\n    public static void Main(string[] args) {\n    Debug.Assert(Add((0L), (1L)) == (1L));\n    Debug.Assert(Add((1L), (0L)) == (1L));\n    Debug.Assert(Add((2L), (3L)) == (5L));\n    Debug.Assert(Add((5L), (7L)) == (12L));\n    Debug.Assert(Add((7L), (5L)) == (12L));\n    }\n\n}\n",
      "timestamp": 1675463780,
      "stdout": "Microsoft (R) Visual C# Compiler version 3.9.0-6.21124.20 (db94f4cc)\nCopyright (C) Microsoft Corporation. All rights reserved.\n\n/tmp/tmpnv49tr82.cs(15,20): error CS1525: Invalid expression term ';'\n",
      "stderr": "",
      "exit_code": 1,
      "status": "SyntaxError"
    }
  ]
}

So, the first test will trivially pass, and the second is a deliberate syntax error in C#.

If you get anything else, there is some fundamental problem we should debug.
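For a quick check without reading the whole file, here is a minimal Python sketch. It assumes only the JSON layout shown above (a top-level "results" list whose entries carry a "status" field). The helper names `statuses` and `looks_healthy` are made up for illustration and are not part of MultiPL-E.

```python
import json

# Statuses the two C# test programs should produce, in the order shown above.
EXPECTED_STATUSES = ["OK", "SyntaxError"]


def statuses(results_json: str) -> list:
    """Return the "status" field of every entry in the "results" list."""
    data = json.loads(results_json)
    return [result["status"] for result in data["results"]]


def looks_healthy(results_json: str) -> bool:
    """True if the statuses match the expected pair exactly."""
    return statuses(results_json) == EXPECTED_STATUSES
```

To use it, read test_outputs/cs-davinci-0.2-keep-HumanEval53.results.json into a string and pass it to `looks_healthy`. A `False` result means the output differs from the example above.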

memray commented

Thanks for the information!
Regarding the csc issue, I found that using mono-csc can make it work on Ubuntu (newbie to C#...) but I have to edit the command used in evaluation/src/eval_cs.py.
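A rough sketch of that kind of edit, for anyone else running outside the container: pick whichever compiler name is on PATH instead of hard-coding csc. The helper names `find_csharp_compiler` and `build_command` are hypothetical and not part of eval_cs.py. The flags are copied from the traceback above, and it is an assumption that mono-csc accepts them unchanged.

```python
import shutil
from typing import Callable, List, Optional, Sequence

# Compiler names to try, in order. "csc" is what eval_cs.py invokes;
# "mono-csc" is the name that worked on Ubuntu in this thread.
CANDIDATES = ("csc", "mono-csc")


def find_csharp_compiler(
    candidates: Sequence[str] = CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none exists."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


def build_command(compiler: str, source_path: str, binary_name: str) -> List[str]:
    """Build the compile command using the flags from the traceback."""
    return [
        compiler,
        "/d:DEBUG",
        "-r:System.Numerics.dll",
        source_path,
        f"/out:{binary_name}",
    ]
```

If `find_csharp_compiler()` returns None, no compiler is installed under either name, which is exactly the situation that produced the FileNotFoundError.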

As for the test you mentioned, it failed at STEP 21/29 due to Error 125.

WARN[0625] ignoring metacopy option from storage.conf, not supported with booted kernel
--> ba7bb2b4d18
STEP 21/29: RUN curl https://julialang-s3.julialang.org/bin/linux/x64/1.8/julia-1.8.2-linux-x86_64.tar.gz | tar xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Error: error building at STEP "RUN curl https://julialang-s3.julialang.org/bin/linux/x64/1.8/julia-1.8.2-linux-x86_64.tar.gz | tar xz": error while running runtime: exit status 2
make: *** [Makefile:4: build] Error 125

Thank you!
Rui

Ah, I see the test command is set up to force the container to build. Would you edit the Makefile so that test: build is instead just test:?

Then do this:

podman pull ghcr.io/nuprl/multipl-e-evaluation
podman tag ghcr.io/nuprl/multipl-e-evaluation multipl-e-eval
make test

That should address the problem.

memray commented

The new command works!
But when evaluating C# results, I find that output.stderr keeps filling up with the error below.
It seems to be resolved if I remove the "MONO_TRACE_LISTENER":"Console.Error" entry in eval_cs.py L36. Do you see any problem here?

Fail: 
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceInternal.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.Debug.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at Problem.Main (System.String[] args) [0x00000] in <54b5ef6a134b4acbaecf85396ecb47c9>:0 
Fail: 
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceInternal.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.Debug.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at Problem.Main (System.String[] args) [0x00000] in <54b5ef6a134b4acbaecf85396ecb47c9>:0 
Fail: 
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceInternal.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.Debug.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at Problem.Main (System.String[] args) [0x00000] in <54b5ef6a134b4acbaecf85396ecb47c9>:0 
Fail: 
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.TraceInternal.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at System.Diagnostics.Debug.Assert (System.Boolean condition) [0x00000] in <a85c1a570f9a4f9f9c3d2cfa5504e34f>:0 
  at Problem.Main (System.String[] args) [0x00000] in <54b5ef6a134b4acbaecf85396ecb47c9>:0 

@abhijangda any ideas what this is about? IIRC you wrote this bit.

@memray I still don't think this is working as expected. Would you attach the file evaluation/test_outputs/cs-davinci-0.2-keep-HumanEval53.results.json that you got after running make test?

memray commented

The current result looks normal since I have removed "MONO_TRACE_LISTENER":"Console.Error" and rebuilt the container. Will keep an eye on it.

{
  "name": "HumanEval_53_add",
  "language": "cs",
  "prompt": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n    // Add two numbers x and y\n    // >>> add(2, 3)\n    // 5\n    // >>> add(5, 7)\n    // 12\n    public static long Add(long x, long y) {\n",
  "tests": "    }\n    public static void Main(string[] args) {\n    Debug.Assert(Add((0L), (1L)) == (1L));\n    Debug.Assert(Add((1L), (0L)) == (1L));\n    Debug.Assert(Add((2L), (3L)) == (5L));\n    Debug.Assert(Add((5L), (7L)) == (12L));\n    Debug.Assert(Add((7L), (5L)) == (12L));\n    }\n\n}\n",
  "stop_tokens": [
    "\n    }\n"
  ],
  "results": [
    {
      "program": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n    // Add two numbers x and y\n    // >>> add(2, 3)\n    // 5\n    // >>> add(5, 7)\n    // 12\n    public static long Add(long x, long y) {\n        return x + y;\n    }\n    public static void Main(string[] args) {\n    Debug.Assert(Add((0L), (1L)) == (1L));\n    Debug.Assert(Add((1L), (0L)) == (1L));\n    Debug.Assert(Add((2L), (3L)) == (5L));\n    Debug.Assert(Add((5L), (7L)) == (12L));\n    Debug.Assert(Add((7L), (5L)) == (12L));\n    }\n\n}\n",
      "timestamp": 1675561391,
      "stdout": "",
      "stderr": "None",
      "exit_code": 0,
      "status": "OK"
    },
    {
      "program": "using System;\nusing System.Numerics;\nusing System.Diagnostics;\nusing System.Collections.Generic;\nusing System.Linq;\nusing System.Text;\nusing System.Security.Cryptography;\nclass Problem {\n    // Add two numbers x and y\n    // >>> add(2, 3)\n    // 5\n    // >>> add(5, 7)\n    // 12\n    public static long Add(long x, long y) {\n        return x + ;\n    }\n    public static void Main(string[] args) {\n    Debug.Assert(Add((0L), (1L)) == (1L));\n    Debug.Assert(Add((1L), (0L)) == (1L));\n    Debug.Assert(Add((2L), (3L)) == (5L));\n    Debug.Assert(Add((5L), (7L)) == (12L));\n    Debug.Assert(Add((7L), (5L)) == (12L));\n    }\n\n}\n",
      "timestamp": 1675561390,
      "stdout": "Microsoft (R) Visual C# Compiler version 3.9.0-6.21124.20 (db94f4cc)\nCopyright (C) Microsoft Corporation. All rights reserved.\n\n/tmp/tmpoukx5fl9.cs(15,20): error CS1525: Invalid expression term ';'\n",
      "stderr": "",
      "exit_code": 1,
      "status": "SyntaxError"
    }
  ]
}

The environment variable MONO_TRACE_LISTENER is needed to make sure that the Mono runtime prints and exits on assertion failure. Which Mono and Ubuntu versions are you using?
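For reference, a minimal sketch of passing that variable to a child process from Python. This is an illustration, not the actual code in eval_cs.py: the function names `mono_env` and `run_with_trace_listener` and the 5-second timeout are assumptions.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def mono_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the given environment (default: os.environ) and add the listener."""
    env = dict(os.environ if base is None else base)
    env["MONO_TRACE_LISTENER"] = "Console.Error"
    return env


def run_with_trace_listener(exe_path: str, timeout: float = 5.0):
    """Run a compiled program under mono with assertion traces sent to stderr."""
    return subprocess.run(
        ["mono", exe_path],
        env=mono_env(),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Copying the parent environment rather than passing a one-entry dict keeps PATH intact, so the mono binary can still be found.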

memray commented

Ubuntu 20.04.4 LTS and Mono JIT compiler version 6.12.0.182
Really not sure what causes this...

output.stderr is filled when assertions in the program fail, i.e., when the model generates code that fails one or more test cases. Unlike Java, Python, or C++, Mono continues executing even after an assertion fails. For example, save the following as hello.cs:

using System;
using System.Diagnostics;

public class HelloWorld
{
    public static void Main(string[] args) {
        Debug.Assert(false, "Bleh");
        Console.WriteLine("Executing after assertion");
    }
}

Compile and run using:

csc -d:DEBUG hello.cs
MONO_TRACE_LISTENER=Console.Error mono hello.exe

You will see the output is:

Fail: Bleh
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message, System.String detailMessage) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0 
  at System.Diagnostics.TraceListener.Fail (System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0 
  at System.Diagnostics.DefaultTraceListener.Fail (System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0 
  at System.Diagnostics.TraceInternal.Fail (System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0 
  at System.Diagnostics.TraceInternal.Assert (System.Boolean condition, System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0 
  at System.Diagnostics.Debug.Assert (System.Boolean condition, System.String message) [0x00000] in <33b19a7ad5234d94abf4fd9b47566616>:0 
  at HelloWorld.Main (System.String[] args) [0x00000] in <d407b623a45a40d6920f37a2e64a2366>:0 
Executing after assertion

Similarly, Mono executes all test cases even if one of them fails.
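Because the process keeps running and may still exit with code 0, the exit code alone cannot tell you whether an assertion failed. You have to look at stderr as well. The sketch below shows one way to do that by looking for lines that start with "Fail:", as in the trace above. The function names and the "AssertionError" and "Exception" labels are hypothetical, and the real logic in eval_cs.py may differ.

```python
def count_failed_assertions(stderr: str) -> int:
    """Count trace entries; each failed Debug.Assert prints one 'Fail:' line."""
    return sum(1 for line in stderr.splitlines() if line.startswith("Fail:"))


def classify_run(exit_code: int, stderr: str) -> str:
    """Classify a run using stderr first, then the exit code.

    The labels other than "OK" are illustrative, not MultiPL-E's own.
    """
    if count_failed_assertions(stderr) > 0:
        return "AssertionError"
    return "OK" if exit_code == 0 else "Exception"
```

With the hello.cs trace above, `count_failed_assertions` would see a single "Fail: Bleh" line, and `classify_run(0, stderr)` would report a failure even though the program ran to completion.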

I hope that clears the situation.

Can you look at the model-generated code and check whether Mono reports an assertion failure for every test case that the generated code does not pass?

I am genuinely surprised the container behaves differently. @memray if this doesn't get sorted out, I'd be happy to get on a video call and debug.

Just checking on this @memray -- was it resolved?

memray commented

@arjunguha my apologies! The cs eval works well when run with podman.
I believe my issue was related to the environment: I was trying to run eval_cs.py in my own environment, but it didn't work out. Sorry about that.