openai/gpt-2-output-dataset

python vs. python3 in line 96 of /detector/server.py

AndrewBarfield opened this issue · 5 comments

This is a simple problem. Just posting so others are aware.

To get the Web-based GPT-2 Output Detector to work, I had to change "python" to "python3" in line 96 of /detector/server.py. See:

num_workers = int(subprocess.check_output(['python', '-c', 'import torch; print(torch.cuda.device_count())']))
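
For reference, the same line after the change (the only edit needed on my system):

num_workers = int(subprocess.check_output(['python3', '-c', 'import torch; print(torch.cuda.device_count())']))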

System:
OS: Ubuntu 19.10 eoan
Kernel: x86_64 Linux 5.3.0-19-generic
Uptime: 13d 6h 1m
Packages: 2125
Shell: bash 5.0.3
Resolution: 2560x1440
DE: GNOME
WM: GNOME Shell
WM Theme: Adwaita
GTK Theme: Yaru-dark [GTK2/3]
Icon Theme: Yaru
Font: Ubuntu 11
CPU: Intel Core i7-8809G @ 8x 4.2GHz [27.8°C]
GPU: AMD VEGAM (DRM 3.33.0, 5.3.0-19-generic, LLVM 9.0.0)
RAM: 6278MiB / 32035MiB

Behavior before the change:

~/Projects/AI/gpt-2-output-dataset/detector$ python3 -m server detector-large.pt
Loading checkpoint from detector-large.pt
Starting HTTP server on port 8080
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named torch
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/drew/Projects/AI/gpt-2-output-dataset/detector/server.py", line 120, in
fire.Fire(main)
File "/home/drew/.local/lib/python3.7/site-packages/fire/core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/drew/.local/lib/python3.7/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/home/drew/.local/lib/python3.7/site-packages/fire/core.py", line 675, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/drew/Projects/AI/gpt-2-output-dataset/detector/server.py", line 96, in main
num_workers = int(subprocess.check_output(['python', '-c', 'import torch; print(torch.cuda.device_count())']))
File "/usr/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/usr/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['python', '-c', 'import torch; print(torch.cuda.device_count())']' returned non-zero exit status 1.

Behavior after the change (as expected):

~/Projects/AI/gpt-2-output-dataset/detector$ python3 -m server detector-large.pt
Loading checkpoint from detector-large.pt
Starting HTTP server on port 8080
[] Process has started; loading the model ...
[] Ready to serve
[] "GET / HTTP/1.1" 200 -
[] "GET /favicon.ico HTTP/1.1" 200 -
[] "GET /?This%20is%20an%20online%20demo%20of%20the%20GPT-2%20output%20detector%20model.%20Enter%20some%20text%20in%20the%20text%20box;%20the%20predicted%20probabilities%20will%20be%20displayed%20below.%20The%20results%20start%20to%20get%20reliable%20after%20around%2050%20tokens. HTTP/1.1" 200 -

Good catch - thanks! I'll replace that with sys.executable so that it is not dependent on the executable name.
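
Something along these lines (a rough sketch, not necessarily the exact change that will land):

import subprocess
import sys

# sys.executable is the interpreter running server.py, so the child process
# uses the same Python regardless of whether it was launched as python or python3
num_workers = int(subprocess.check_output([sys.executable, '-c', 'import torch; print(torch.cuda.device_count())']))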

@jongwook even when using sys.executable, it will not work in the case of virtualenvs and aliases. On macOS you will get python, while I'm running with python3.
Unfortunately we cannot use argv[0] in this case either...

Hmm... I hadn't thought about the virtualenv case; in the conda environments we're using, the executable has always been python.

I assume you're not doing multi-GPU training since you're on a Mac, so you may simply use:

if torch.cuda.is_available():
    num_workers = int(torch.cuda.device_count())

The whole subprocess fiddle was to avoid a CUDA error that may happen in multi-process multi-GPU training (see #13 for details).
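
Concretely, for a single-GPU or CPU-only setup that call could collapse to something like this (just a sketch; the "else 1" fallback is my addition here, not what's in the repo):

# torch is already imported at the top of server.py
num_workers = int(torch.cuda.device_count()) if torch.cuda.is_available() else 1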

@jongwook ah yes, that's one way to do it, thank you!
Question: does num_workers refer to GPUs only? I mean, in TF I can do something like:

import multiprocessing
import os

import tensorflow as tf

N_CPU = multiprocessing.cpu_count()
# OMP_NUM_THREADS controls MKL's intra-op parallelism; default to the available cores
os.environ['OMP_NUM_THREADS'] = str(max(1, N_CPU))

config = tf.ConfigProto(
    device_count={'GPU': 1, 'CPU': N_CPU},
    intra_op_parallelism_threads=0,   # 0 lets TF pick an appropriate number
    inter_op_parallelism_threads=N_CPU,
    allow_soft_placement=True,
)
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.6

so that I can use at least 8-core parallelism on macOS, etc.

Yeah, in single-node CPU training you shouldn't need to do multiprocessing, since the multithreading capability of the OMP/MKL backend should be sufficient.
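
For completeness, a rough PyTorch analogue of the TF thread settings above (an illustration only, not part of the detector code):

import multiprocessing

import torch

# Size the intra-op (OMP/MKL) thread pool to the available cores;
# setting OMP_NUM_THREADS in the shell before launching Python has a similar effect.
n_cpu = multiprocessing.cpu_count()
torch.set_num_threads(max(1, n_cpu))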