【bug】? Did we forget to add the timing-mark code lines for the hf_ds folder?
oujieww opened this issue · 2 comments
When I run the HuggingFace benchmark code, I get this error:
Traceback (most recent call last):
  File "hf_opt.py", line 360, in <module>
    run_generation(args.model, args.batch_size, args.prompt_len, args.gen_len,
  File "hf_opt.py", line 278, in run_generation
    prefill_latency = costs[0]
IndexError: list index out of range
This costs list is empty ([]).
I looked at the timer implementation:
class _Timer:
    """An internal timer."""

    def __init__(self, name: str):
        self.name = name
        self.started = False
        self.start_time = None
        # start-stop timestamp pairs
        self.start_times = []
        self.stop_times = []
        self.costs = []

    def start(self, sync_func: Callable = None):
        """Start the timer."""
        assert not self.started, f"timer {self.name} has already been started."
        if sync_func:
            sync_func()
        self.start_time = time.perf_counter()
        self.start_times.append(self.start_time)
        self.started = True

    def stop(self, sync_func: Callable = None):
        """Stop the timer."""
        assert self.started, f"timer {self.name} is not started."
        if sync_func:
            sync_func()
        stop_time = time.perf_counter()
        self.costs.append(stop_time - self.start_time)
        self.stop_times.append(stop_time)
        self.started = False

    def reset(self):
        """Reset timer."""
        self.started = False
        self.start_time = None
        self.start_times = []
        self.stop_times = []
        self.costs = []

    def elapsed(self, mode: str = "average"):
        """Calculate the elapsed time."""
        if not self.costs:
            return 0.0
        if mode == "average":
            return sum(self.costs) / len(self.costs)
        elif mode == "sum":
            return sum(self.costs)
        else:
            raise RuntimeError("Supported mode is: average | sum")
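To make the failure mode concrete: costs only gains an entry per completed start/stop pair, so reading costs[0] before any pair has run raises the IndexError above. A minimal standalone demo (a trimmed copy of the class above, not the full flexgen.timer code):

```python
import time
from typing import Callable


# Trimmed copy of the _Timer above, just enough to demo the failure mode.
class _Timer:
    def __init__(self, name: str):
        self.name = name
        self.started = False
        self.start_time = None
        self.costs = []  # one entry per completed start/stop pair

    def start(self, sync_func: Callable = None):
        assert not self.started, f"timer {self.name} has already been started."
        if sync_func:
            sync_func()
        self.start_time = time.perf_counter()
        self.started = True

    def stop(self, sync_func: Callable = None):
        assert self.started, f"timer {self.name} is not started."
        if sync_func:
            sync_func()
        self.costs.append(time.perf_counter() - self.start_time)
        self.started = False


timer = _Timer("generate-forward")
print(timer.costs)       # [] -- indexing costs[0] here raises IndexError
timer.start()
time.sleep(0.01)
timer.stop()
print(len(timer.costs))  # one cost recorded per start/stop pair
```

So if nothing in the model forward pass ever calls start()/stop() on this timer, costs stays empty and hf_opt.py crashes at costs[0].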
Should we change "hf_opt.py" as follows?
print("benchmark")
timers("generate-forward").reset()
timers("generate-forward").start()
generate_kwargs = dict(max_new_tokens=execute_gen_len, do_sample=False)
with torch.no_grad():
    output_ids = model.generate(input_ids=input_ids, **generate_kwargs)
timers("generate-forward").stop()
costs = timers("generate-forward").costs
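Note that this patch records exactly one start/stop pair around the whole generate() call, so costs has a single entry. Since hf_opt.py reads costs[0] as the prefill latency, it presumably expects costs[1:] to hold the per-token decode latencies, and with only one entry the decode latency collapses to ~0 (as seen in the output below). A hedged sketch of the timing layout that arithmetic appears to expect, with hypothetical prefill_fn/decode_fn standing in for the real model calls:

```python
import time


def timed_generate(prefill_fn, decode_fn, gen_len):
    """Illustrative timing layout: costs[0] is the prefill step,
    costs[1:] are per-token decode steps. prefill_fn and decode_fn
    are hypothetical stand-ins for the real model forward calls."""
    costs = []
    t0 = time.perf_counter()
    state = prefill_fn()                    # prefill: process the whole prompt
    costs.append(time.perf_counter() - t0)
    for _ in range(gen_len - 1):            # decode: one new token per step
        t0 = time.perf_counter()
        state = decode_fn(state)
        costs.append(time.perf_counter() - t0)
    return costs


costs = timed_generate(lambda: 0, lambda s: s + 1, gen_len=8)
prefill_latency = costs[0]       # what hf_opt.py reads as costs[0]
decode_latency = sum(costs[1:])  # non-zero only with per-step timer marks
```

In FlexGen's setup those per-step timer marks live inside the patched model code, which is why they disappear if the patched package is not installed.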
But I think the result I got is not right:
[2023-06-11 01:34:46,533] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
load model
wamup
benchmark
[0.8706504209985724]
<flexgen.timer._Timer object at 0x7fbdd3ff50d0>
Outputs:
0: Paris is the capital city of
15: Paris is the capital city of
model size: 2.443 GB cache size: 1.594 GB hidden size (p): 0.033 GB
peak gpu mem: 6.232 GB projected: False
prefill latency: 0.871 s prefill throughput: 9409.058 token/s
decode latency: 0.000 s decode throughput: 4960000000000.000 token/s
total latency: 0.871 s total throughput: 588.066 token/s
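The absurd decode throughput follows directly from having only one recorded cost: decode latency = total latency - prefill latency ≈ 0, and dividing the decode token count by a float rounding residue yields the ~5e12 token/s figure. A hedged reconstruction of the arithmetic (the benchmark shape is inferred from the printed throughputs, not read from the script, and hf_opt.py's exact formulas may differ):

```python
# Benchmark shape inferred from the printed numbers (an assumption):
# 16 * 512 = 8192 prompt tokens / 0.871 s -> ~9409 token/s prefill,
# 16 * 32  = 512 generated tokens / 0.871 s -> ~588 token/s total.
batch_size, prompt_len, gen_len = 16, 512, 32

costs = [0.8706504209985724]    # the single recorded start/stop pair
prefill_latency = costs[0]
total_latency = sum(costs)
decode_latency = total_latency - prefill_latency  # ~0 (float residue)

prefill_throughput = batch_size * prompt_len / prefill_latency  # ~9409 token/s
total_throughput = batch_size * gen_len / total_latency         # ~588 token/s
# Decode throughput divides 16 * 31 = 496 decode tokens by a residue of
# roughly 1e-10 s, producing the printed ~4.96e12 token/s.
```

So the numbers are internally consistent: the single timing span makes prefill absorb the entire run and leaves nothing measurable for decode.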
install ./third_package