facebookresearch/ReAgent

Tutorial does not work

galoisking opened this issue · 1 comments

I followed the tutorial at https://reagent.ai/rasp_tutorial.html#installing-reagent and ran:

./reagent/workflow/cli.py run reagent.workflow.training.identify_and_train_network "$CONFIG"

/home/circleci/project/ReAgent/reagent/preprocessing/preprocessor.py:120: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
input.shape == input_presence_byte.shape
/home/circleci/project/ReAgent/reagent/preprocessing/preprocessor.py:589: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
elif max_value.item() > MAX_FEATURE_VALUE:
/home/circleci/project/ReAgent/reagent/preprocessing/preprocessor.py:594: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
elif min_value.item() < MIN_FEATURE_VALUE:
I0721 100356.023 preprocessor.py:37] CUDA availability: False
I0721 100356.023 preprocessor.py:45] NOT Using GPU: GPU not requested or not available.
/home/circleci/project/ReAgent/reagent/prediction/predictor_wrapper.py:193: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert q_values.shape[1] == 2, f"{q_values.shape}"
I0721 100356.088 training.py:269] Saved default_model to DiscreteDQN_default_model_1626861836.torchscript
I0721 100356.090 training.py:269] Saved binary_difference_scorer to DiscreteDQN_binary_difference_scorer_1626861836.torchscript

(base) circleci@e79b99c2c4f9:~/project/ReAgent$ mkdir -p /tmp/0
(base) circleci@e79b99c2c4f9:~/project/ReAgent$ cp model_.torchscript /tmp/0/0

(base) circleci@e79b99c2c4f9:~/project/ReAgent$ python serving/examples/ecommerce/customer_simulator.py contextual_bandit.json
0
200
400
600
800
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/circleci/miniconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/circleci/miniconda3/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "serving/examples/ecommerce/customer_simulator.py", line 49, in serve_customer
result = post(
File "serving/examples/ecommerce/customer_simulator.py", line 24, in post
response = urllib.request.urlopen(req, jsondataasbytes)
File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 1379, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 1354, in do_open
r = h.getresponse()
File "/home/circleci/miniconda3/lib/python3.8/http/client.py", line 1347, in getresponse
response.begin()
File "/home/circleci/miniconda3/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/home/circleci/miniconda3/lib/python3.8/http/client.py", line 276, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "serving/examples/ecommerce/customer_simulator.py", line 83, in <module>
results: List[Tuple[str, float]] = p.map(serve_customer, list(range(EPOCHS)))
File "/home/circleci/miniconda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/circleci/miniconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
http.client.RemoteDisconnected: Remote end closed connection without response
[1]+ Aborted (core dumped) nohup ./serving/build/RaspCli --logtostderr > cli.log
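
The `RemoteDisconnected` above only says that the server closed the socket before replying. Together with the `Aborted (core dumped)` line, that points at the RaspCli process crashing rather than at the client. As a rough sketch (the name `post_json` and the error wording are hypothetical, this is not the tutorial's own `post()`), a client helper could turn that exception into a hint to look at the server log:

```python
import http.client
import json
import urllib.request


def post_json(url: str, payload: dict, timeout: float = 5.0) -> dict:
    """POST `payload` as JSON and return the decoded JSON reply.

    Hypothetical helper, not the tutorial's own post() function.
    """
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except http.client.RemoteDisconnected as exc:
        # The server closed the socket before sending any response, which is
        # what a crashed server process looks like from the client side.
        raise RuntimeError(
            f"{url} closed the connection without a response; "
            "check the server log for a crash"
        ) from exc
```

The original exception stays attached as `__cause__`, so the full traceback is still available.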

(base) circleci@e79b99c2c4f9:~/project/ReAgent$ cat cli.log
I0721 10:05:05.707381 9778 DiskConfigProvider.cpp:9] READING CONFIGS FROM serving/examples/ecommerce/plans
I0721 10:05:05.707865 9778 DiskConfigProvider.cpp:48] GOT CONFIG contextual_bandit.json AT serving/examples/ecommerce/plans/contextual_bandit.json
I0721 10:05:05.707962 9778 DiskConfigProvider.cpp:52] Registered decision config: contextual_bandit.json
I0721 10:05:05.708199 9778 DiskConfigProvider.cpp:48] GOT CONFIG heuristic.json AT serving/examples/ecommerce/plans/heuristic.json
I0721 10:05:05.708250 9778 DiskConfigProvider.cpp:52] Registered decision config: heuristic.json
I0721 10:05:05.708446 9778 DiskConfigProvider.cpp:48] GOT CONFIG multi_armed_bandit.json AT serving/examples/ecommerce/plans/multi_armed_bandit.json
I0721 10:05:05.708492 9778 DiskConfigProvider.cpp:52] Registered decision config: multi_armed_bandit.json
I0721 10:05:05.708657 9787 Server.cpp:58] STARTING SERVER
[F PytorchActionValueScorer.cpp:74] TORCH ERROR: forward() Expected a value of type '__torch__.reagent.core.types.ServingFeatureData' for argument 'state' but instead found type 'Tuple[Tensor, Tensor]'.
Position: 1
Declaration: forward(__torch__.reagent.prediction.predictor_wrapper.DiscreteDqnPredictorWrapper self, __torch__.reagent.core.types.ServingFeatureData state) -> ((str[], Tensor))
Exception raised from checkArg at ../aten/src/ATen/core/function_schema_inl.h:162 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7ff39a7067eb in /home/circleci/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7ff39a70246e in /home/circleci/libtorch/lib/libc10.so)
frame #2: <unknown function> + 0x10194a2 (0x7ff385e5d4a2 in /home/circleci/libtorch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x101d731 (0x7ff385e61731 in /home/circleci/libtorch/lib/libtorch_cpu.so)
frame #4: torch::jit::GraphFunction::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) + 0x2d (0x7ff388703e3d in /home/circleci/libtorch/lib/libtorch_cpu.so)
frame #5: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) + 0x161 (0x7ff388713eb1 in /home/circleci/libtorch/lib/libtorch_cpu.so)
frame #6: torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >) + 0x10c (0x7ff399f4540a in ./serving/build/RaspCli)
frame #7: reagent::PytorchActionValueScorer::predict[abi:cxx11](reagent::DecisionRequest const&, int, int) + 0x927 (0x7ff399f413ff in ./serving/build/RaspCli)
frame #8: reagent::ActionValueScoring::runInternal[abi:cxx11](int, int, reagent::DecisionRequest const&) + 0x5c (0x7ff39a28af52 in ./serving/build/RaspCli)
frame #9: reagent::ActionValueScoring::run(reagent::DecisionRequest const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::variant<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, std::vector<long, std::allocator >, std::vector<double, std::allocator >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, long, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, long> > >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::__cxx11::basic_string<char, std::char_traits, 
std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > > > > >, std::vector<reagent::ActionDetails, std::allocatorreagent::ActionDetails > >, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::variant<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, std::vector<long, std::allocator >, std::vector<double, std::allocator >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, 
std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, long, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, long> > >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > > > > >, std::vector<reagent::ActionDetails, 
std::allocatorreagent::ActionDetails > > > > > const&) + 0x133 (0x7ff39a28ae39 in ./serving/build/RaspCli)
frame #10: <unknown function> + 0xd84fe6 (0x7ff39a24efe6 in ./serving/build/RaspCli)
frame #11: <unknown function> + 0xd871a0 (0x7ff39a2511a0 in ./serving/build/RaspCli)
frame #12: std::function<void ()>::operator()() const + 0x32 (0x7ff39a25bce2 in ./serving/build/RaspCli)
frame #13: void std::__invoke_impl<void, std::function<void ()>&>(std::__invoke_other, std::function<void ()>&) + 0x20 (0x7ff39a258da8 in ./serving/build/RaspCli)
frame #14: std::__invoke_result<std::function<void ()>&>::type std::__invoke<std::function<void ()>&>(std::function<void ()>&) + 0x26 (0x7ff39a256723 in ./serving/build/RaspCli)
frame #15: std::invoke_result<std::function<void ()>&>::type std::invoke<std::function<void ()>&>(std::function<void ()>&) + 0x20 (0x7ff39a254c2d in ./serving/build/RaspCli)
frame #16: tf::Executor::_invoke_static_work(unsigned int, tf::Node*) + 0xf3 (0x7ff39a27ef37 in ./serving/build/RaspCli)
frame #17: tf::Executor::_invoke(unsigned int, tf::Node*) + 0x11b (0x7ff39a27e8ef in ./serving/build/RaspCli)
frame #18: tf::Executor::_exploit_task(unsigned int, std::optional<tf::Node*>&) + 0x12e (0x7ff39a27e036 in ./serving/build/RaspCli)
frame #19: tf::Executor::_spawn(unsigned int)::{lambda()#1}::operator()() const + 0x78 (0x7ff39a27dbba in ./serving/build/RaspCli)
frame #20: void std::__invoke_impl<void, tf::Executor::_spawn(unsigned int)::{lambda()#1}>(std::__invoke_other, tf::Executor::_spawn(unsigned int)::{lambda()#1}&&) + 0x20 (0x7ff39a283f02 in ./serving/build/RaspCli)
frame #21: std::__invoke_result<tf::Executor::_spawn(unsigned int)::{lambda()#1}>::type std::__invoke<tf::Executor::_spawn(unsigned int)::{lambda()#1}>(std::__invoke_result&&, (tf::Executor::_spawn(unsigned int)::{lambda()#1}&&)...) + 0x26 (0x7ff39a283233 in ./serving/build/RaspCli)
frame #22: decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<tf::Executor::_spawn(unsigned int)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) + 0x28 (0x7ff39a285528 in ./serving/build/RaspCli)
frame #23: std::thread::_Invoker<std::tuple<tf::Executor::_spawn(unsigned int)::{lambda()#1}> >::operator()() + 0x1d (0x7ff39a28548f in ./serving/build/RaspCli)
frame #24: std::thread::_State_impl<std::thread::_Invoker<std::tuple<tf::Executor::_spawn(unsigned int)::{lambda()#1}> > >::_M_run() + 0x1c (0x7ff39a28542e in ./serving/build/RaspCli)
frame #25: <unknown function> + 0xc819d (0x7ff39941b19d in /home/circleci/miniconda/lib/libstdc++.so.6)
frame #26: <unknown function> + 0x76db (0x7ff384c2c6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #27: clone + 0x3f (0x7ff3843b288f in /lib/x86_64-linux-gnu/libc.so.6)

I'm getting the same error running through the tutorial. When I get to the customer_simulator.py step and it goes to post to RASP and score the prediction, it prints this error in the logs:

[F PytorchActionValueScorer.cpp:75] TORCH ERROR: forward() Expected a value of type '__torch__.reagent.core.types.ServingFeatureData' for argument 'state' but instead found type 'Tuple[Tensor, Tensor]'.
Position: 1
Declaration: forward(__torch__.reagent.prediction.predictor_wrapper.DiscreteDqnPredictorWrapper self, __torch__.reagent.core.types.ServingFeatureData state) -> ((str[], Tensor))
Exception raised from checkArg at ../aten/src/ATen/core/function_schema_inl.h:162 (most recent call first)

I've traced the error down to model.forward(inputs) here: https://github.com/facebookresearch/ReAgent/blob/master/serving/reagent/serving/core/PytorchActionValueScorer.cpp#L50
Maybe the request for the state features in the example needs to be changed somehow?
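
To make the mismatch concrete, here is a minimal sketch in plain Python. It does not use PyTorch or ReAgent: `Tensor` is a list of floats, and the `ServingFeatureData` below is a stand-in with a single assumed field (`float_features_with_presence`); the real class may carry more fields. The point is only that the exported `forward()` wants the named wrapper type, while the caller hands it the bare `(values, presence)` tuple.

```python
from typing import List, NamedTuple, Tuple

# Stand-in so the sketch runs without PyTorch: a "tensor" is a list of floats.
Tensor = List[float]


class ServingFeatureData(NamedTuple):
    """Hypothetical stand-in for reagent.core.types.ServingFeatureData.

    The single field here is an assumption made for illustration.
    """

    float_features_with_presence: Tuple[Tensor, Tensor]


def forward(state: ServingFeatureData) -> Tuple[List[str], Tensor]:
    """Mimic the argument check from the error message, then return dummy scores."""
    if not isinstance(state, ServingFeatureData):
        raise TypeError(
            "forward() expected ServingFeatureData for argument 'state' "
            f"but found {type(state).__name__}"
        )
    values, presence = state.float_features_with_presence
    # Dummy scoring: zero out features whose presence flag is 0.
    scores = [v * p for v, p in zip(values, presence)]
    return ["action_0", "action_1"], scores


bare = ([0.5, 1.5], [1.0, 0.0])  # what the caller currently sends

try:
    forward(bare)  # a plain tuple is rejected, as in the log above
except TypeError as err:
    print(err)

print(forward(ServingFeatureData(bare)))  # wrapping the tuple is accepted
```

If this reading is right, the fix would be either to build the wrapper type on the C++ side before calling `model.forward(inputs)`, or to export a model whose `forward()` still takes the plain tuple. I have not verified which one the maintainers intend.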