Out-of-bounds Read in BehaviorTree.CPP Script Validation
cktii opened this issue · 1 comments
When parsing invalid scripts that generate large error messages, the ValidateScript
function may attempt to create a string from a non-NULL-terminated buffer, causing strlen
to read beyond the buffer's bounds. This occurs because a fixed-size stack buffer is used to store error messages without ensuring proper NULL termination.
BehaviorTree.CPP/src/script_parser.cpp
Line 70 in b4bbef6
Found in commit: 48f6c5b
Bug Class
Out-of-bounds Read / Buffer Over-read (CWE-125)
Root Cause
In src/script_parser.cpp
, the ValidateScript
function allocates a fixed-size buffer of 2048 bytes for error messages:
Result ValidateScript(const std::string& script)
{
char error_msgs_buffer[2048];
auto input = lexy::string_input<lexy::utf8_encoding>(script);
auto result =
lexy::parse<BT::Grammar::stmt>(input, ErrorReport().to(error_msgs_buffer));
if(result.has_value() && result.error_count() == 0)
{
try
{
std::vector<BT::Ast::ExprBase::Ptr> exprs = LEXY_MOV(result).value();
if(exprs.empty())
{
return nonstd::make_unexpected("Empty Script");
}
// valid script
return {};
}
catch(std::runtime_error& err)
{
return nonstd::make_unexpected(err.what());
}
}
return nonstd::make_unexpected(error_msgs_buffer);
}
} // namespace BT
This buffer is passed to lexy's error reporter, which fills it with error messages. When parsing fails, the buffer content is used to construct a string via nonstd::make_unexpected
.
The string construction internally calls strlen
to determine the buffer length. However, there's no guarantee that:
- The buffer is NULL-terminated
- The error messages fit within the buffer
- The last byte is reserved for a NULL terminator
This leads to strlen
reading past the buffer bounds looking for a NULL terminator that isn't there.
Stack trace
==2543337==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffff5a09820 at pc 0x5555555ae3af bp 0x7fffffffcd10 sp 0x7fffffffc4d8
READ of size 2049 at 0x7ffff5a09820 thread T0
#0 0x5555555ae3ae in strlen (/home/user/BehaviorTree.CPP/build3/script_fuzzer+0x5a3ae) (BuildId: 390cd92534c04116bddfa049abd4832d7b583113)
#1 0x555555692d1c in std::char_traits<char>::length(char const*) /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/char_traits.h:399:9
#2 0x555555679d3c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::basic_string<std::allocator<char>>(char const*, std::allocator<char> const&) /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/basic_string.h:648:30
#3 0x5555558ff01b in nonstd::expected_lite::expected<std::monostate, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>::expected<char*, 0>(nonstd::expected_lite::unexpected_type<char*>&&) /home/user/BehaviorTree.CPP/include/behaviortree_cpp/contrib/expected.hpp:1979:36
#4 0x5555558fe109 in BT::ValidateScript(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&) /home/user/BehaviorTree.CPP/src/script_parser.cpp:94:10
[...]
Reproducer
#include "behaviortree_cpp/scripting/script_parser.hpp"
#include <iostream>
int main()
{
std::string script = "+6e66>6666.6+66\r6>6;6e62=6+6e66>66666'; en';o';o'; en'; "
"\177n\177r;6.6+66.H>6+6\2006\036;@e66";
auto result = BT::ValidateScript(script);
if(!result)
{
std::cout << "Script validation failed as expected" << std::endl;
}
return 0;
}
NOTE: This requires that libbheaviortree.so
is compiled with -fsanitize=address
.
Proposed fix
I'm by no means an expert for lexy
, but given that lexy's parse function accepts a template callback for error handling:
/// Parses the production into a value, invoking the callback on error.
template <typename Production, typename Input, typename ErrorCallback>
constexpr auto parse(const Input& input, const ErrorCallback& callback)
{
return parse_action<void, Input, ErrorCallback>(callback)(Production{}, input);
}
A recommended fix seems to implement a safe error callback that manages its own memory instead of a fix-sized stack-based buffer, something similar to:
class SafeErrorReport {
std::string error_buffer;
std::size_t count = 0;
struct sink {
std::string& buffer;
std::size_t& count;
template <typename Input, typename Reader, typename Tag>
void operator()(const lexy::error_context<Input>& context,
const lexy::error<Reader, Tag>& error) {
buffer += "error: while parsing ";
buffer += context.production();
buffer += "\n";
count++;
}
std::size_t finish() && {
return count;
}
};
public:
auto sink() { return sink{error_buffer, count}; }
const std::string& get_errors() const { return error_buffer; }
};
This is based on my understanding of lexy's error_report.hpp. That would turn ValidateScript
into something like this:
Result ValidateScript(const std::string& script)
{
auto input = lexy::string_input<lexy::utf8_encoding>(script);
SafeErrorReport error_report; // Replace char buffer with our safe handler
auto result = lexy::parse<BT::Grammar::stmt>(input, error_report);
if(!result.has_value() || result.error_count() != 0)
{
return nonstd::make_unexpected(error_report.get_errors());
}
try
{
std::vector<BT::Ast::ExprBase::Ptr> exprs = LEXY_MOV(result).value();
if(exprs.empty())
{
return nonstd::make_unexpected("Empty Script");
}
// valid script
return {};
}
catch(std::runtime_error& err)
{
return nonstd::make_unexpected(err.what());
}
}
NOTE: Not 100% sure on this fix, just loosely based on my quick understand of lexy's code..
Impact
The vulnerability could allow an attacker to cause out-of-bounds memory reads by providing specially crafted invalid scripts. While this doesn't lead to direct code execution, it could potentially:
- Lead to information disclosure if sensitive data is stored adjacent to the buffer
- Cause program crashes (DoS)
Reproducer that doesn't require ASAN
#include "behaviortree_cpp/scripting/script_parser.hpp"
#include <iostream>
#include <string>
int main()
{
std::string script;
for(int i = 0; i < 10; i++)
{
script += "+6e66>6666.6+66\r6>6;6e62=6+6e66>66666'; en';o';o'; en'; "
"\177n\177r;6.6+66.H>6+6\2006\036;@e66";
}
volatile auto result = BT::ValidateScript(script);
return 0;
}
That first reproducer was a single-byte OOB-R and required the library to be compiled with AdressSanitizer. This second one crashes the normal Release
type library and causes a segmentation fault.