BloodHoundAD/AzureHound

azurehound doesn't truncate output file before writing

rfc6919 opened this issue · 3 comments

When azurehound writes to the file specified with --output, it doesn't truncate the file before writing. If the original file was longer than the new output, this leaves the old file content after the new JSON document, causing the BloodHound import to hang during the "Uploading data" stage.

@rfc6919 I'm having some trouble reproducing this on my end. Can you provide the exact commands used to produce the json file? We'll also need the following:

  • AzureHound version
  • Operating System
  • Architecture

azurehound version:

AzureHound v1.2.0
Created by the BloodHound Enterprise team - https://bloodhoundenterprise.io

azurehound version v1.2.0

OS: macOS 12.6.1
Architecture: arm64

Replication:
... get a devicelogin response in devicelogin.json ...
% azurehound -r "$(jq -r .refresh_token devicelogin.json)" list --tenant some-tenant.com --output test.json
test.json now contains the response: valid JSON, ~2 MB.

% ls -l test.json 
-rw-r--r--  1 XXXXX  staff  2006100 16 Nov 10:41 test.json
% hexdump -C test.json | tail -5
001e9c20  22 6d 65 74 61 22 3a 20  7b 22 74 79 70 65 22 3a  |"meta": {"type":|
001e9c30  22 61 7a 75 72 65 22 2c  22 76 65 72 73 69 6f 6e  |"azure","version|
001e9c40  22 3a 35 2c 22 63 6f 75  6e 74 22 3a 31 37 33 36  |":5,"count":1736|
001e9c50  7d 0a 7d 0a                                       |}.}.|
001e9c54

Overwrite test.json with 4 MB of zeros:

% dd if=/dev/zero of=test.json bs=1024 count=4096
4096+0 records in
4096+0 records out
4194304 bytes transferred in 0.008359 secs (501771025 bytes/sec)
% ls -l test.json                                     
-rw-r--r--  1 XXXXX  staff  4194304 16 Nov 10:48 test.json
% hexdump -C test.json | tail -5                 
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400000

Re-run the collection, with the now-4 MB test.json still specified as the output file:
% azurehound -r "$(jq -r .refresh_token devicelogin.json)" list --tenant some-tenant.com --output test.json
test.json is still 4 MB, and after the actual content it still contains all the zeros we dd'd into it:

% ls -l test.json               
-rw-r--r--  1 XXXXX  staff  4194304 16 Nov 10:50 test.json
% hexdump -C test.json | tail -5
001e9c40  22 3a 35 2c 22 63 6f 75  6e 74 22 3a 31 37 33 36  |":5,"count":1736|
001e9c50  7d 0a 7d 0a 00 00 00 00  00 00 00 00 00 00 00 00  |}.}.............|
001e9c60  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400000

It doesn't have to be zeros in the existing output file, of course; that's just a simple demo. I assume the cause is sinks/file.go not including os.O_TRUNC in the flags passed to os.OpenFile().

🤦 Good catch! I added the flag and verified the fix in e00a40d