Missing of certificates due to chunk calculation logic
Closed this issue · 3 comments
Hi,
So this issue I know was in the old python server but as that is archived I can't open it against that tree. I've had a look at the Elixir server code (to the best of my ability) and can't see it resolved in there either so raising here.
Looking at the Google CT client gave me a sense of what needed doing in reality. I fixed a version of your original python server which I can share if needed.
In short:
- CaliDog server calculates a chunk size from the maximum available and rounds down as needed
- Requests that chunk size
- Moves on
The issue is:
- It doesn't check if it gets back the number of certificates it required
- and then increments its position whatever
In a harnessed version you can see larger trees than the values returned which results in missed certificates and at times quite a substantial volume/delta.
As such, and as Google does, the CaliDog client likely needs some logic along the lines of:
- Check the number returned and only move on by that many in the tree
- Loop around with that as the starting point and continue until the tree has been caught up
Here is my get_new_results rework to show how I implemented it (warning: I'm not a Python coder)
`
def get_new_results(log, latest_size, tree_size, Verify):
# The top of the tree isn't actually a cert yet, so the total_size is what we're aiming for
total_size = tree_size - latest_size
start = latest_size
end = start + MAX_BLOCK_SIZE - 1
chunks = math.ceil(total_size / MAX_BLOCK_SIZE)
logging.info("Retrieving {} certificates over {} chunks ({} -> {}) for {}".format(tree_size-latest_size, chunks, latest_size, tree_size, log['description']))
while start < tree_size:
for _ in range(chunks):
logging.info("Retrieving {} chunk {} of {}".format(_ , log['description'], chunks))
# Cap the end to the last record in the DB
if end >= tree_size:
oldend = end
newend = tree_size -1
if Verify:
logging.info("Truncating end from {} to {} because {} ".format(oldend,newend,tree_size))
end = tree_size - 1
assert end >= start, "End {} is less than start {}!".format(end, start)
assert end < tree_size, "End {} is less than tree_size {}".format(end, tree_size)
# Get the response
url = "https://{}/ct/v1/get-entries?start={}&end={}".format(log['url'], start, end)
if Verify:
logging.info("Getting {} certs from {} ".format(end-start,url))
if end-start == 0:
return
response = urllib.request.urlopen("{}".format(url))
data = response.read()
encoding = response.info().get_content_charset('utf-8')
certificates = json.loads(data.decode(encoding))
# Parse
try:
if 'error_message' in certificates:
logging.error("Encountered an exception while getting a chunk -> {}".format(certificates['error_message']))
if Verify:
logging.info("Resetting count")
count = 0
try:
for index, cert in zip(range(start, end), certificates['entries']):
cert['index'] = index
count = count + 1
except Exception as e:
logging.error("Encountered an exception assigning index -> {}".format(e))
if Verify:
logging.info("We got {} certs and expected {} for {}".format(count,end-start,url))
if Verify:
logging.error("Previous last size {} - new lastest size {} delta of {}".format(latest_size,end,end-latest_size))
last_size = start + count # instead of the calculated we bump it for this chunk to what we got
if Verify:
logging.error("Saving position to disk")
savelastsize(log['url'],last_size)
yield certificates['entries']
except Exception as e:
count = 0
logging.error("Encountered an exception while parsing a chunk -> {}".format(e))
# Move on
#start += MAX_BLOCK_SIZE
start = start + count
end = start + MAX_BLOCK_SIZE -1
#start = end
#end = start + MAX_BLOCK_SIZE - 1`
Hope it helps.
Cheers
Ollie
https://github.com/google/certificate-transparency/blob/master/python/ct/client/log_client.py is the Google CT client code of interest
Hi @olliencc
Interesting, thanks a ton for doing the legwork here to figure out what was going on. You're absolutely right in that we don't do any return payload entry checking. It looks like @ninoseki (also huge thanks to you) has submitted a PR that remedies this. I'll take a look tonight and assuming all looks good will merge!
Cheers,
Ryan