CaliDog/certstream-server

Missing of certificates due to chunk calculation logic

Closed this issue · 3 comments

Hi,

So this issue I know was in the old python server but as that is archived I can't open it against that tree. I've had a look at the Elixir server code (to the best of my ability) and can't see it resolved in there either so raising here.

Looking at the Google CT client gave me a sense of what needed doing in reality. I fixed a version of your original python server which I can share if needed.

In short:

  • CaliDog server calculates a chunk size from the maximum available and rounds down as needed
  • Requests that chunk size
  • Moves on

The issue is:

  • It doesn't check if it gets back the number of certificates it required
  • and then increments its position whatever

In a harnessed version you can see larger trees than the values returned which results in missed certificates and at times quite a substantial volume/delta.

As such, and as Google does, the CaliDog client likely needs some logic along the lines of:

  • Check the number returned and only move on by that many in the tree
  • Loop around with that as the starting point and continue until the tree has been caught up

Here is my get_new_results rework to show how I implemented it (warning: I'm not a Python coder)

`
def get_new_results(log, latest_size, tree_size, Verify):
# The top of the tree isn't actually a cert yet, so the total_size is what we're aiming for

total_size = tree_size - latest_size
start = latest_size

end = start + MAX_BLOCK_SIZE - 1 

chunks = math.ceil(total_size / MAX_BLOCK_SIZE)

logging.info("Retrieving {} certificates over {} chunks ({} -> {}) for {}".format(tree_size-latest_size, chunks, latest_size,  tree_size, log['description']))

while start < tree_size:

    for _ in range(chunks):
        logging.info("Retrieving {} chunk {} of {}".format(_ , log['description'], chunks))

        # Cap the end to the last record in the DB
        if end >= tree_size:
            oldend = end
            newend = tree_size -1             
            if Verify:
                logging.info("Truncating end from {} to {} because {} ".format(oldend,newend,tree_size))
            end = tree_size - 1

        assert end >= start, "End {} is less than start {}!".format(end, start)
        assert end < tree_size, "End {} is less than tree_size {}".format(end, tree_size)

        # Get the response
        url = "https://{}/ct/v1/get-entries?start={}&end={}".format(log['url'], start, end)
        
        if Verify:
            logging.info("Getting {} certs from {} ".format(end-start,url))
        if end-start == 0:
            return

        response = urllib.request.urlopen("{}".format(url))
        data = response.read()
        encoding = response.info().get_content_charset('utf-8')
        certificates = json.loads(data.decode(encoding))

        # Parse
        try:
            if 'error_message' in certificates:
                logging.error("Encountered an exception while getting a chunk -> {}".format(certificates['error_message']))

            if Verify:
                logging.info("Resetting count")
            
            count = 0

            try:
                for index, cert in zip(range(start, end), certificates['entries']):
                    cert['index'] = index
                    count = count + 1
            except Exception as e:
                logging.error("Encountered an exception assigning index -> {}".format(e))
        
            if Verify:
                logging.info("We got {} certs and expected {} for {}".format(count,end-start,url))
            
            if Verify:
                logging.error("Previous last size {} - new lastest size {} delta of {}".format(latest_size,end,end-latest_size))

            last_size = start + count # instead of the calculated we bump it for this chunk to what we got

            if Verify:
                logging.error("Saving position to disk")
            savelastsize(log['url'],last_size)

            yield certificates['entries']
        except Exception as e:
            count = 0
            logging.error("Encountered an exception while parsing a chunk -> {}".format(e))

        # Move on
        #start += MAX_BLOCK_SIZE
        start = start + count
        end = start + MAX_BLOCK_SIZE -1
        
        #start = end
        #end = start + MAX_BLOCK_SIZE - 1`

Hope it helps.

Cheers

Ollie

Hi @olliencc

Interesting, thanks a ton for doing the legwork here to figure out what was going on. You're absolutely right in that we don't do any return payload entry checking. It looks like @ninoseki (also huge thanks to you) has submitted a PR that remedies this. I'll take a look tonight and assuming all looks good will merge!

Cheers,
Ryan

Hello, I believe this is fixed with #24. Please let me know if you spot any other issues, and thanks for reaching out!