rossengeorgiev/vdf-parser

Python dictionaries don't preserve order or support duplicate keys

brand1417 opened this issue · 5 comments

Using the following

import vdf

with open('npc_heroes.txt') as fp:
    f = vdf.parse(fp)

print(vdf.dumps(f['DOTAHeroes']['npc_dota_hero_zuus']['Bot'], pretty=True))

You can see that the output, specifically for Build/Loadout, is out of order compared with the .txt file, and duplicated items (e.g. Blades of Attack, used for Phase Boots) are omitted. This likely affects JavaScript in a similar way, though I believe PHP handles this a bit better and should be fine.

Let me know if this is something you're working on or if I should take a crack at it and submit a pull.

Thanks!

Interesting. I wasn't aware that VDF files use duplicate keys. The order can easily be sorted out by using OrderedDict instead, but the duplicates pose a challenge.
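To make the problem concrete, here is a minimal sketch using hypothetical (key, value) pairs in the order a parser would read them (the values are placeholders, not taken from npc_heroes.txt). An OrderedDict keeps insertion order, but assigning a repeated key still replaces the earlier value. Since Python 3.7 a plain dict also preserves insertion order, so on modern Python only the duplicate problem remains.

```python
from collections import OrderedDict

# Hypothetical (key, value) pairs, in the order a parser reads them from a
# VDF block; "item_tango" appears twice.
pairs = [
    ("item_tango", "first tango"),
    ("item_blades_of_attack", "blades"),
    ("item_tango", "second tango"),
]

ordered = OrderedDict()
for key, value in pairs:
    # A repeated key keeps its original position, but its value is replaced.
    ordered[key] = value

print(list(ordered.items()))
# [('item_tango', 'second tango'), ('item_blades_of_attack', 'blades')]
```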

One way would be to make the value of a duplicated key a list. That would lose the original order if the duplicates are not adjacent. However, the biggest drawback is in using the structure: you would have to explicitly check whether each value is a string or a list. That seems like a messy approach. Perhaps it's better to detect collisions while parsing the file and append an index number to the key, resulting in a dict like this:

{
    "item_tango": "ITEM_CONSUMABLE | ITEM_SELLABLE",
    "item_tango1": "ITEM_CONSUMABLE | ITEM_SELLABLE",
    "item_tango2": "ITEM_CONSUMABLE | ITEM_SELLABLE"
}

This seems like the most elegant option. Since duplicate keys are not common, they can be treated as a special case and left to the application to handle. I don't see any other alternatives, but I'm happy to hear any suggestions.
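For comparison, the list-of-values alternative mentioned above can be sketched as follows. The helper name and the placeholder values are hypothetical, not part of the vdf module. It keeps every value, but the second "item_tango" loses its position in the block, and every consumer needs the string-or-list check.

```python
def collect_duplicates_as_lists(pairs):
    """Hypothetical helper: a repeated key turns its value into a list."""
    result = {}
    for key, value in pairs:
        if key not in result:
            result[key] = value
        elif isinstance(result[key], list):
            result[key].append(value)
        else:
            result[key] = [result[key], value]
    return result


pairs = [
    ("item_tango", "first tango"),
    ("item_blades_of_attack", "blades"),
    ("item_tango", "second tango"),
]
node = collect_duplicates_as_lists(pairs)
# {'item_tango': ['first tango', 'second tango'], 'item_blades_of_attack': 'blades'}

for key, value in node.items():
    # Every consumer has to normalise string-or-list before using the value.
    values = value if isinstance(value, list) else [value]
    print(key, values)
```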

I was comparing the Python version to the PHP version, which appears to function as expected. PHP var_dump shows that it uses arrays of arrays, which both preserves order and allows for duplicate entries. Would something like that work for Python?

require_once('vdf.php');

var_dump(vdf_decode(file_get_contents('npc_heroes.txt'))["DOTAHeroes"]["npc_dota_hero_zuus"]['Bot']);

EDIT: sorry about that. I just double-checked and the array does not allow for duplicates. So, yeah, possibly a list then.
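A PHP array is an ordered map, so it keeps order but not duplicate keys. In Python, an ordered list of (key, value) pairs keeps both. A minimal sketch with placeholder values; `values_for` is a hypothetical helper, and the trade-off is that lookups become linear scans instead of `d[key]`.

```python
# Placeholder Build/Loadout block stored as an ordered list of pairs,
# so both the order and the repeated key survive.
bot_build = [
    ("item_tango", "first tango"),
    ("item_blades_of_attack", "blades"),
    ("item_tango", "second tango"),
]


def values_for(pairs, key):
    """Return every value stored under key, in file order."""
    return [v for k, v in pairs if k == key]


print(values_for(bot_build, "item_tango"))  # ['first tango', 'second tango']
```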

If we want to fix it for all languages, then we need a hack. In my opinion, appending an index to the colliding key is the best approach, since it will work in all languages. We could then provide a function that, given a key, looks up its duplicates and returns them as a list. In Python it is trivial to extend, say, OrderedDict to do this. The only downside is that if you convert the VDF to JSON, you will get key, key1, key2, etc.
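A minimal sketch of such an extended OrderedDict, where `SuffixedOrderedDict` and `get_all` are hypothetical names. It appends an index on collision, provides a lookup that gathers the duplicates back into a list, and shows the JSON downside.

```python
import json
import re
from collections import OrderedDict


class SuffixedOrderedDict(OrderedDict):
    """Hypothetical mapper: a colliding key is stored as key1, key2, ..."""

    def __setitem__(self, key, value):
        if key in self:
            # Find the first free suffix for the colliding key.
            index = 1
            while f"{key}{index}" in self:
                index += 1
            key = f"{key}{index}"
        super().__setitem__(key, value)

    def get_all(self, key):
        """Return the values under key, key1, key2, ... in insertion order."""
        pattern = re.compile(re.escape(key) + r"\d*")
        return [v for k, v in self.items() if pattern.fullmatch(k)]


bot = SuffixedOrderedDict()
bot["item_tango"] = "first tango"
bot["item_blades_of_attack"] = "blades"
bot["item_tango"] = "second tango"

print(list(bot))                  # ['item_tango', 'item_blades_of_attack', 'item_tango1']
print(bot.get_all("item_tango"))  # ['first tango', 'second tango']
print(json.dumps(bot))            # the JSON keeps the item_tango1 suffix
```

Two caveats of this sketch: assigning to an existing key never overwrites it, and `get_all` cannot tell a generated `item_tango1` from a key that was literally named `item_tango1` in the file.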

These issues have been addressed in v1.4 of the Python vdf module, which lets you set a custom mapper class (e.g. OrderedDict to preserve order). This doesn't solve the duplicate-key issue by itself, but it allows using a custom mapper that can handle that particular case.
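A rough sketch of such a mapper, assuming the parser creates each block by calling `mapper()` with no arguments and fills it with `node[key] = value` (check the vdf source for the exact contract of your version). This is an independent illustration, not the implementation from the notebook linked in this thread; `MultiValueDict`, `get_all` and `all_items` are hypothetical names. Instead of overwriting, it appends every pair.

```python
from collections.abc import MutableMapping


class MultiValueDict(MutableMapping):
    """Hypothetical mapper that keeps every (key, value) pair in file order."""

    def __init__(self):
        self._pairs = []

    def __setitem__(self, key, value):
        # Never overwrite: a repeated key is stored as an additional pair.
        self._pairs.append((key, value))

    def __getitem__(self, key):
        # Plain lookup returns the first value stored under key.
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        # Remove every pair stored under key.
        self._pairs = [(k, v) for k, v in self._pairs if k != key]

    def __iter__(self):
        # Yield each distinct key once, in first-seen order.
        seen = set()
        for k, _ in self._pairs:
            if k not in seen:
                seen.add(k)
                yield k

    def __len__(self):
        return len({k for k, _ in self._pairs})

    def get_all(self, key):
        """Return every value stored under key, in file order."""
        return [v for k, v in self._pairs if k == key]

    def all_items(self):
        """Return all pairs, duplicates included, in file order."""
        return list(self._pairs)


# Fill a node by hand the way a parser would under the assumed contract.
bot = MultiValueDict()
bot["item_tango"] = "first tango"
bot["item_blades_of_attack"] = "blades"
bot["item_tango"] = "second tango"

print(bot.get_all("item_tango"))  # ['first tango', 'second tango']
print(list(bot))                  # ['item_tango', 'item_blades_of_attack']
```

With the vdf module installed, this would be passed as `vdf.parse(fp, mapper=MultiValueDict)`; that combination is untested here.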

Here is an example implementation in Python for handling duplicate keys and preserving order:

https://github.com/rossengeorgiev/dota2_notebooks/blob/master/DuplicateOrderedDict_for_VDF.ipynb