SymbiFlow/uxsdcxx

Add support for reading / writing compressed files

Opened this issue · 5 comments

duck2 commented

I did an experiment with zstr streams:

void get_root_elements(const char *filename){
	pugi::xml_document doc;
	pugi::xml_parse_result result;

	std::string x(filename);
	if(x.rfind(".") != std::string::npos && x.substr(x.rfind(".")+1) == "gz"){
		/* Open in binary mode so newline translation cannot corrupt
		 * the gzip byte stream. */
		std::ifstream F(x, std::ios::in | std::ios::binary);
		if(!F.is_open())
			throw std::runtime_error("Could not open file " + x + ".");
		/* zstr::istream transparently decompresses the underlying stream. */
		zstr::istream Z(F);
		result = doc.load(Z);
	} else {
		result = doc.load_file(filename);
	}

	if(!result)
		throw std::runtime_error("Could not load XML file " + std::string(filename) + ".");
	for(pugi::xml_node node = doc.first_child(); node; node = node.next_sibling()){
		if(std::strcmp(node.name(), "rr_graph") == 0){
			count_rr_graph(node);
			alloc_arenas();
			load_rr_graph(node, &rr_graph);
		}
		else throw std::runtime_error("Invalid root-level element " + std::string(node.name()));
	}
}
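As a side note, the extension check in the snippet above could be factored into a standalone helper. This is a hypothetical sketch (the name has_gz_extension is not from the original code), using the same rfind-based logic:

	#include <cassert>
	#include <string>

	// Hypothetical helper: true if the filename's last extension is "gz",
	// mirroring the rfind(".") check in the snippet above.
	static bool has_gz_extension(const std::string &filename) {
		std::string::size_type dot = filename.rfind('.');
		return dot != std::string::npos && filename.substr(dot + 1) == "gz";
	}

	int main() {
		assert(has_gz_extension("rr_graph.xml.gz"));
		assert(!has_gz_extension("rr_graph.xml"));
		assert(!has_gz_extension("gz"));  // no dot at all
		return 0;
	}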

Artix-7 rr_graph run with the uncompressed file (922 MB), without errno checking after the strtol calls. Times in seconds:
7.645 8.097 7.600 7.636 7.677

With the gzip-compressed file:
11.34 11.15 11.10 11.29 11.13

Is that with or without a hot disk cache? Can you try flushing it?
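For reference, one way to flush the page cache on Linux before re-running the benchmark is via /proc/sys/vm/drop_caches (this is a sketch; it requires root and only prints a message otherwise):

	# Flush dirty pages to disk first, then ask the kernel to drop
	# clean page-cache entries so the next read comes from disk.
	sync
	if [ "$(id -u)" -eq 0 ]; then
		echo 3 > /proc/sys/vm/drop_caches
		echo "page cache dropped"
	else
		echo "re-run as root to drop the page cache"
	fi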

duck2 commented

It's with a hot disk cache. Without the file in the cache, the reading time can jump to 11 seconds or so.

@duck2 - How do the times with and without gzip compare when the file is not in the disk cache?

Once SAX parsing support is complete (#3), a one-pass SAX parser over the compressed stream may be a good compromise between CPU, disk, and memory usage. It's unclear whether a two-pass SAX approach combined with compression would still have good numbers.