Using the yaml-cpp Library
Forenoon watch, 7 bells (11:57 am)

I've been using the yaml-cpp library for a while now, and I have to say that overall I'm greatly pleased by how well it works.

As I was making some adjustments to my I/O last night, I ran across a bit of documentation that I had never noticed before. Of course, it could be newly added. Anyway, I've found that the most convenient way to parse my documents (I have many thousands of them to read and write) is to access them by node name. For example:

node["location"]["name"] >> mObjectLocation.location;

The information I read last night, however, may convince me to change this behavior.

Apparently, named-node access in yaml-cpp is O(n²) over the entire document. Put simply, every time you access a node like this the library loops through the nodes looking for a match, so each lookup is linear and reading every node by name is quadratic. Personally, I think a hash table lookup would have been much faster, and perhaps they'll add that later.
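To make the difference concrete, here is a minimal sketch of the two lookup strategies. It uses only the standard library and is not yaml-cpp's actual implementation: the Entries type and the findLinear(), buildIndex() and findIndexed() functions are hypothetical names I made up to model a parsed map as a list of key/value pairs.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed YAML map: key/value pairs in file order.
using Entries = std::vector<std::pair<std::string, std::string>>;

// By-name lookup as a linear scan: O(n) per call, so reading all n keys
// this way costs O(n^2) in total.
const std::string* findLinear(const Entries& entries, const std::string& key) {
    for (const auto& entry : entries) {
        if (entry.first == key) return &entry.second;
    }
    return nullptr;  // key not present
}

// Build a hash index once (O(n)); each later lookup is O(1) on average.
std::unordered_map<std::string, std::size_t> buildIndex(const Entries& entries) {
    std::unordered_map<std::string, std::size_t> index;
    index.reserve(entries.size());
    for (std::size_t i = 0; i < entries.size(); ++i) {
        // emplace() does not overwrite, so the first occurrence of a
        // duplicate key wins, matching findLinear().
        index.emplace(entries[i].first, i);
    }
    return index;
}

// Lookup through the prebuilt index.
const std::string* findIndexed(const Entries& entries,
                               const std::unordered_map<std::string, std::size_t>& index,
                               const std::string& key) {
    auto it = index.find(key);
    return it == index.end() ? nullptr : &entries[it->second].second;
}
```

With only a few dozen keys the scan may well be just as fast in practice, so the index would only start to pay for itself as the documents grow.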

Normally I wouldn't worry about it, because my documents tend to have only a few dozen nodes. But I do have 20,000 documents to parse. To top that off, they become more complex as they are used, and as more items are added, the node count will grow geometrically. I can easily envision a future where a single document contains hundreds of nodes, and it's starting to scare me.

The solution? Well, I can honestly say I don't have a good one yet. I think perhaps if I enforce the order in which things are written (they are already only ever written one way), I can then read them in linearly instead of by node name. The problem with that comes when a file is edited by hand and the order is accidentally changed. So by renaming my current solution to readCompatible(), keeping it as the fallback for those files, and writing a new linear read() function, I should cut down on a lot of unnecessary processing.
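Roughly, the idea could look like the sketch below. Again this is standard-library C++ rather than real yaml-cpp calls: the Entries type, the Location struct with its "name" and "region" keys, and the load() wrapper are all assumptions for illustration, and only the read() and readCompatible() names come from my plan above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed YAML map: key/value pairs in file order.
using Entries = std::vector<std::pair<std::string, std::string>>;

// Hypothetical record; the field names are made up for this example.
struct Location {
    std::string name;
    std::string region;
};

// Fast path: assumes the keys appear in exactly the order the writer emits.
// One linear pass; returns false if the layout is not the expected one.
bool read(const Entries& entries, Location& out) {
    static const char* const kOrder[] = {"name", "region"};
    constexpr std::size_t kCount = 2;
    if (entries.size() != kCount) return false;
    for (std::size_t i = 0; i < kCount; ++i) {
        if (entries[i].first != kOrder[i]) return false;
    }
    out.name = entries[0].second;
    out.region = entries[1].second;
    return true;
}

// Slow path: matches keys by name, so it tolerates hand-edited files
// whose keys were reordered. Returns false if a key is missing.
bool readCompatible(const Entries& entries, Location& out) {
    bool haveName = false;
    bool haveRegion = false;
    for (const auto& entry : entries) {
        if (entry.first == "name") {
            out.name = entry.second;
            haveName = true;
        } else if (entry.first == "region") {
            out.region = entry.second;
            haveRegion = true;
        }
    }
    return haveName && haveRegion;
}

// Try the ordered read first; fall back only when the order check fails.
bool load(const Entries& entries, Location& out) {
    return read(entries, out) || readCompatible(entries, out);
}
```

The order check costs one string comparison per key, so machine-written files stay on the linear path and only the hand-edited ones pay for the by-name version.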

But I'm still not sure that's the best solution. Any smart people out there have a better idea?
