Challenge

Hello, if you don't know what this is, it's not for you. If you do, hello.

I've setup a basic vagrant, because that way you can jump straight in and run my code without having to deal with any issues that arise from differences in development environments.

To use install vagrant (https://www.vagrantup.com/) and virtualbox

vagrant up --provider virtualbox

wait for provisioning to complete, then run the following to get a shell

vagrant ssh

now you can run the challenge code

cd /vagrant

#run the tests
carton exec -- prove -vl t/

#import xml from a url into the db
carton exec -- perl bin/import.pl --xml_url https://raw.githubusercontent.com/markwellis/challenge/master/xml/g0.xml  --verbose

#get some answers
cat json/simple.json | carton exec perl -I lib bin/search.pl

JSON queries

Since it's not clear, and I've added a "graph_id" to the incoming JSON request, so that it knows which graph to search

XML Parser

I've chosen XML::Twig for XML parsing, as it's simple to use but is based on Expat so it's well tested and can be relied upon to produce the correct results, if used correctly!

The interface can seem a little clunky at first, but the learning curve is low and overall this helps keep it maintainable.

Another option was XML::LibXML, but having used it before it's not something I'd chose to use again, and it seems like using a sledgehammer to crack a nut for this use case!

Normally I'd use XML::Simple, but it's use is now discouraged as it's not really a good way of working with XML. This is the first time I've use XML::Twig, and I found it did a good job quickly.

XML validation

I've done some validation, both when parsing and when converting to objects, but the task could be helped with a DTD in the XML document

JSON parsing

I've used JSON::MaybeXS for several reasons,

First off, it uses Cpanel::JSON::XS if available, which is an improved fork of JSON::XS and supported by Cpanel. Failing that it tries a number of fall backs, including JSON::XS

Second, it makes the api far more perlish, which is always nice as it means less to learn.

Third, it supports several JSON backends, so works on a wide range of systems (e.g no c compiler)

The "are there any paths" problem

I used a recursive depth first search because that's what came to mind when I saw the problem, and it's the usual way of traversing graphs.

At first I tried to do it none recursive, but it was really late at night and my brain was no longer working correctly, so I abandoned it half way through to go to sleep. Then the next day I reworked it to be more sensible.

I call the DFS from a helper sub and mangle the result into a more friendly format than the raw edge data which also contains the total cost of the path

The "cheapest path" problem

Since I already had the DFS return the path and costs, I use this and then reduce the data set to the one with the cheapest path.

I was thinking of using A* search/Dijkstra's algorithm, but I'd already wrote the paths function and it was much easier to use that!

I figured this would be ok, since in real life I'd never write graph code anyway, I'd use something like Graph because it's better tested and supported and I could get on with the real task at hand! I didn't for this test as I know you're after seeing how I'd do something, and understand the problem, and not my ability to read documentation!

SQL

In the sql/ dir you'll find "create.sql" which is the postgres schema I created

You will also find the "find_cycles.sql" query, which will list all cycles within a graph

General Design

I've done as much as possible as resuable modules, so that each part can be properly tested, and makes for cleaner scripts

There's a test suite located in t/