This is a command-line utility designed to address the specifications outlined in code_sample.md.
It accepts a CSV file as input and, for each address, returns either Invalid Address
or a corrected/standardized address obtained via an API call to a provider.
The following tools must be installed on your system:
- node: https://nodejs.org/en/
- yarn: https://yarnpkg.com/
  - Note: You may substitute npm for yarn, but package-lock.json is not provided. There is a risk that the versions of packages will not exactly match those specified within yarn.lock.
To install the necessary dependencies, run yarn from the root of your git clone.
This tool accepts a CSV file as input. The file is expected to have the following format:
Street Address, City, Postal Code
123 e Maine Street, Columbus, 43215
See also: code_sample
To see the available CLI arguments, run:
node ./index.js --help
The output will resemble:
Options:
-i, --input Input source
[string] [choices: "stdin", "csv"] [default: "stdin"]
-f, --file Input file. Required when --input=csv [string]
-p, --provider Provider to use
[string] [choices: "byteplant"] [default: "byteplant"]
-l, --log-level Log level
[string] [choices: "trace", "debug", "info", "warn", "error"] [default:
"warn"]
-k, --api-key-env-var Environment variable holding the provider's API key
[string] [required]
-h, --help Show help [boolean]
As the output above shows, --api-key-env-var is always required, and --file is required when --input=csv.
Additionally, certain combinations of arguments (for example, --input stdin --file sample.csv) are invalid.
If you omit a required argument or supply an invalid combination of arguments, you will receive an error message.
Here's an example error response where --api-key-env-var (alias -k) was not specified:
% cat sample.csv | node ./index.js --input stdin
Options:
-i, --input Input source
[string] [choices: "stdin", "csv"] [default: "stdin"]
-f, --file Input file. Required when --input=csv [string]
-p, --provider Provider to use
[string] [choices: "byteplant"] [default: "byteplant"]
-l, --log-level Log level
[string] [choices: "trace", "debug", "info", "warn", "error"] [default:
"warn"]
-k, --api-key-env-var Environment variable holding the provider's API key
[string] [required]
-h, --help Show help [boolean]
Missing required argument: api-key-env-var
The --api-key-env-var argument (alias -k) supplies the API key to the program. Its value is the name of the environment variable that holds the key, not the key itself.
Some of the tests expect this environment variable to be named BYTEPLANT_KEY (see Testing -> Requirements), but for non-test usage you may use any valid environment variable name.
Here are some example ways to run the utility:
# CSV file example usage
node ./index.js --api-key-env-var BYTEPLANT_KEY --input csv --file ./sample.csv
# stdin example usage, verbose
cat sample.csv | node ./index.js -k BYTEPLANT_KEY -i stdin
# stdin example usage, simplified
# The "--input" argument defaults to "stdin" so it can be skipped if stdin input is supplied.
cat sample.csv | node ./index.js -k BYTEPLANT_KEY
Using your preferred method of setting environment variables, ensure that BYTEPLANT_KEY exists and contains a valid API key.
- One option (on macOS or Linux) is to run the following command in a terminal:
export BYTEPLANT_KEY=<your key>
Note: This will persist your API key in your terminal's history, which may be undesirable.
There are a variety of ways to set an environment variable depending on your OS, needs, and preferences; choosing a method is left as an exercise for the reader.
Execute the command yarn test in a terminal.
The output will resemble:
yarn run v1.22.10
$ mocha ./*.spec.js **/*.spec.js
CLI
✔ should display help (141ms)
✔ should display error if no API key is supplied (124ms)
✔ should display error if supplied --api-key-env-var is undefined (122ms)
✔ should display error if attempt to use "--input csv" without specifying "--file" (128ms)
✔ should display error if attempt to use "--input stdin" while also specifying "--file" (128ms)
✔ should accept file input if supplied with valid arguments (738ms)
✔ should accept std input if supplied with valid arguments (122ms)
Byteplant
✔ should return a corrected address, when possible
✔ should return "Invalid Address" when address cannot be located
✔ should fail to look up an address without an API key (288ms)
✔ should return an address when used with valid API key and valid inputs (262ms)
✔ should return "Invalid Address" when address cannot be located (251ms)
✔ should wait to retry if rate limit is exceeded (102ms)
Provider
✔ should throw an error if abstract class is instantiated
✔ should throw an error if abstract method is not implemented in subclass
15 passing (2s)
✨ Done in 2.84s.
If desired, the tests can be "watched" (continuously re-run when files are saved).
To do so, execute the command yarn watch in a terminal.
The output will resemble:
yarn run v1.22.10
$ mocha ./*.spec.js **/*.spec.js --watch
CLI
✔ should display help (100ms)
✔ should display error if no API key is supplied (107ms)
✔ should display error if supplied --api-key-env-var is undefined (103ms)
✔ should display error if attempt to use "--input csv" without specifying "--file" (106ms)
✔ should display error if attempt to use "--input stdin" while also specifying "--file" (102ms)
✔ should accept file input if supplied with valid arguments (651ms)
✔ should accept std input if supplied with valid arguments (118ms)
Byteplant
✔ should return a corrected address, when possible
✔ should return "Invalid Address" when address cannot be located
✔ should fail to look up an address without an API key (296ms)
✔ should return an address when used with valid API key and valid inputs (264ms)
✔ should return "Invalid Address" when address cannot be located (263ms)
✔ should wait to retry if rate limit is exceeded (103ms)
Provider
✔ should throw an error if abstract class is instantiated
✔ should throw an error if abstract method is not implemented in subclass
15 passing (2s)
ℹ [mocha] waiting for changes...
To obtain a test coverage report, execute the command yarn coverage in a terminal.
The output will resemble:
yarn run v1.22.10
$ nyc mocha ./*.spec.js **/*.spec.js
CLI
✔ should display help (216ms)
✔ should display error if no API key is supplied (207ms)
✔ should display error if supplied --api-key-env-var is undefined (224ms)
✔ should display error if attempt to use "--input csv" without specifying "--file" (215ms)
✔ should display error if attempt to use "--input stdin" while also specifying "--file" (219ms)
✔ should accept file input if supplied with valid arguments (790ms)
✔ should accept std input if supplied with valid arguments (225ms)
Byteplant
✔ should return a corrected address, when possible
✔ should return "Invalid Address" when address cannot be located
✔ should fail to look up an address without an API key (297ms)
✔ should return an address when used with valid API key and valid inputs (244ms)
✔ should return "Invalid Address" when address cannot be located (247ms)
✔ should wait to retry if rate limit is exceeded (104ms)
Provider
✔ should throw an error if abstract class is instantiated
✔ should throw an error if abstract method is not implemented in subclass
15 passing (3s)
--------------------------------|---------|----------|---------|---------|-------------------
File | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s
--------------------------------|---------|----------|---------|---------|-------------------
All files | 98.87 | 95.12 | 100 | 98.83 |
et-address-validation | 100 | 100 | 100 | 100 |
index.js | 100 | 100 | 100 | 100 |
index.spec.js | 100 | 100 | 100 | 100 |
et-address-validation/provider | 98.3 | 91.66 | 100 | 98.23 |
byteplant.js | 100 | 100 | 100 | 100 |
byteplant.spec.js | 100 | 100 | 100 | 100 |
provider.js | 94.73 | 84.61 | 100 | 94.59 | 69,81
provider.spec.js | 100 | 100 | 100 | 100 |
--------------------------------|---------|----------|---------|---------|-------------------
✨ Done in 4.18s.
As a toy problem, it was quick and easy to write functional code that returned the expected output from a single index.js file.
Once I had a basic example working, I considered some plausible use cases, both for how the program might be tested or evaluated and for how it could be extended in the future.
I structured the Provider class as an abstract class. Subclasses of Provider are expected to implement certain methods; failing to do so results in an error.
With this approach, shared methods are written once in the Provider class rather than rewritten for each new API provider. Since I have not yet attempted to add a second provider, I would not be surprised if some refactoring were needed to support one, but it should be minimal.
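JavaScript has no `abstract` keyword, so the pattern is usually enforced at construction time. The following is a minimal sketch of that idea, not the project's actual Provider class: the method names `lookup()` and `validate()` and the `FakeProvider`/`IncompleteProvider` subclasses are hypothetical. It uses `new.target` to reject direct instantiation and checks that a subclass supplies the required method.

```javascript
// Sketch: an "abstract" base class in plain JavaScript.
// lookup() and validate() are hypothetical method names.
class Provider {
  constructor(apiKey) {
    // new.target is the class actually being constructed with `new`.
    if (new.target === Provider) {
      throw new TypeError('Provider is abstract and cannot be instantiated directly');
    }
    // Fail early if a subclass forgot to implement the required method.
    if (typeof this.lookup !== 'function') {
      throw new TypeError(`${new.target.name} must implement lookup()`);
    }
    this.apiKey = apiKey;
  }

  // Shared logic written once for every provider: a null/undefined
  // lookup result is reported as "Invalid Address".
  async validate(address) {
    const result = await this.lookup(address);
    return result ?? 'Invalid Address';
  }
}

// Hypothetical provider used only to illustrate a complete subclass.
class FakeProvider extends Provider {
  async lookup({ street, city, postalCode }) {
    return street ? `${street}, ${city}, ${postalCode}` : null;
  }
}

// A subclass that "forgets" lookup(); constructing it throws.
class IncompleteProvider extends Provider {}
```

Checking in the constructor means a misconfigured provider fails as soon as it is created, rather than on the first lookup.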
It occurred to me that it might be useful to demonstrate the ability to handle "large" CSV inputs. I doubt a file large enough to cause memory issues would be supplied, so this is more about showcasing ability and thought process than providing real utility.
The free Byteplant API key has modest rate limits that realistically prevent processing more than 100 lines every 5 minutes, but there is no reason to assume the solution won't be tested against a larger CSV file and a paid API key with different limits.
In any event, I decided to read the input as a stream and process it sequentially, line by line. It's a tradeoff, but this approach also makes it easy to handle the next use case: API rate limits.
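A minimal sketch of line-by-line stream processing using Node's built-in `readline` module is shown below. It is an illustration, not the project's code: the names `parseRow` and `processCsvStream` are hypothetical, the first line is assumed to be a header row, and the naive comma split does not handle quoted fields.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical row parser for "Street Address, City, Postal Code".
// Naive comma split: quoted fields containing commas are NOT supported.
function parseRow(line) {
  const [street, city, postalCode] = line.split(',').map((part) => part.trim());
  return { street, city, postalCode };
}

// Reads rows from a readable stream (a file stream or process.stdin) one
// line at a time. Each row's handler is awaited before the next row is
// handled, so the whole file never needs to be loaded into memory at once.
async function processCsvStream(input, handleRow, { hasHeader = true } = {}) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let skipHeader = hasHeader;
  for await (const line of rl) {
    if (skipHeader) {
      skipHeader = false; // assumed header row: skip it
      continue;
    }
    if (line.trim() === '') continue; // ignore blank lines
    await handleRow(parseRow(line));
  }
}
```

The same function serves both input modes: pass `fs.createReadStream('./sample.csv')` for `--input csv`, or `process.stdin` for `--input stdin`. Awaiting each row is also what lets a rate-limit wait pause the whole pipeline.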
To further support large files, the code uses the rate-limit information returned by the Byteplant API to wait when its API usage is exhausted.
I did not want to assume that future API providers would include their rate-limit waiting periods in their responses, so a recursive exponential backoff also serves as a fallback delay.
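The combination described above (prefer the provider's wait period, otherwise back off exponentially) can be sketched as a small recursive retry wrapper. This is a hedged illustration only: `RateLimitError`, its `retryAfterMs` field, `withRetry`, and the default limits are hypothetical, and how the real code extracts the wait period from a Byteplant response is not shown.

```javascript
// Hypothetical error signalling that the provider's rate limit was hit.
class RateLimitError extends Error {
  constructor(message, retryAfterMs) {
    super(message);
    this.name = 'RateLimitError';
    this.retryAfterMs = retryAfterMs; // undefined if the provider gives no hint
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn(); on RateLimitError, waits and retries recursively.
// Uses the provider-supplied delay when present, otherwise falls back to
// exponential backoff: baseDelayMs * 2^attempt. Other errors, or running
// out of retries, are rethrown.
async function withRetry(fn, { attempt = 0, maxRetries = 5, baseDelayMs = 1000 } = {}) {
  try {
    return await fn();
  } catch (err) {
    if (!(err instanceof RateLimitError) || attempt >= maxRetries) throw err;
    const delay = err.retryAfterMs ?? baseDelayMs * 2 ** attempt;
    await sleep(delay);
    return withRetry(fn, { attempt: attempt + 1, maxRetries, baseDelayMs });
  }
}
```

With the default `baseDelayMs` of 1000, the fallback waits are 1 s, 2 s, 4 s, 8 s, and 16 s before giving up.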
I wanted the program's CLI to be easy to interact with. Luckily, yargs handles CLI argument parsing on my behalf, leaving me free to concentrate on other things.
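For illustration, here is a sketch of how the options in the help output shown earlier could be declared with yargs, plus a `.check()` callback for the invalid combinations. It is an assumption-laden sketch, not the project's index.js: `validateArgs`, `parseCli`, and the error wording are hypothetical, and `.version(false)` is included only because the help output above shows no `--version` flag. yargs is required inside `parseCli` so that `validateArgs` can be used without loading it.

```javascript
// Hypothetical cross-argument checks. A callback passed to yargs' .check()
// signals failure by throwing; returning true means the arguments are valid.
function validateArgs(argv, env = process.env) {
  if (argv.input === 'csv' && !argv.file) {
    throw new Error('--file is required when --input=csv');
  }
  if (argv.input === 'stdin' && argv.file) {
    throw new Error('--file cannot be combined with --input=stdin');
  }
  if (env[argv.apiKeyEnvVar] === undefined) {
    throw new Error(`Environment variable ${argv.apiKeyEnvVar} is not defined`);
  }
  return true;
}

function parseCli(args) {
  const yargs = require('yargs/yargs');
  return yargs(args)
    .option('input', {
      alias: 'i', describe: 'Input source', type: 'string',
      choices: ['stdin', 'csv'], default: 'stdin',
    })
    .option('file', { alias: 'f', describe: 'Input file. Required when --input=csv', type: 'string' })
    .option('provider', {
      alias: 'p', describe: 'Provider to use', type: 'string',
      choices: ['byteplant'], default: 'byteplant',
    })
    .option('log-level', {
      alias: 'l', describe: 'Log level', type: 'string',
      choices: ['trace', 'debug', 'info', 'warn', 'error'], default: 'warn',
    })
    .option('api-key-env-var', {
      alias: 'k', describe: "Environment variable holding the provider's API key",
      type: 'string', demandOption: true,
    })
    // Wrapped because .check() passes (argv, options), not an env object.
    .check((argv) => validateArgs(argv))
    .version(false)
    .help()
    .alias('help', 'h')
    .parse();
}
```

A call such as `parseCli(process.argv.slice(2))` would then return the parsed arguments, with yargs printing the help text and error message on invalid input.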
It seems reasonable to me that a CLI utility might implement caching so that multiple attempts to verify the same address result in only a single API call. I have not added this logic, but I thought it was worth mentioning.
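Since caching is not implemented, the following is purely a hypothetical sketch of what it might look like. `cachedLookup` wraps any async lookup function. It caches the promise rather than the result, so duplicate rows share one in-flight request, and it drops failed lookups from the cache so they can be retried. The key normalization (trim plus lowercase) is an assumption.

```javascript
// Hypothetical: wraps an async lookup(address) so that repeated lookups of
// the same (normalized) address result in a single underlying call.
function cachedLookup(lookup) {
  const cache = new Map();
  return function (address) {
    const key = [address.street, address.city, address.postalCode]
      .map((part) => String(part).trim().toLowerCase())
      .join('|');
    if (!cache.has(key)) {
      const pending = lookup(address).catch((err) => {
        cache.delete(key); // don't cache failures, so a later call can retry
        throw err;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```

An in-memory `Map` only deduplicates within one run; caching across runs would need persistent storage.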
Throughout all of the above, I created a corresponding *.spec.js file for every new *.js file.
Before or alongside adding, changing, or deleting code, I also updated the tests.
Frequently, but not always, this took the form of TDD: I would write a new failing test, then write the code to handle that use case. I typically keep yarn watch running at all times so I can spot missteps quickly.
Towards the end of the process, I relied more on yarn coverage to spot uncovered code. Coverage is not 100%, but it is close (98.87% of statements overall; lines 69 and 81 of provider.js are uncovered). In its current state, I feel the project demonstrates my ability to write tests, and I don't necessarily want to spend additional free time on the assignment.