/truss-interview

Truss software engineering interview

Primary LanguagePython

This is one of the steps in the Truss interview process. If you've stumbled upon this repository and are interested in a career with Truss, check out our jobs page.

Truss Software Engineering Interview

Introduction and expectations

Hi there! Please complete the problem described below to the best of your ability, using the tools you're most comfortable with. Assume you're sending your submission in for code review from peers; we'll be talking about your submission in your interview in that context.

We expect this to take less than 4 hours of actual coding time. Please submit a working but incomplete solution instead of spending more time on it. We're also aware that getting after-hours coding time can be challenging; we'd like a submission within a week and if you need more time please let us know.

If you have any questions, please contact hiring@truss.works; we're happy to help if you're not sure what we're asking for or if you have questions.

How to submit your response

Please send hiring@truss.works a link to a public git repository (Github is fine) that contains your code and a README.md that tells us how to build and run it. Your code will be run on either macOS 10.13 or Ubuntu 16.04 LTS, your choice.

The problem: CSV normalization

Please write a tool that reads a CSV formatted file on stdin and emits a normalized CSV formatted file on stdout. For example, if your program was named normalizer we would test your code on the command line like this:

./normalizer < sample.csv > output.csv

Normalized, in this case, means:

  • The entire CSV is in the UTF-8 character set.
  • The Timestamp column should be formatted in ISO-8601 format.
  • The Timestamp column should be assumed to be in US/Pacific time; please convert it to US/Eastern.
  • All ZIP codes should be formatted as 5 digits. If there are less than 5 digits, assume 0 as the prefix.
  • The FullName column should be converted to uppercase. There will be non-English names.
  • The Address column should be passed through as is, except for Unicode validation. Please note there are commas in the Address field; your CSV parsing will need to take that into account. Commas will only be present inside a quoted string.
  • The FooDuration and BarDuration columns are in HH:MM:SS.MS format (where MS is milliseconds); please convert them to the total number of seconds expressed in floating point format. You should round the result to the nearest thousandth of a second.
  • The TotalDuration column is filled with garbage data. For each row, please replace the value of TotalDuration with the sum of FooDuration and BarDuration.
  • The Notes column is free form text input by end-users; please do not perform any transformations on this column. If there are invalid UTF-8 characters, please replace them with the Unicode Replacement Character.

You can assume that the input document is in UTF-8 and that any times that are missing timezone information are in US/Pacific. If a character is invalid, please replace it with the Unicode Replacement Character. If that replacement makes data invalid (for example, because it turns a date field into something unparseable), print a warning to stderr and drop the row from your output.

You can assume that the sample data we provide will contain all date and time format variants you will need to handle.