tly1980/cv

##About Tom Tang

###Using Python:

Implemented a pixel server that tracks 40+ million pageviews/day across all Newscorp digital properties (news.com.au, dailytelegraph, taste.com.au, etc.).

Implemented the Audience API, which enables Newscorp to target customers on their second pageview.

The pixel server and Audience API are a core part of our real-time user-data / event pipeline, which is the foundation of the targeted-advertising business and user customization.
Implemented multi-process S3 cleanup scripts, because early versions of awscli hung when an S3 bucket held too many files.

###Using Go:

Maintain and enhance the user ID graph modeling tool, which groups all related ID mappings.
Implemented a DynamoDB ingestion tool that ships segment attributes at 900 TPS from a single EC2 t2.small instance.
Implemented a DynamoDB table clone tool that tops out at 3500 TPS on a single EC2 m3.medium instance (tested on a table with 50 million records), whereas the official AWS Data Pipeline (beta version) failed to clone the tables on an EMR cluster.

###Using Spark / BigQuery:

Generate several daily reports that track the accuracy of 3rd-party data partners.
Adlog reporting, including identifying DFP frequency-cap issues and internal identifier issues.

###BTW:

Probably the first to use AWS Kinesis in a project: he found a blocking bug in boto (the official AWS Python SDK), fixed it, and submitted a pull request to boto on GitHub, which was accepted.
All of the above run on AWS EC2 and make heavy use of AWS services (SQS / Kinesis / DynamoDB / S3 / EC2-ELB / EC2-Autoscalegroup / SNS / RDS); some of the projects run in Docker containers on top of EC2.
Most of what is mentioned above was implemented by himself.

###GitHub projects:

| Domain | Project | Comments |
| --- | --- | --- |
| devops | agile_conf | CFN generation using Jinja2 templates |
| devops | provisioner | Continuous server provisioning using S3 / gsutil |
| tools | dynoclone | DynamoDB clone tool |
| tools | senv | Secure your environment variables using the Mac keychain |

Here are my other GitHub projects and my slides.