Phuong Tran

Google Cloud Certified Professional Data Engineer experienced in designing and implementing Google Cloud native solutions, with a track record of leading and delivering data analysis and data science projects.

Email / Website / LinkedIn / GitHub

👩🏼‍💻 Engineering Experience

Data Engineer @ Mindvalley (Nov 2022 - Present)

  • Technologies used: Cloud Composer, Pub/Sub, Dataflow, BigQuery, Cloud Build.
  • Designed and built a webhook infrastructure for communication with external vendors.
  • Designed and implemented declarative pipelines utilizing custom operators.
  • Migrated Cloud Composer from version 1 to version 2 with a smooth transition.
  • Architected the asset play-time calculation logic and implemented a streaming pipeline to collect the relevant data.

Data Engineer @ Persuasion Technologies (Sept 2021 - Oct 2022)

  • Technologies used: Cloud Run, Cloud Functions, Cloud Build, BigQuery, dbt, Looker Studio.
  • Owned production-level ETL/ELT pipelines ingesting from various data sources, including on-premises servers and APIs.
  • Led the improvement of data observability for existing and new pipelines with dbt and re_data.
  • Worked with clients to understand business needs and translate them into actionable reports in Looker Studio.
  • Led the migration of legacy SQL data models to dbt.
  • Owned an ML model-serving application built on App Engine and Cloud Run to predict disease risk from 70k+ medical records.

Data Analyst @ Deakin University (Aug 2018 - Aug 2021)

  • Technologies used: AWS Glue, Amazon Redshift, AWS Lambda.
  • Designed, planned and conducted multifactorial experiments to generate large genetic datasets.
  • Conducted multivariate statistical analysis and negative binomial modelling (edgeR), and built a data-specific custom model in R.
  • Implemented Bash/Snakemake pipelines to process and analyse data.
  • Worked with the Laboratory Director, Finance and Suppliers to ensure a timely, cost-effective and ongoing supply of equipment and consumables for experimental work.
  • Won First Place and the People’s Choice Award in the Three Minute Thesis competition (50+ participants) for creative, entertaining and effective science communication.

🌀 Freelancing Experience

Fashion Catalog AI: LLM-Powered Description Generation (Dec 2023 - Present)

  • Technologies used: Streamlit, Cloud Run, LLM.
  • Integrated the GPT-4 Vision model to generate product descriptions from images and specified features, presenting the results in a lightweight Streamlit UI.

Real-Time Data Streaming and Analytics Dashboard (Sept 2023 - Oct 2023)

  • Technologies used: Cloud Functions, Looker Studio, Firestore.
  • Developed a streaming pipeline to transfer virus test data in real time from Firestore to BigQuery.
  • Created a reporting dashboard to support analytics and decision-making.

dbt Integration with Airflow: Python Virtual Env and CI/CD Setup (Mar 2023 - Jul 2023)

  • Technologies used: Docker, dbt, Airflow, Cloud Composer, Cloud Build.
  • Integrated dbt with Airflow, deploying through Docker with tailored Python virtual environments, and established a CI/CD pipeline for the development workflow.

📌 Certifications

Google Cloud Professional Cloud Database Engineer (2023)
Google Cloud Professional Certifications

Google Cloud Professional Data Engineer (2022)
Google Cloud Professional Certifications

DAG Authoring Certification for Apache Airflow (2023)
Astronomer

HackerRank Certified SQL (Advanced) (2022)
HackerRank

⚓ Skills

Programming Languages
Python, SQL, R

Cloud Computing
Google Cloud, AWS

Transformation Tools
dbt

Containerisation Platform
Docker

Reporting Tools
Power BI, Looker Studio, Streamlit

👩🏼‍🎓 Education

Bachelor of Biomedical Science focused on Bioinformatics
Monash University - Kuala Lumpur, Malaysia (2015 - 2017)