💡 Expertise:
In my journey as a Big Data Engineer, I have honed my skills in:
🔹 Big Data Technologies: I have a strong command of Hadoop, Spark, and their ecosystems. I specialize in building scalable data pipelines, processing large datasets, and tuning jobs for efficient data processing.
🔹 Programming Languages: I am proficient in Python, SQL, and PySpark, using them to develop data-centric applications, perform data analysis, and build machine learning models.
🔹 Data Warehousing: I have hands-on experience with data warehousing principles, including data modeling, ETL (Extract, Transform, Load) processes, and dimensional modeling. I am well-versed in designing and implementing data warehouses for improved data accessibility and reporting.
🔹 Database Management: I have worked extensively with both relational databases (MySQL, PostgreSQL) and NoSQL databases (MongoDB, Cassandra). I excel at writing complex queries, optimizing database performance, and ensuring data integrity.
🔹 Cloud Platforms: I am adept at working in cloud-based environments, particularly AWS and Azure.
🔹 Data Visualization: I possess a keen eye for visualizing data insights and effectively communicating complex findings to stakeholders. I am skilled in using tools like Tableau and Power BI to create intuitive dashboards and reports.
🧑‍💻 Programming Languages:
Python | SQL | PySpark
⛓️ Distributed Frameworks:
Spark | Hadoop | Hive | Kafka | Sqoop
💾 Databases:
MySQL | MongoDB | Cassandra | HBase
🧬 Version Control:
Git | DVC
⏰ Workflow Management:
Airflow | Mage
☁️ AWS Services:
S3 | EC2 | EMR | RDS | Redshift | Glue | CloudWatch | ECS
☁️ Azure Services:
Data Factory | Databricks | Functions | Blob Storage | Synapse | Delta Lake
🚀 MLOps:
Docker | Docker Compose | GitHub Actions | MLflow
🪄 ML Frameworks:
Pandas | NumPy | scikit-learn | PySpark | PyTorch | Matplotlib | Seaborn | TFX