Pinned Repositories
ArchitectingWithGCP_Fundamentals_Course1_CoreInfrastructure
Virtual Machines in Cloud. Use Cloud Launcher to deploy a LAMP stack on a Compute Engine instance. Create a Compute Engine VM using the GCP console. Create a Compute Engine VM using the gcloud command-line interface. Connect the two instances to each other. Storage in Cloud. Create a Cloud Storage bucket and place an image in it. Configure an application running in Compute Engine to use a database managed by Cloud SQL. Configure a web server with PHP. Use the image in the Cloud Storage bucket on a web page. Containers in Cloud. Create a Kubernetes Engine cluster containing several containers, each running a web server. Deploy and manage Docker containers using kubectl. Place a load balancer in front of the cluster and view its contents. Applications in Cloud. Preview an App Engine application using Cloud Shell. Launch and disable an App Engine application. Deployment in Cloud. Create a deployment using Deployment Manager and maintain a consistent deployment state. Update a deployment. View the load (resource usage) on a VM instance using Google Stackdriver. Big Data & Machine Learning in Cloud. Load data from Cloud Storage into BigQuery. Query the data in BigQuery both in the console and in the shell (using the bq command).
ArchitectingWithGCP_Fundamentals_Course2_EssentialCloudInfrastructureFoundation
Console and Cloud Shell. Access Google Cloud Platform. Create a Cloud Storage bucket using the GCP console and Cloud Shell. Understand shell features. Infrastructure Preview. Use Cloud Launcher to build a Jenkins continuous integration environment. Manage the service from the Jenkins UI. Administer the service from the virtual machine host through SSH. Virtual Networking. Understand network layout, place instances in various locations, and establish communication between virtual machines. Create an auto-mode network, a custom-mode network, and associated subnetworks. Compare connectivity in the various types of networks. Create routes and firewall rules using IP addresses and tags to enable connectivity. Convert an auto-mode network to a custom-mode network. Create, expand, and delete subnetworks. Bastion Host. Create an application web server to represent a service provided to an internal corporate audience. Prevent the web server from reaching, or being reached from, the internet. Create a maintenance server, called a bastion host, to gain access to and verify internal connectivity to the application server. Virtual Machines. Create a utility virtual machine for administration purposes, a standard VM, and a custom VM. Launch both Windows and Linux VMs, and delete VMs. Working with Virtual Machines. Create a customized virtual machine instance using the n1-standard-1 machine type (1 virtual CPU (vCPU) and 3.75 GB of RAM) with a 10 GB boot disk. The instance runs Debian Linux by default. Install base software (a headless JRE) and application software (a Minecraft game server). Attach a high-performance 50 GB persistent solid-state drive (SSD) to the instance. The Minecraft server can support up to 50 players. Reserve a static external IP address so that the address remains consistent. Verify that the game server is available online. Set up a system that backs up the server’s data to a Cloud Storage bucket, and test it. Automate backups using cron.
Set up maintenance scripts using metadata for graceful startup and shutdown of the server.
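For the subnetwork-expansion step above: expanding a subnet means shortening its prefix length so that the new, larger range still contains the original one (gcloud does this in place with `gcloud compute networks subnets expand-ip-range`). The minimal sketch below uses only the standard-library `ipaddress` module to compute an expanded range and count usable addresses, assuming GCP's rule of four reserved addresses in every primary subnet range. The example range is hypothetical, and the sketch does not check for overlap with other subnets, which GCP also requires.

```python
import ipaddress

# GCP reserves four addresses in every primary subnet range
# (network, default gateway, second-to-last, and broadcast).
RESERVED_PER_SUBNET = 4


def expand_subnet(cidr: str, new_prefix: int) -> ipaddress.IPv4Network:
    """Return the larger range a subnet would occupy after expansion.

    Expansion can only shorten the prefix (make the range bigger), and the
    result must contain the original range, which supernet() guarantees.
    """
    current = ipaddress.ip_network(cidr)
    if new_prefix >= current.prefixlen:
        raise ValueError("expansion must use a shorter prefix length")
    return current.supernet(new_prefix=new_prefix)


def usable_addresses(network: ipaddress.IPv4Network) -> int:
    """Number of addresses left for VMs after GCP's reserved addresses."""
    return network.num_addresses - RESERVED_PER_SUBNET


old = ipaddress.ip_network("10.130.0.0/20")  # hypothetical custom subnet
new = expand_subnet("10.130.0.0/20", 16)
print(new, old.subnet_of(new), usable_addresses(old), usable_addresses(new))
# 10.130.0.0/16 True 4092 65532
```

Because expansion is one-way (a subnet cannot later be shrunk), computing the target range first is a cheap sanity check before running the real command.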
ArchitectingWithGCP_Fundamentals_Course3_EssentialCloudInfrastructureCoreServices
Cloud Identity and Access Management (IAM). Use Cloud IAM to implement access control. Grant and revoke IAM roles, starting with a user, Username2. Restrict access to specific features or resources. Allocate Service Account User credentials and “bake” them into a virtual machine to create specific-purpose authorized bastion hosts. Cloud Storage. Create and use buckets. Explore the following features: customer-supplied encryption keys (CSEK), using your own encryption keys and rotating them; access control lists (ACLs), setting an ACL to private and then modifying it to public; lifecycle management, setting a policy to delete objects after 31 days; versioning, creating a version and restoring a previous version; directory synchronization, recursively synchronizing a VM directory with a bucket; and cross-project resource sharing, using IAM to enable access to resources across projects. Cloud SQL. Create a Cloud SQL instance and a client VM instance to serve as a database client. Install software. Restrict access to the Cloud SQL instance to a single IP address. Download sample GCP billing data in CSV format and load it into the database. Improve security by requiring SSL certificates: configure the Cloud SQL instance and the client to use SSL encryption. Cloud Datastore. Initialize Cloud Datastore. Create content (populate the database with data entities) in Datastore. Query the content using both “Query by kind” and “Query by GQL”. Access the Cloud Datastore Admin console. Enable the Cloud Datastore Admin console to clean up and remove test data. Examining Billing Data with BigQuery. Sign in to BigQuery from the GCP console. Import billing data, generated as a CSV file, into BigQuery. Create a dataset. Create a table. Run a simple query on the table. Access a shared dataset containing more than 22,000 records of billing information. Run queries on the data to explore how to use BigQuery to ask and answer questions. Resource Monitoring (Stackdriver).
Create a Stackdriver account. Enable Stackdriver Monitoring to monitor projects. Add charts to dashboards. Create alerts with multiple conditions. Create resource groups. Create uptime checks for services. Error Reporting and Debugging (Stackdriver). Launch a Google App Engine application. Introduce a code bug to break the application. Explore Stackdriver Error Reporting to identify the issue, and then analyze the problem, finding the root cause using Stackdriver Debugger. Modify the code to fix the problem. Monitor the change by examining the results through Stackdriver.
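The lifecycle-management step (delete objects after 31 days) is driven by a small JSON policy file, which `gsutil lifecycle set` applies to a bucket. A minimal sketch that builds that policy in Python follows; the bucket name in the comment is a placeholder, and writing the file by hand works just as well.

```python
import json


def delete_after_days_policy(age_days: int) -> dict:
    """Build a Cloud Storage lifecycle policy that deletes old objects."""
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                # "age" is measured in days since the object was created.
                "condition": {"age": age_days},
            }
        ]
    }


if __name__ == "__main__":
    # Save this output as life.json, then apply it with, for example:
    #   gsutil lifecycle set life.json gs://YOUR_BUCKET   (placeholder name)
    print(json.dumps(delete_after_days_policy(31), indent=2))
```

Generating the policy from a function keeps the age in one place if several buckets need different retention periods.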
ArchitectingWithGCP_Fundamentals_Course4_EssentialCloudInfrastructure_Scaling-Automation
Virtual Private Network (VPN). Created two custom networks and associated subnetworks in different regions. Created VPN gateways in each network. Established static routes to enable the gateways to pass traffic. Configured static routes to pass traffic to the VPN gateways. Established firewall rules to allow ICMP and SSH traffic. Performed most of the configuration from the command line. By configuring the VPN manually, gained a better understanding of what the GCP console does automatically, which makes it easier to troubleshoot a configuration. Virtual Machine Automation and Load Balancing. Created a pool of web server VMs and directed traffic to them from an external network. Configured an external load balancer to use the pool, distributing work among the servers. Tested high availability by placing a load on the service and stopping a VM to simulate an outage (by putting a bug in the code). Launched two more VMs in a secondary zone. Configured an internal load balancer. Tested the internal load balancer for work distribution and availability. Autoscaler. Created a VM, then customized it by installing software and changing a configuration setting (making Apache start on boot). Used the custom image in an instance template, and then used the instance template to create a managed instance group. After all the backend and frontend parts were connected, stress-tested the system and triggered autoscaling using Apache Bench. The goal was to set up an HTTP(S) load balancer with autoscaling and verify that it worked. Infrastructure Automation. Created an IAM service account. Created a VM. Authorized the VM to use Google Cloud APIs through the service account, for the purpose of building automation tools. Installed open-source software on the VM. Configured and tested the VM by deploying a Hadoop cluster. Created a global solution by generating a snapshot of the boot disk with the service account already “baked in”.
Recreated the Clustermaker VM in a different region and tested it by deploying another Hadoop cluster in the new region. Learned how IaaS skills can be leveraged to automate activities through the Google Cloud SDK, which is important for Site Reliability Engineering (SRE).
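To reason about the Apache Bench autoscaling test, it helps to have a rough model of how many instances a CPU-utilization target implies. The sketch below assumes a simplified proportional rule (enough instances that average utilization falls to the target, clamped to the group's minimum and maximum size); this is an illustration, not the exact algorithm the managed instance group autoscaler uses, which also smooths its decisions over time. Utilization figures are hypothetical.

```python
from typing import Sequence


def recommended_size(utilization_pct: Sequence[int], target_pct: int,
                     min_size: int, max_size: int) -> int:
    """Simplified autoscaling estimate (illustrative model only).

    utilization_pct holds the current CPU utilization of each instance,
    in whole percent; target_pct is the desired average utilization.
    """
    if target_pct <= 0:
        raise ValueError("target utilization must be positive")
    total = sum(utilization_pct)
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-total // target_pct)
    return max(min_size, min(max_size, needed))


# Three VMs at 90% CPU under load, with a 60% target: grow the group.
print(recommended_size([90, 90, 90], 60, 1, 10))
# Load removed: four VMs at 30% each can shrink back toward two.
print(recommended_size([30, 30, 30, 30], 60, 1, 10))
```

Under this model, 270% total load at a 60% target needs five instances, which is why a sustained Apache Bench run pushes the group above its starting size.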
ArchitectingWithGCP_Fundamentals_Course6_ReliableCloudInfrastructure_DesignandProcess
Beginning AppServer. Created a Cloud Deployment Manager template for AppServer in YAML format. Learned to work with YAML. Mapped JSON to YAML and corrected syntax errors in YAML. Created a prototype template from the documentation by converting the reference to YAML. Pruned the prototype template to common and required properties. Used gcloud commands to interrogate the GCP environment to find the exact values and URIs required to configure the template. Worked with Deployment Manager to create multiple environments for different organizations and purposes, and then deleted them after they had served their purpose. Package and Deploy. Using Deployment Manager templates, including Jinja2 templates, created a virtual machine that loads a Python application and its dependencies, boots up, and configures itself to run a service. Specifically, deployed a service using a pre-written Python application called “Echo” and example Deployment Manager templates written in YAML and Jinja2. Created a deployment package suitable for Deployment Manager using the Python package manager, pip. Staged the package in a Cloud Storage bucket. Manually tested the application to ensure that it was working properly. Tested the new service. Adding Load Balancing. Used a pre-written Python application called “Echo” and existing Deployment Manager templates written in Jinja2. Created a deployment package suitable for Deployment Manager using the Python package manager, pip. Staged the package in a Cloud Storage bucket. Followed best practices and manually tested the application to ensure that it was working properly. Investigated and gathered the information necessary to configure health checks. Used Deployment Manager to deploy the Echo Load Balancer (LB) service. Tested the new service. Enabled the health check and verified that it was functioning. Deploy a Full Production Application with Stackdriver (monitoring).
Cloned a public repository of Deployment Manager templates. Launched a cloud service from a collection of templates. Configured basic black-box monitoring of a logbook application. Used Stackdriver to configure monitoring and alert notifications, and to set up graphical dashboards with dynamically updating charts of CPU usage and received packets. Created an uptime check to recognize a loss of service. Established an alerting policy to trigger incident response procedures. Used Apache Bench to generate load traffic to test the system and trigger autoscaling. Simulated a service outage to test notifications and resiliency features. Verified receipt of an email notification of the failure.
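The course work above uses YAML and Jinja2 templates; Deployment Manager also accepts Python templates, in which a `GenerateConfig(context)` function returns the same `resources` structure. The sketch below is a hypothetical Python-template analogue of a single-VM template, not the course's own Echo template. The `zone`, `machineType`, and `startupScript` property names and the Debian image family are assumptions chosen for illustration.

```python
"""Hypothetical Deployment Manager Python template for a single VM.

Deployment Manager calls GenerateConfig(context) and expects a dict with a
"resources" list, the same structure a YAML config or Jinja2 template yields.
"""

COMPUTE_URL_BASE = "https://www.googleapis.com/compute/v1/"


def GenerateConfig(context):
    """Return one compute.v1.instance resource built from template properties."""
    project = context.env["project"]
    zone = context.properties["zone"]
    machine_type = context.properties.get("machineType", "n1-standard-1")
    name = context.env["deployment"] + "-vm"

    instance = {
        "name": name,
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": (f"{COMPUTE_URL_BASE}projects/{project}/zones/{zone}"
                            f"/machineTypes/{machine_type}"),
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    # Image family is an assumption; any public image works.
                    "sourceImage": (f"{COMPUTE_URL_BASE}projects/debian-cloud/"
                                    "global/images/family/debian-11"),
                },
            }],
            "networkInterfaces": [{
                "network": f"{COMPUTE_URL_BASE}projects/{project}/global/networks/default",
                # A ONE_TO_ONE_NAT access config gives the VM an ephemeral external IP.
                "accessConfigs": [{"name": "External NAT", "type": "ONE_TO_ONE_NAT"}],
            }],
            # A startup script in metadata lets the VM configure itself on boot.
            "metadata": {"items": [{
                "key": "startup-script",
                "value": context.properties.get("startupScript", ""),
            }]},
        },
    }
    return {"resources": [instance]}
```

Building URIs from `context.env["project"]` rather than hard-coding them is what lets the same template be deployed into several environments and deleted afterwards.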
DataEngineering_GCP_course3_ServerlessDataAnalysis_GoogleBigQuery-CloudDataflow
BigQuery. BigQuery is a petabyte-scale data warehouse on Google Cloud that runs SQL queries. Create a query, and modify a query to add clauses, subqueries, built-in functions, and joins. Load a CSV file into a BigQuery table using the web UI. Load a JSON file into a BigQuery table using the CLI. Export a table using the web UI. Use nested fields, regular expressions, the WITH statement, GROUP BY, and HAVING. Dataflow. Dataflow is a runner (execution framework) for Apache Beam pipelines. Each step in a pipeline is called a transform. A pipeline reads from a source (e.g., BigQuery) and writes to a sink (e.g., Cloud Storage). Set up a Python Dataflow project using Apache Beam, the SDK used to define data processing workflows. Create a Dataflow pipeline that uses filtering. Execute the pipeline locally and on the cloud. MapReduce. To process a large dataset, break it into pieces so that each compute node processes data that is local to it. The map operations run in parallel on chunks of the original input data. The results of these maps are sent to the reduce nodes, where aggregates are calculated; each reduce node processes one key or one set of keys. Identify map and reduce operations. Execute the pipeline. Use command-line parameters. Side Inputs. A side input is an additional input that your DoFn can access each time it processes an element in the input PCollection. When you specify a side input, you create a view of some other data that can be read from within the ParDo transform's DoFn while processing each element. Load data into BigQuery and run complex queries. Execute a Dataflow pipeline that carries out map and reduce operations, uses side inputs, and streams into BigQuery. Use the output of a pipeline as a side input to another pipeline.
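The map, shuffle, and reduce flow described above can be sketched in plain Python, without Apache Beam or Dataflow: input chunks are mapped by a pool of workers, intermediate pairs are grouped by key, and each key is reduced independently. Word counting is a hypothetical workload chosen for illustration; in Beam the same shape is typically a Map or FlatMap followed by CombinePerKey.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_chunk(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one chunk of input."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate values by key so each key goes to one reducer."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_key(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: aggregate all values that share a key."""
    return key, sum(values)


def map_reduce(lines: List[str], n_chunks: int = 2) -> Dict[str, int]:
    # Split the input so each worker processes only its own chunk.
    chunks = [lines[i::n_chunks] for i in range(n_chunks)]
    # Threads run the map calls concurrently; a process pool or a cluster
    # of nodes would be needed for true CPU parallelism.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_key(k, v) for k, v in grouped.items())


print(map_reduce(["the cat sat", "the dog sat", "a cat ran"]))
```

Because each reducer sees every value for its key after the shuffle, the reduce step can be any associative aggregate (sum, max, count) without coordinating with other reducers.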
DataEngineering_GCP_course4_Serverless-Machine-Learning-with-TensorFlow
Explore and create ML datasets. Sample the dataset and create training, validation, and testing datasets for local development of TensorFlow models. Create a benchmark to evaluate the performance of ML models. TensorFlow performs numerical computation using directed graphs. Getting started with TensorFlow. Explore the TensorFlow Python API: build a graph, run a graph, and feed values into a graph. Find the area of a triangle using TensorFlow. Learning from tf.estimator. Read data from a pandas DataFrame into tf.constant, create feature columns for the estimator, and perform linear regression with the tf.estimator framework. Execute Deep Neural Network regression. Use the benchmark dataset. Refactoring to add batching and feature creation. Refactor the input. Refactor the way the features are created. Create and train the model. Evaluate the model. Distributed training and monitoring. Create features out of input data. Train and evaluate. Monitor with TensorBoard. To run TensorFlow at scale, use Cloud ML Engine. Package up the code. Find absolute paths to data. Run the Python module from the command line. Run locally using gcloud. Submit a training job using gcloud. Deploy the model. Make predictions. Train on a 1-million-row dataset. Feature Engineering. Working with feature columns. Adding feature crosses in TensorFlow. Reading data from BigQuery. Creating datasets using Dataflow. Using a wide-and-deep model.
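One common way to make the training, validation, and testing split repeatable is to hash a stable key for each row and bucket on the hash modulo 100, so a given row always lands in the same split. The sketch below assumes a hypothetical string key per row and uses `hashlib.sha256` as a stand-in for BigQuery's `FARM_FINGERPRINT`; the 80/10/10 ratio is also an assumption.

```python
import hashlib


def split_for(key: str, train_pct: int = 80, valid_pct: int = 10) -> str:
    """Assign a row to 'train', 'valid', or 'test' deterministically by its key."""
    # A stable hash (unlike Python's built-in hash(), which is salted per
    # process for strings) guarantees the same key maps to the same bucket.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "valid"
    return "test"


keys = [f"row-{i}" for i in range(10_000)]  # hypothetical row identifiers
counts = {"train": 0, "valid": 0, "test": 0}
for k in keys:
    counts[split_for(k)] += 1
print(counts)  # roughly 8000 / 1000 / 1000
```

Hashing rather than random sampling means rerunning the notebook, or sampling a smaller subset for local development, never leaks a test row into training.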
LinearAlgebra_python
Coding the Matrix: linear algebra, Python implementation
MachineLearning_TensorFlow_GoogleCloudPlatform_course3_IntroToTensorFlow
Writing low-level TensorFlow programs. Learned how the TensorFlow Python API works by building a graph, running a graph, and feeding values into a graph. Calculated the area of a triangle using TensorFlow. Implementing a Machine Learning model in TensorFlow using the Estimator API. Implemented a simple machine learning model using tf.learn. Read CSV data into a pandas DataFrame. Implemented a linear regression model in TensorFlow. Trained and evaluated the model. Predicted with the model. Repeated with a Deep Neural Network (DNN) model in TensorFlow. Scaling up TensorFlow ingest using batching. Loaded a large dataset progressively using tf.data.Dataset. Broke the one-to-one relationship between inputs and features. Creating a distributed training TensorFlow model with the Estimator API. Learned the importance of watching validation metrics while training is in progress. Used the tf.estimator.train_and_evaluate function. Monitored training using TensorBoard. Scaling TensorFlow with Cloud Machine Learning Engine. Packaged up the TensorFlow model. Ran training locally. Ran training on the cloud. Deployed the model to the cloud. Invoked the model to carry out predictions.
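The triangle-area exercise is typically built around Heron's formula, which needs only elementwise arithmetic and a square root, exactly the kind of operations a TensorFlow graph composes. Assuming that formula, here is a plain-Python version (no TensorFlow required) that takes side lengths for a batch of triangles; in TensorFlow the same steps would map onto tensor ops such as `tf.sqrt`.

```python
import math
from typing import List, Sequence


def triangle_areas(sides: Sequence[Sequence[float]]) -> List[float]:
    """Heron's formula for each (a, b, c) triple in a batch of triangles."""
    areas = []
    for a, b, c in sides:
        s = (a + b + c) / 2.0  # semi-perimeter
        product = s * (s - a) * (s - b) * (s - c)
        if product < 0:
            raise ValueError(f"sides {a}, {b}, {c} do not form a triangle")
        areas.append(math.sqrt(product))
    return areas


# A 3-4-5 right triangle has area 6.0.
print(triangle_areas([[3.0, 4.0, 5.0], [2.3, 4.1, 4.8]]))
```

Processing a list of triangles rather than one at a time mirrors how the graph version feeds a batch of side lengths in a single run.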
StatisticalLearning_classstuff
Advanced Statistics, R code
kjy's Repositories
kjy/StatisticalLearning_classstuff
Advanced Statistics, R code
kjy/DataEngineering_GCP_course3_ServerlessDataAnalysis_GoogleBigQuery-CloudDataflow
kjy/project_homelessness
GlobalHack VI project on homelessness
kjy/BigData_ApacheSpark_110X
kjy/course1
kjy/course2
kjy/course3
kjy/course4
kjy/DataEngineering_GCP_course1_BigData-MLFundamentals
Google Cloud Platform (GCP), architectures, Cloud SQL, Cloud Storage, Compute Engine, ML APIs, ML with BigQuery, ML with TensorFlow, Dataproc with PySpark, data engineering learning path
kjy/DataEngineering_GCP_course2_LeveragingUnstructuredData_Dataproc
Google Cloud Platform, Dataproc as a cloud-based implementation of Hadoop, Hive, Pig, PySpark, ML, NLP, sentiment analysis, cluster automation, CLI commands
kjy/deep-learning
Repo for the Deep Learning Nanodegree Foundations program.
kjy/DeepLearning
Python Keras Theano TensorFlow
kjy/differentiable_neural_computer_LIVE
kjy/Distributed_ML_ApacheSpark_120X
kjy/Generative_Adversarial_networks_LIVE
kjy/imgproc_pytalk
Image Processing in Python talk
kjy/Intro_ApacheSpark_105X
kjy/pandas_ipython
Anaconda's IPython distribution, NumPy, pandas, matplotlib
kjy/python-driver
DataStax Python Driver for Apache Cassandra
kjy/python_DataScience
pandas, numpy, matplotlib
kjy/react-workshop
A step-by-step workshop for learning React fundamentals.
kjy/RepData_PeerAssessment1
reproducible research, Rmd, md, html, figures
kjy/Spark_AWS
kjy/Statistics1
Statistics, R code
kjy/UdacityProject_1_PredictBikeSharingUsage
Deep Learning course
kjy/UdacityProject_2_ClassifyObjectsFromImages
Udacity Deep Learning course
kjy/UdacityProject_3_GenerateTVscripts
Udacity Deep Learning course
kjy/UdacityProject_4_EnglishFrenchTranslationChatbot
kjy/UdacityProject_5_GAN_FaceGeneration
Udacity Deep Learning course
kjy/webinars
Code and slides for RStudio webinars