/vscode-mysql

Coursera. Scripting with Python and SQL for Data Engineering: Week 4

Working with VSCode and MySQL

Working from a terminal using command-line tools is very quick and can feel practical when performing certain tasks. However, using a powerful text editor like MySQL is useful when visualizing data and persisting queries accross sessions, databases, and database servers.

This repository and its contents are closely related to the Scripting with Python and SQL for Data Engineering course on coursera (week 4). It will assume you are using the VSCode with MySQL container image which has defaults that these examples use, like host, port, and user credentials.

Connecting to MySQL

There are a few critical things to be aware when connecting to any database server. In the past weeks, you used an embedded database called SQLite. Unlike SQLite, database servers have distinct information required to connect:

  • Host
  • Port
  • Optional SSL certificate

Database servers can listen on a specific address or hostname, and use a particular port to do so. Most commonly, you can use localhost as the host when the database is on the same server as where you are connecting from. For example, if the database is running in your computer, you will probably use localhost.

This is, however, not always true. It is entirely possible to configure database servers to not listen to connections on localhost and only use a FQDN (Fully Qualified Domain Name). This is similar to using ports.

For this course, you will be greeted with a VSCode editor in the browser that has some pre-installed extensions that allow you to connect to MySQL which is running on the same host.

  1. Click on the File Explorer icon on the left sidebar
  2. Find the MYSQL section and click on it
  3. Click on the "+" button to create a new connection

Use the following information to complete the prompts:

  • Hostname of the database: localhost
  • MySQL user to authenticate: root
  • Press Enter and leave the password field blank
  • The port number to connect (prepopulated): 3306
  • Press Enter to leave the certificate file path empty

Running Queries

The default MySQL database server has several databases in it. Use the mysql database from the server list, and right click on it and select New Query. A new pane will show up and allow you to create a query.

Use the following query to try it out:

SELECT * from user;

Next, do not click on the Run button. Use the command palette from VSCode to find the MYSQL: Run MYSql query entry. You should get a new pane with results.

Now it is time to run other queries and create a real database with a useful dataset. In VSCode, on the left side bar, find and click the Source Control section to clone a new repository. When prompted use https://github.com/alfredodeza/vscode-mysql.git to clone this repo into the lab.

After cloning, open the sql_scripts directory and click on setup.sql. That SQL script creates the database and sets up the required table to populate the data. Use the same technique as before to execute the query with the command palette. If the output shows, it should be similar to:

[Start] Executing MySQL query...
[Done] Finished MySQL query.

With the table created, you can populate it with the rest of the data in the populate.sql. Use the same process as before to execute the query.

Verify that your data exists in the table and ratings database by running a SELECT statement:

SELECT * from ratings LIMIT 100;

This query is also available in the sql_script directory. Do some data exploration to check the data and get familar with the type of contents you are going to be working with.

Exporting to CSV

There are several ways to export data. You can use the power of SQL to create a new script and export the fields and data you need into a CSV file.

Find the export_csv.sql file in the sql_script directory and execute it against the database.

What happens if you run it more than once?

Try changing the script to create a CSV file with only the names and ratings instead of including the region.

Exporting as SQL

Different databases have their own ways of exporting data. In MySQL databases you can use the mysqldump utility to export data. Open up a terminal and run the following command:

$ mysqldump -u root -p ratings > vscode-mysql/export.sql

Remember that the coursera database doesn't have a password for the root user. In a real-life scenario, you would have to type a password and perhaps use a specific user instead of using root.

Inspect the export.sql file and check its contents. You can use this same file to load data elsewhere. This operation is useful if you are migrating to and from databases or to create a backup.

Note that the SQL file also has instructions on how to re-create the table and database itself, not only the actual data.