This project contains the source code for using the Web Data Connector (3.0) framework to query Tableau's Metadata API. The Metadata API allows users to write GraphQL queries for assets that live on a Tableau Cloud site. Using this connector, you can now use these queries as a data source for creating visualizations in Tableau Desktop.
In order to build this web data connector, you will need to install the dependencies required by Tableau.
-
Metadata API - There are some Tableau specific settings, which can be found here. Basically, you need to ensure the Metadata API is enabled AND that your Tableau user has permissions to query it.
-
Taco Toolkit - You will also need to install the Taco Toolkit, following the instructions here. This provides the tooling to build & package your connector. You can verify this is working by typing
taco --version
into your command window. It should print the current version of the Taco Toolkit.
Clone this repository and open a command window at the root of the project. Enter the following commands, one at a time
# This will clean out your workspace (just in case there was some extra stuff in there)
taco clean
# Build the connector (installs dependencies, compiles code)
taco build
# Packages the connector in a format Tableau will recognize
taco pack
# Test the connector using Tableau Desktop
taco run Desktop
The last command should open up Tableau Desktop, with the connector installed. Now it's time to test it out.
Open Tableau Desktop and select the TableauCloud_MetadataAPI connector from the list of possible data sources. You will be prompted for some information about your Tableau Cloud environment.
You will need 4 things in order to query for data:
- Tableau Cloud Site URL - This is the URL to your Tableau Cloud site. It should have a format like this:
https://us-east-1.online.tableau.com/#/site/devplatembed/home
. We use this to derive the podus-east-1
as well as the site namedevplatembed
. - PAT Name - The name of your personal access token
- PAT Value - The value of your personal access token
- Query - the query to send to Tableau's Metadata API. There is an API reference, which lists the possible things to query for (as well as what can be derived at various levels). If you've never worked with the Metadata API before, this is a good starting point. Test your query using the GraphiQL query tool to verify the data it returns, then copy/paste into this connector.
Once you've input your credentials, click the Get Data button to execute the query. You should get taken to the Data Sources page, which will show a table for the object your queried for.
Tableau's Metadata API returns query results in a JSON object with the following structure, where is what you queried for:
{
"data": {
"<object-name>": [
...
]
}
}
So if you had the following query, to get the list of databases on a site:
{
databases {
id
name
connectionType
isEmbedded
}
}
The resultset would look like this:
{
"data": {
"databases": [
{
"id": "<some-id>",
"name": "<database-or-file-name>",
"connectionType": "",
"isEmbedded": true/false
},
...
]
}
}
This is pretty simple, as we have a single row per database/file. But what about when we ask for related (nested) things?
{
databases {
id
name
connectionType
isEmbedded
isCertified
downstreamDashboards {
id
name
updatedAt
workbook {
name
}
}
}
}
The above query asks for all databases, and for each one list the dashboards that reference it. This can produce a resultset like this:
{
"data": {
"databases": [
{
"id": "1",
"name": "somefile.csv",
"connectionType": "textscan",
"isEmbedded": true,
"isCertified": false,
"downstreamDashboards": []
},
{
"id": "2",
"name": "someExtract.hyper",
"connectionType": "hyper",
"isEmbedded": true,
"isCertified": false,
"downstreamDashboards": [
{
"id": "dashboard-guid",
"name": "some dashboard name",
"updatedAt": "2023-02-27T16:24:27Z",
"workbook": {
"name": "some workbook name"
}
},
{
"id": "another-dashboard-guid",
"name": "another dashboard name",
"updatedAt": "2023-02-17T12:04:11Z",
"workbook": {
"name": "some workbook name"
}
}
]
},
...
]
}
}
So in the above example, there are 2 databases. The first database has no downstream dashboards and the 2nd has 2 dashboards, making the JSON structure different.
This connector will flatten the JSON into a dataframe, and expand it based on the objects in nested arrays. So for the above example, we should get a table that looks like this:
id | name | connectionType | isEmbedded | isCertified | downstreamDashboards.id | downstreamDashboards.name | downstreamDashboards.updatedAt | downstreamDashboards.workbook.name |
---|---|---|---|---|---|---|---|---|
1 | somefile.csv | textscan | true | false | ||||
2 | someExtract.hyper | hyper | true | false | dashboard-guid | some dashboard name | 2023-02-27T16:24:27Z | myworkbook |
2 | someExtract.hyper | hyper | true | false | another-dashboard-guid | another dashboard name | 2023-02-17T12:04:11Z | some workbook name |
- This connector was built to work with Tableau Cloud, not Tableau Server. The reason, is that the WDC framework requires the developer to allowList the URL endpoints it will hit. Since your Tableau Server could have any hostname (tableau.company.com, analytics.company.com, etc) there is not secure way to allowList all possibilies. If you'd like to use this connector against Tableau Server, open up connector.json and look for the
permissions.api
object. Change out https://.online.tableau.com* with the hostname of your Tableau Server environment. You will have to runtaco build
andtaco pack
after making this (or any) code change. - "Error: Extract Not Created" - This error occurs on the earliest release of Tableau version 2023.1. Upgrading to a newer version will resolve the issue.