One World of Minerals project from USGS

1. Sign up for a Bluemix free trial account if needed.

Go to bluemix.net and click Create a free account
Fill out the registration form and click the Create Account button.
You'll receive a verification e-mail which contains a link to activate your account.

2. Log in to Bluemix

Go to bluemix.net and click the Log in button
Provide your username and click the Continue button
Provide your password and click the Log in button

3. Create an IBM Data Connect Instance

Once logged into Bluemix, go to the Bluemix Catalog by clicking the Catalog button in the upper right corner or going to bluemix.net/catalog
In the catalog, type Data Connect into the search bar and click on the Data Connect service description.
Ensure that the Data Connect pricing plan is set to Starter and click the Create button.
On the service description page for your new Data Connect instance, click the Manage tab then click on the Launch button to launch the Data Connect dashboard.

4. Replicate data normalization work in IBM Data Connect

In the Data Connect dashboard, click on File Storage in the menu on the left side of the screen.
Upload the OneWorld Working Dataset file by clicking on the icon at the bottom of the screen, selecting your file in the explorer window, and clicking Open.
Once your data file is uploaded, click on Refine & Copy
In the Refine & Copy screen, click on Files
In Files, click on the file you just uploaded
In Worksheets, check the box next to all worksheets you want to refine. As you select worksheets to refine, they will appear in the Selected column on the right side of the screen.
Click the triple dots to the right of your worksheet in the Selected column, then click on the Configure button to perform an initial shaping of your data.
The Configure window provides a preview of the data, and we notice that data starts at row 3 with data headings at row 2. To remedy this, we will click on Configure. Note: Data Connect automatically creates a unique ID for each data row.
Click OK in the resulting dialog. We haven't made any column changes yet so nothing will be lost.
In the next dialog, specify that data starts at row 2 by clicking on row 2, check Selected line is the header, then click Apply.
In the next dialog, review that your data headers are correct then click Done.
Click the Refine Data button in the upper right corner.
In the Refine Data view, click Organize then click Change Schema
Remove the columns named COLUMN2, COLUMN5, and COLUMN11 by mousing over the column name and clicking on the trash can icon.
When you are done removing all 3 columns, click on Apply.
Begin editing the name of your activity by mousing over the activity name at the top of the screen (it will probably say Untitled) and clicking on the pencil icon to the right.
Provide a new name for the activity then click the checkmark button.
Click the Next button in the upper right corner of the screen.
In the next screen, click Files on the left side of the screen.
Select JSON as the type of file to create.
Click the Run button in the upper right corner of the screen.
In the resulting screen, you will see your activity running.
When the word Running... disappears from your activity, the activity has completed.
Click on File Storage in the menu on the left side of the screen.
Your new JSON file should be listed in File Storage. Download it by mousing over the file name and clicking on the download icon. This is the file you will load into Elasticsearch.
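
For readers who would rather script this step, the shaping done above (treat row 2 as the header, keep the rows after it, and drop COLUMN2, COLUMN5, and COLUMN11) can be sketched in Python. This is only an illustration and not what Data Connect runs internally: it assumes the worksheet has been exported to CSV, and the function name normalize and the sample rows are hypothetical.

```python
import csv
import io
import json

# Columns removed in the Change Schema step above.
DROP = {"COLUMN2", "COLUMN5", "COLUMN11"}


def normalize(csv_text, header_row=2, drop=DROP):
    """Use row `header_row` (1-based) as the header, keep the rows after it,
    and drop the unwanted columns. Returns a list of dicts."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[header_row - 1]
    keep = [i for i, name in enumerate(header) if name not in drop]
    records = []
    for row in rows[header_row:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        records.append({header[i]: row[i] for i in keep if i < len(row)})
    return records


if __name__ == "__main__":
    # Hypothetical sample: a title row, a header row, then data rows.
    sample = (
        "One World of Minerals,,,\n"
        "COMMODITY,COLUMN2,COUNTRY,COLUMN5\n"
        "Salt,x,Chile,y\n"
        "Copper,x,Peru,y\n"
    )
    print(json.dumps(normalize(sample), indent=2))
```

Unlike Data Connect, this sketch does not add a unique ID to each row, and the JSON layout it prints may differ from the file Data Connect produces.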

5. Create Elasticsearch Instance on Bluemix

From your IBM Bluemix Dashboard, scroll down until you see a blue button labeled "Create Service" towards the right side of the page and click it.
From the catalog page, you will have two options for creating your Elasticsearch service. You can either go to the "Container" section and upload your own Docker image, or you can go to the "Data & Analytics" section and choose "Compose for Elasticsearch".
From this creation page, you can "Connect To" your other IBM Bluemix applications. Once you've chosen a Service and Credential name, click the "Create" button in the lower right corner.

6. Stand up Kibana instance pointing to Elasticsearch

Kibana is an open source visualization tool for Elasticsearch that will allow us to visualize our data. Note that at the current time, Kibana is not a default service available on Bluemix, so we will have to upload our own Docker image for this application.

From the dashboard, click the "Create App" button.
Next click on "Containers" and choose "Upload an Image".
You'll need to follow the steps shown to push your Docker image to Bluemix. If you don't have a Docker image for Kibana, you can find one here: https://hub.docker.com/_/kibana/ Note: You may need to configure your Kibana image's environment variables to point to your Elasticsearch instance. Information about writing Dockerfiles can be found here: https://docs.docker.com/engine/reference/builder/

7. Create an index in Elasticsearch

In our case, Elasticsearch will index our data automatically on upload, based on the "index name" we choose in the curl request that posts the data.

For advanced users who want to know more about creating their own indexes, please refer to the Elastic documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html
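
As a rough illustration of what creating an index explicitly involves, the sketch below builds (but does not send) a PUT request for a new index with explicit shard and replica settings. The helper name create_index_request, the host, the credentials, and the setting values are all placeholders chosen for this example. number_of_shards and number_of_replicas are standard index settings, but check the documentation linked above for the options your Elasticsearch version supports.

```python
import base64
import json
import urllib.request


def create_index_request(base_url, index, user, password, shards=1, replicas=0):
    """Build a PUT <base_url>/<index> request that creates an index
    with explicit settings. Nothing is sent until urlopen() is called."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # HTTP basic auth, like curl -u
        },
    )


if __name__ == "__main__":
    # Placeholder host and credentials; substitute your own service values.
    req = create_index_request("https://localhost:9200", "index_name", "username", "password")
    print(req.get_method(), req.full_url)
    # To actually create the index: urllib.request.urlopen(req)
```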

8. Format JSON data & load into Elasticsearch

To format our data correctly, we can use a command-line tool called jq, downloaded from here: https://stedolan.github.io/jq/

Then, from the directory where you have jq and your OneWorld_Working_Dataset_Sample.json, run:

./jq --slurp '{("minerals"): .[]}' OneWorld_Working_Dataset_Sample.json > OneWorld_Working_Dataset_Sample_Formatted.json

Optionally, if you needed to add a field, you could run this command (here we are adding a timestamp value to each entry):

./jq --slurp '{("minerals"): [.[] | .[] | .[] * {TIMESTAMP: "1489854901"}]}' OneWorld_Working_Dataset_Sample.json > OneWorld_Working_Dataset_Sample_Formatted.json
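
The second filter can be read as three nested .[] steps followed by a merge: take each slurped document, then each value inside it, then each record inside that, and add the TIMESTAMP field to every record. A rough Python equivalent is sketched below. It assumes the input is a single JSON object whose values are lists of records, which is what the three .[] steps suggest but is not confirmed here; the key "Sheet1" and the function names are hypothetical.

```python
import json


def children(value):
    """Mimic jq's `.[]`: yield the values of an object or the items of an array."""
    if isinstance(value, dict):
        yield from value.values()
    elif isinstance(value, list):
        yield from value
    else:
        raise TypeError(f"cannot iterate over {type(value).__name__}")


def add_timestamp(documents, timestamp):
    """Equivalent of {("minerals"): [.[] | .[] | .[] * {TIMESTAMP: ...}]}.

    `documents` plays the role of the array produced by --slurp."""
    records = []
    for document in documents:            # first .[]
        for group in children(document):  # second .[]
            for record in children(group):  # third .[]
                # jq's `*` merges objects; for a single scalar field this is
                # the same as setting the key on a copy of the record.
                records.append({**record, "TIMESTAMP": timestamp})
    return {"minerals": records}


if __name__ == "__main__":
    # Hypothetical input shaped like the assumed Data Connect output.
    slurped = [{"Sheet1": [{"COMMODITY": "Salt"}, {"COMMODITY": "Copper"}]}]
    print(json.dumps(add_timestamp(slurped, "1489854901"), indent=2))
```

If your file has a different nesting depth, the number of loops (and of .[] steps in the jq filter) needs to change to match.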

Loading into Elasticsearch: One easy method to load a JSON file into Elasticsearch is to POST the file via curl. For example, to upload a JSON file in your current directory, use the command:

curl -u username:password -XPOST 'https://url_for_elastic_search:port/index_name' -d @OneWorld_Working_Dataset_Sample_Formatted.json --insecure

The --insecure flag is only needed when the service uses a self-signed certificate.
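
The command above posts the whole file as one request body. A different approach, not used in this walkthrough, is Elasticsearch's _bulk API, which indexes each record as its own document. It expects newline-delimited JSON: an action line followed by the document itself, for every record. The sketch below only builds that body; the function name to_bulk_ndjson is hypothetical, and the empty index action assumes the target index is named in the request URL.

```python
import json


def to_bulk_ndjson(records):
    """Return a _bulk request body: one action line plus one source line
    per record, each terminated by a newline."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {}}))  # action: index into the index from the URL
        lines.append(json.dumps(record))          # the document to index
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    formatted = {"minerals": [{"COMMODITY": "Salt"}, {"COMMODITY": "Copper"}]}
    print(to_bulk_ndjson(formatted["minerals"]), end="")
```

If you save this output to a file and send it with curl, use --data-binary rather than -d, because -d strips the newlines that the bulk format depends on, and set the Content-Type header to application/x-ndjson. The exact URL path depends on your Elasticsearch version: older releases also expect a document type in the path.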

9. Create a basic visualization in Kibana

Access the Kibana dashboard by going to the public IP and port you assigned to it when creating the container. Example: xxx.xx.xx.xxx:5601/app/kibana
Click "Visualize" on the top menu bar. We'll start with a simple bar chart for now, so choose "Vertical Bar Chart".
Select "From a New Search" for now. This will let us customize the search query used for data.
Notice the top search bar has a "*" in it. Leave that for now, since that will run our visualization across all the data in Elasticsearch. If you needed to refine the output with a specific query, you could do it there.
On the left menu, choose "Data". From here we can start designing our chart. For the Y-Axis, we can choose "Count" for Aggregation.
For the X-Axis, we can choose "Filters" for Aggregation. We can now add as many filters as we like. For now, let's make two of them, one with COMMODITY: Salt and the other with COMMODITY: Copper.
Further down on the left side menu, we can add a "Sub-Bucket" and choose "Split Bars".
Choose "Terms" for the Sub Aggregation, "COUNTRY" for the Field, "metric: Percentage of Count" for Order By, "Descending" for Order, and "7" for Size.
Click the green play button towards the top of the left side menu and your metric should now be visualized, showing percentage of Salt and Copper by Country. Your visualization setup should look similar to the settings shown on the left hand menu below.
For more information about other visualizations in Kibana, please refer to their documentation here: https://www.elastic.co/guide/en/kibana/current/tutorial-visualizing.html
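
For readers curious about what this chart asks of Elasticsearch, the settings above correspond roughly to a filters aggregation (one bucket per commodity) with a terms sub-aggregation on the country field. The sketch below builds such a request body as a Python dict. It is an approximation and not the exact request Kibana sends: the function name and aggregation names are hypothetical, the buckets are ordered by document count, and with default dynamic mappings the aggregatable field may be COUNTRY.keyword rather than COUNTRY.

```python
import json


def commodity_by_country_query(commodities, field="COUNTRY", size=7):
    """Build a search body with one filter bucket per commodity,
    each split into its top `size` countries by document count."""
    named_filters = {
        name: {"query_string": {"query": f"COMMODITY: {name}"}}
        for name in commodities
    }
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "aggs": {
            "commodities": {
                "filters": {"filters": named_filters},
                "aggs": {
                    "countries": {
                        "terms": {"field": field, "size": size, "order": {"_count": "desc"}}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(commodity_by_country_query(["Salt", "Copper"]), indent=2))
```

The printed body could be sent to the _search endpoint of your index to get the same counts the bar chart displays.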

everwatchsolutions/OWOM
