This is a test deployment of a Koop.js server as part of the Code for Pittsburgh food access map project.
The high-level idea is that we have a bunch of food access location data in a CSV with latlongs, and we have an ArcGIS Online (AGOL) web app to display those data, but we can only use critical AGOL features such as filter widgets if we make our data available in a format akin to an AGOL hosted feature layer. Enter Koop.js: an open-source project from Esri that creates a lightweight web server that ingests geospatial data from a variety of formats and makes it available in a variety of formats, including GeoServices, which should meet our needs here.
In this repository, I'm setting up a Koop server and deploying it on Heroku, whose free tier should be sufficient to let us keep the Koop server up 24/7. We will probably need to recreate some or all of this work in an official C4P repository, but this is a start and maybe we can just fork it into the C4P GitHub account.
That's a Young Frankenstein reference, for what it's worth.
Goodness knows I'm not as good at documentation as Max, but I figure I'll keep some notes on what I did to get this working, so we can recreate the process later and elsewhere.
Per the Koop quickstart guide, Koop requires Node.js and npm (which is a part of Node.js). So:
- Install Node.js. I installed version 12.18.3 LTS but I imagine the 14.10.0 Current version would work fine too.
- Verify that Node.js installation was successful by opening a terminal and typing `node -v` and/or `npm -v`. This latter command printed `6.14.6` for me.
- Type `npm install -g @koopjs/cli` to install the Koop CLI.
- In the parent directory of where you want the app to live (say, `C:\Users\drew\Documents\GitHub`), run `koop new app koop-test`. This will create a directory called `koop-test` and pre-populate a Git configuration, Node.js dependencies, etc.
- I don't really understand how `koop new app` interacts with existing Git or GitHub repositories. I initially created a repo called `koop-test` on GitHub and cloned the repo to my local machine, but running `koop new app` overwrote the contents of this directory, or at least the files with name conflicts.
- In the `koop-test` directory, you can run `npm start` to start the dev server (a quick server that is available locally for rapid development purposes). You can then visit http://localhost:8080/ to see the dev server in action. (Initially, all it does is display "Welcome to Koop!") You can also run `koop serve` and get the same result (I think).
- We need the CSV provider, so within the `koop-test` directory, run `koop add provider koop-provider-csv`.
The basic idea of Koop is that it ingests data from any of several provider plugins, intermediately converts the data to GeoJSON, then exports data to any of several output plugins. Koop comes with the GeoServices output plugin already installed (because, as far as I can tell, making geospatial data available via the GeoServices API is the primary point of Koop) but we need to install the CSV provider plugin separately.
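To make the "intermediately converts the data to GeoJSON" step a little more concrete, here is a minimal sketch of what turning CSV rows with latlongs into GeoJSON point features could look like. This is not Koop's or the CSV provider's actual code; the function names, column names, and sample rows are all made up for illustration.

```javascript
// Illustrative sketch only: NOT the real koop-provider-csv implementation.
// Column names and sample rows below are hypothetical.

// Convert one parsed CSV row (an object of strings) into a GeoJSON Feature.
// Returns null when the row has no usable coordinates.
function rowToFeature(row, lonColumn, latColumn) {
  const longitude = Number.parseFloat(row[lonColumn]);
  const latitude = Number.parseFloat(row[latColumn]);
  if (!Number.isFinite(longitude) || !Number.isFinite(latitude)) {
    return null;
  }

  // Every column except the two coordinate columns becomes a property.
  const properties = {};
  for (const [key, value] of Object.entries(row)) {
    if (key !== lonColumn && key !== latColumn) {
      properties[key] = value;
    }
  }

  return {
    type: "Feature",
    // GeoJSON positions are ordered [longitude, latitude].
    geometry: { type: "Point", coordinates: [longitude, latitude] },
    properties,
  };
}

// Convert many rows, dropping the ones that could not be located.
function rowsToFeatureCollection(rows, lonColumn, latColumn) {
  const features = rows
    .map((row) => rowToFeature(row, lonColumn, latColumn))
    .filter((feature) => feature !== null);
  return { type: "FeatureCollection", features };
}

// Hypothetical sample data standing in for the food access CSV.
const sample = [
  { name: "Example Pantry", latitude: "40.4406", longitude: "-79.9959" },
  { name: "No Coordinates", latitude: "", longitude: "" },
];
const collection = rowsToFeatureCollection(sample, "longitude", "latitude");
console.log(JSON.stringify(collection, null, 2));
```

An output plugin such as GeoServices would then work from a feature collection like this one rather than from the raw CSV.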
Helpful documentation here.
- Open `koop-test/config/default.json` in a text editor.
- Define one or more CSV sources, following the format in the documentation.
  - It might be helpful to assign an `idField` in the config file - Koop chirps at me that no `idField` was set, but reassures me that it created an `OBJECTID` field for me instead.
- Start the Koop dev server via `koop serve` and/or `npm start` (again, not sure whether there's a distinction).
- Some trial and error (and some review of the GeoServices specification) yields the following endpoint as a location where the whole dataset will be returned in JSON format compatible with the GeoServices API: http://localhost:8080/koop-provider-csv/food-data/FeatureServer/0/query
- However, this combination of Koop provider (CSV) and output (GeoServices) plugins, by default, returns the first 2000 features only. Our dataset has more than 2000 features (we can tell because `exceededTransferLimit` is `true`), so we need to pass a URL parameter overriding this default limit. The Koop FAQ sheds some light on this - we can use `resultRecordCount` to get more than 2000 results. We could just pass `resultRecordCount=999999` or some other very large number, but this is inelegant. The FAQ doesn't mention it but a lucky guess reveals that `resultRecordCount=-1` causes the endpoint to return all results: http://localhost:8080/koop-provider-csv/food-data/FeatureServer/0/query?resultRecordCount=-1
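As a small illustration of the route and parameter above, here is a sketch that assembles the query URL and checks a response for the truncation flag. The helper names `buildQueryUrl` and `isTruncated` are my own inventions, not part of Koop; the route shape, `resultRecordCount`, and `exceededTransferLimit` are the ones described in the list above.

```javascript
// Sketch with hypothetical helper names; only the route shape and the
// resultRecordCount / exceededTransferLimit names come from the notes above.

// Build the FeatureServer query URL for a CSV source served by Koop.
function buildQueryUrl(baseUrl, sourceId, params = {}) {
  const path = `/koop-provider-csv/${encodeURIComponent(sourceId)}/FeatureServer/0/query`;
  const url = new URL(path, baseUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// True when a parsed query response says it was cut off at the record limit.
function isTruncated(responseBody) {
  return responseBody.exceededTransferLimit === true;
}

// resultRecordCount=-1 is the "lucky guess" value that returned everything.
const allFeaturesUrl = buildQueryUrl("http://localhost:8080", "food-data", {
  resultRecordCount: -1,
});
console.log(allFeaturesUrl);
```

Swapping the first argument for the deployed app's base URL would give the equivalent route on another host.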
Properly speaking, one doesn't "install Heroku" per se. Heroku is a platform-as-a-service company, not a piece of software. By "install Heroku," rather, I mean "install software on my local machine that lets me interact with Heroku." Specifically, this is the Heroku CLI. The below checklist is basically stolen from that Heroku support article.
- Install Git, a prerequisite for the Heroku CLI. (Or, because I already had Git installed, open Git Bash and run `git update-git-for-windows` just to make sure I'm using the latest version.)
- Install Heroku CLI.
- Verify that installation was successful by opening a terminal and typing `heroku --version` - this printed `heroku/7.42.13 win32-x64 node-v12.16.2` for me.
- Type `heroku login` to connect Heroku CLI with your (our!) Heroku login credentials.
Another very helpful tutorial, here.
- Open `package.json` in the `koop-test` directory and specify the version of Node.js needed for our Koop app. This is simply the version of Node.js I installed earlier (v12.x). See the guide for the specific syntax needed.
- Again pulling code from the tutorial, edit `src/index.js` to use the port number that Heroku assigns dynamically to each dyno (container) instead of the static port number specified in `config/default.json`.
  - I found, when trying to run step 3 below, that the code provided by the Koop tutorial needed a slight change: `process.NODE_ENV.$PORT` needed to be replaced by `process.env.PORT`. This Heroku help page helped me identify the correct syntax.
  - In case you're wondering, like I was, where the `process` object referenced in the new code comes from, it is a global variable created when a Node.js process starts up.
- From the `koop-test` directory, run `heroku local web` to test whether the app will run correctly. This will make the app available at http://localhost:5000/ (note: different port number than before).
- Now that the Koop app is properly configured, let's commit and push all changes to Git/GitHub. I usually use the Git interface within VS Code or GitHub Desktop to do so, but you geniuses who actually understand Git can use whatever terminal commands you like. :)
Still working from this tutorial.
- From the `koop-test` directory, run `heroku create` to create a new app on Heroku. This will give the app a random name, but we can change the name later.
- In Git Bash (or maybe also in any other terminal), from the `koop-test` directory, run `git push heroku master`. This will push the local `master` branch to a remote on Heroku, which will receive the code and then build/deploy the app accordingly. This worked for me first try and I was blown away... for a little while, anyway.
- Near the bottom of the output from that terminal command, you will see a URL to the deployed app (in my case, https://glacial-dawn-93110.herokuapp.com/). Open this up, append the route we identified earlier, and we're in business: https://glacial-dawn-93110.herokuapp.com/koop-provider-csv/food-data/FeatureServer/0/query?resultRecordCount=-1
- Actually, no, we're not in business. That route (and other `query` routes) doesn't work. The app crashes and we may even need to run `heroku restart` to get it running again. But other routes do work! Just not the `query` route; you know, the one we actually need.
- Try hard to diagnose the problem, but fail. Post a question or two to Stack Overflow.
- Document what you've done and push to GitHub. Share with your teammates; maybe they can figure out what is going wrong.
- Email the main developer. His name is Rich Gwozdz and he works at Esri. He isn't sure what's going on either, but he suggests enabling debugging on the Heroku deployment and points you toward a very helpful Stack Overflow thread.
Working from the top answer on the Stack Overflow thread, with help from Heroku's documentation on Heroku Exec and remote debugging.
- In the `koop-test` directory, create a new file named `Procfile`. Arguably we should have had one of these all along, because Heroku uses this file to determine how to deploy the app, but Heroku's defaults work fine so we didn't need to create one until now.
- Edit `Procfile` to tell Heroku to start Node.js with debugging enabled: `web: node --inspect=9090 src/index.js`. The `9090` tells Node.js which port to use for debugging; we could have used almost any other port (except low-numbered ports like 80 which are already used for other purposes such as general web traffic).
- From the `koop-test` directory, run `heroku ps:exec` to enable the debugging connection. When you get to a `$` prompt, type `exit` to exit that prompt.
- Redeploy the Heroku app: commit the changes, then `git push heroku master`.
- Run `heroku ps:forward 9090` to start port forwarding (this relays connections on local port 9090 to the same port on the dyno, so a debugger on my machine can attach to the remote process).
- Depending on which IDE you're using (I use VS Code), create or edit `launch.json` and add the following configuration:
    {
      "type": "node",
      "request": "attach",
      "name": "Heroku",
      "port": 9090
    }
- Launch the "Heroku" debugger in your IDE.
- The debugger notes a caught exception having to do with identifying the text encoding of the input CSV. Because this exception is caught, I really don't think this is the problem, but let's try to force UTF-8 encoding (rather than ASCII) and see if that helps. ... It does not. Same problem as before, and the caught exception is still present. Now the default encoding is UTF-8, but the crash still happens at the `query` route.
- Try a different implementation of `koop-provider-csv` that does not use the `autodetect-decoder-stream` package. First we need to remove the original `koop-provider-csv`:
  - Remove the `koop-provider-csv` reference in `src/plugins.js`.
  - Edit `config/default.json` to either remove or repurpose the `koop-provider-csv` configuration (by changing `koop-provider-csv` to `@ntkog/koop-provider-csv`). Edit: I later found that `koop-provider-csv-ntkog` expects the `config` entry to be under `koop-provider-csv` so no change to `config/default.json` was needed after all.
  - Run `npm uninstall koop-provider-csv` to remove the package.
  - Delete the directory `src/koop-provider-csv`.
- Now we need to install the new `koop-provider-csv`: `npm install ntkog/koop-provider-csv`, then `koop add provider koop-provider-csv-ntkog`.
- That one crashes for different reasons, plus Haoliang has recently published an update (version 3.1.0) to the original `koop-provider-csv` that avoids the text encoding exception. So let's go back to that plugin:
  - Remove the `koop-provider-csv-ntkog` reference in `src/plugins.js`.
  - Run `npm uninstall koop-provider-csv-ntkog`.
  - Delete the directory `src/koop-provider-csv-ntkog`.
- Now when we run `heroku local web`, the route http://localhost:5000/koop-provider-csv/food-data/FeatureServer/0/query works great.
- But when we redeploy our app via `git push heroku master`, the route https://glacial-dawn-93110.herokuapp.com/koop-provider-csv/food-data/FeatureServer/0/query still crashes the app. This is a total bummer. Haoliang is looking into why this might be the case, but for now we will need to try a different platform.
- I'm going to try Google App Engine, part of the Google Cloud ecosystem. But I'll do that in a different repository than this one.