Large Spatial Point Dataset Extraction and Visualization
ClipGeo is a side project I developed over the years to clip, extract and visualize simple but very large georeferenced point datasets (such as 200 million Flickr photo locations). The tool is optimized for speed. For example, by using a geo-modified binary search, it can map and extract 6 million photo locations in Germany in under 2 minutes on a regular laptop.
Filtering, extracting, clipping and visualizing very large georeferenced point datasets is not practical with common GIS software, such as ESRI ArcGIS. In my experience, ArcGIS quits beyond 5 million points. This tool was initially built to extract parts of a larger point dataset for import into other software, such as ArcGIS, for more advanced analysis.
How ClipGeo speeds up reading data and exporting CSV data:
- CSV files are not read fully, column by column, but line by line, using StreamReader, and only up to the point in each line where data is required
- When exporting/clipping data, CSVs are not re-formatted. Instead, each line is copied as a whole from the original to the output CSV
- Data is pre-structured to improve clipping and filtering. So far, data can be structured spatially using quadtrees (each subfolder represents a single quad) or temporally based on days, months or years (etc.)
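The first two points can be illustrated with a minimal sketch. It is not ClipGeo's actual code: the module name, the function names and the assumed column order (`lat,lng,...` as the first two fields) are hypothetical, and the bounding-box test stands in for whatever filter is applied. Lines that do not start with two numbers (for example a header line) are skipped in this sketch.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO

Module StreamingClipSketch
    ' Parses only the first two fields of a line (assumed layout: "lat,lng,...").
    ' The rest of the line is never split or converted.
    Function TryReadLatLng(line As String, ByRef lat As Double, ByRef lng As Double) As Boolean
        Dim firstComma As Integer = line.IndexOf(","c)
        If firstComma < 0 Then Return False
        Dim secondComma As Integer = line.IndexOf(","c, firstComma + 1)
        If secondComma < 0 Then secondComma = line.Length
        Dim latText As String = line.Substring(0, firstComma)
        Dim lngText As String = line.Substring(firstComma + 1, secondComma - firstComma - 1)
        Return Double.TryParse(latText, NumberStyles.Float, CultureInfo.InvariantCulture, lat) AndAlso
               Double.TryParse(lngText, NumberStyles.Float, CultureInfo.InvariantCulture, lng)
    End Function

    ' Streams the input line by line and copies each matching line, unchanged, to the output.
    ' Returns the number of copied lines.
    Function ClipLines(reader As TextReader, writer As TextWriter,
                       minLat As Double, maxLat As Double,
                       minLng As Double, maxLng As Double) As Integer
        Dim copied As Integer = 0
        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            Dim lat As Double
            Dim lng As Double
            If TryReadLatLng(line, lat, lng) AndAlso
               lat >= minLat AndAlso lat <= maxLat AndAlso
               lng >= minLng AndAlso lng <= maxLng Then
                writer.WriteLine(line)   ' copy the original line as a whole
                copied += 1
            End If
            line = reader.ReadLine()
        End While
        Return copied
    End Function

    Sub Main()
        ' Two made-up sample records; the box roughly covers Germany
        Dim input As String = "52.52,13.405,photoA" & vbLf & "48.8566,2.3522,photoB"
        Using reader As New StringReader(input), writer As New StringWriter()
            Dim copied As Integer = ClipLines(reader, writer, 47.2, 55.1, 5.8, 15.1)
            Console.WriteLine(copied)   ' 1: only the first record lies inside the box
        End Using
    End Sub
End Module
```

For real files, a `StreamReader` and a `StreamWriter` would take the place of the `StringReader` and `StringWriter` used here. Because the function accepts any `TextReader`/`TextWriter`, the same loop works for both.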
The following code is at the heart of the Lat/Lng-point mapping. It is a fast, geo-modified binary search that assigns the best pixel-location for each pair of coordinates. After doing some research, I believe this is one of the fastest ways to map millions of points in just a few seconds.
    ' Sorted lists for the binary search on latitude/longitude
    Dim YList As New List(Of Double)
    Dim XList As New List(Of Double)
    ' Dictionaries for fast assignment of coordinates to pixels
    Dim YDict As New Dictionary(Of Double, Integer)
    Dim XDict As New Dictionary(Of Double, Integer)

    Function bestPixel(ByVal searchValueLat As Double, ByVal searchValueLng As Double) As GMap.NET.GPoint
        ' Point mapping function: input (lat/lng), output (best pixel location on map)
        ' Requires the precalculated pixel-to-lat/lng grid (Sub precalcValues)
        Dim indexY As Integer = YList.BinarySearch(searchValueLat)
        Dim indexX As Integer = XList.BinarySearch(searchValueLng)
        ' A negative result means there is no exact match: it is the bitwise
        ' complement of the index of the next larger element (or of Count).
        ' Clamp so that values beyond the last entry map to the last pixel.
        If indexY < 0 Then
            indexY = Math.Min(indexY Xor -1, YList.Count - 1)
        End If
        If indexX < 0 Then
            indexX = Math.Min(indexX Xor -1, XList.Count - 1)
        End If
        bestPixel.Y = YDict.Item(YList.Item(indexY))
        bestPixel.X = XDict.Item(XList.Item(indexX))
    End Function

    Public Sub precalcValues(ByVal Height As Integer, ByVal Width As Integer)
        ' Precalculate the lat/lng of each map pixel row and column
        YList.Clear()
        YDict.Clear()
        XList.Clear()
        XDict.Clear()
        ' Latitude of each pixel row, with a lookup back to the row index
        For yy As Integer = 0 To Height
            Dim Cord As Double = GMapControl1.FromLocalToLatLng(0, yy).Lat
            YList.Add(Cord)
            YDict(Cord) = yy
        Next
        YList.Sort()
        ' Longitude of each pixel column, with a lookup back to the column index
        For xx As Integer = 0 To Width
            Dim Cord As Double = GMapControl1.FromLocalToLatLng(xx, 0).Lng
            XList.Add(Cord)
            XDict(Cord) = xx
        Next
        XList.Sort()
    End Sub
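The lookup relies on how `List(Of T).BinarySearch` reports a miss: it returns a negative number that is the bitwise complement of the index of the first element larger than the search value, or of `Count` if there is no such element. The following standalone sketch shows that behaviour on a small made-up grid of longitudes, without any dependency on GMap.NET. The module name, the function name `NearestIndex` and the grid values are hypothetical. The sketch clamps the index so that a value beyond the last grid entry still maps to the last pixel.

```vbnet
Imports System
Imports System.Collections.Generic

Module BinarySearchSketch
    ' Returns the index of the first grid value >= searchValue,
    ' clamped to the last index when searchValue exceeds every grid value.
    Function NearestIndex(sortedGrid As List(Of Double), searchValue As Double) As Integer
        Dim index As Integer = sortedGrid.BinarySearch(searchValue)
        If index < 0 Then
            index = Not index   ' bitwise complement, same result as index Xor -1
        End If
        Return Math.Min(index, sortedGrid.Count - 1)
    End Function

    Sub Main()
        ' Hypothetical grid: longitudes of five pixel columns, sorted ascending
        Dim grid As New List(Of Double) From {10.0, 10.5, 11.0, 11.5, 12.0}
        Console.WriteLine(NearestIndex(grid, 11.0))   ' exact match: 2
        Console.WriteLine(NearestIndex(grid, 10.7))   ' between 10.5 and 11.0: 2
        Console.WriteLine(NearestIndex(grid, 99.0))   ' beyond the grid: 4 (clamped)
        Console.WriteLine(NearestIndex(grid, 1.0))    ' before the grid: 0
    End Sub
End Module
```

Each lookup costs O(log n) comparisons per axis, and the grid only has to be rebuilt when the map extent or size changes.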
- todo: future goals, extending scope of program beyond Flickr photo data (include Twitter & Instagram, for example)
This project includes and makes use of several other projects/libraries/frameworks:
FastPix - (c) Vic Joseph 2009-2013
A fast substitute for Bitmap.GetPixel and Bitmap.SetPixel. Used for fast bitmap manipulations (map render, alpha values) in visual.vb
DotSpatial (DotSpatial.Data & DotSpatial.Topology)
Opening Shapefiles & extracting point coordinates; optionally used in the Shapefile intersect method in clipdata.vb
GMap.NET
Online tile-based map display in Windows Forms, visualization of shapes & overlays, interface functions (such as selecting analysis extent)
Right-Click Context Menu for Photo display on maps (Photo collection based on descending popularity, including links to original photos online)
In other cases, code was slightly modified before incorporating it into the project:
Point-In-Polygon Raycasting Algorithm 1998-2007 Darel Rex Finley & Patrick Mullen
Slight modification to apply test in a spatial context
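For readers unfamiliar with the technique, here is a generic even-odd ray casting test. It is a minimal sketch and not the modified code used in ClipGeo. The module name, function name and parameter names are hypothetical. It treats longitude and latitude as planar x and y, so it assumes polygons that do not cross the antimeridian or a pole.

```vbnet
Imports System

Module PointInPolygonSketch
    ' Even-odd ray casting: counts how often a horizontal ray from the point
    ' crosses polygon edges to its left. An odd count means the point is inside.
    ' polyLng/polyLat hold the vertices in order; the last vertex connects back to the first.
    Function PointInPolygon(polyLng As Double(), polyLat As Double(),
                            lng As Double, lat As Double) As Boolean
        Dim inside As Boolean = False
        Dim j As Integer = polyLng.Length - 1
        For i As Integer = 0 To polyLng.Length - 1
            ' Does the edge between vertex j and vertex i straddle the latitude of the point?
            If (polyLat(i) < lat AndAlso polyLat(j) >= lat) OrElse
               (polyLat(j) < lat AndAlso polyLat(i) >= lat) Then
                ' Longitude at which the edge crosses that latitude
                Dim crossLng As Double = polyLng(i) +
                    (lat - polyLat(i)) / (polyLat(j) - polyLat(i)) * (polyLng(j) - polyLng(i))
                If crossLng < lng Then inside = Not inside
            End If
            j = i
        Next
        Return inside
    End Function

    Sub Main()
        ' Made-up square polygon from (0, 0) to (10, 10)
        Dim squareLng As Double() = {0.0, 10.0, 10.0, 0.0}
        Dim squareLat As Double() = {0.0, 0.0, 10.0, 10.0}
        Console.WriteLine(PointInPolygon(squareLng, squareLat, 5.0, 5.0))    ' True
        Console.WriteLine(PointInPolygon(squareLng, squareLat, 15.0, 5.0))   ' False
    End Sub
End Module
```

The straddle check guarantees that the two latitudes of an edge differ before the division, so horizontal edges never cause a division by zero.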
GNU GPLv3
2018-03-21: ClipGeo v0.9.500
- Added support for additional CSV structures
- Tested with Twitter and Instagram data
- Fixed Locals/Tourists filtering
- Removed explicit references to Flickr-CSV structure
- PhotoIDs are now handled as strings, not Long Integers (increased scope)
2017-03-08: ClipGeo v0.9.300 Rev19
- First published version
- Added Wiki
- Solved some DPI Display issues
- Minor Bugfixes and improvements, cleaned up code
2017-02-08:
- Initial commit: cleaned up project, removed unnecessary references to functions not used anymore.
- This project is a branch of a larger project. I removed all links to the larger project to continue developing this branch separately.
- Todo:
  - translate comments to English
  - clean up the mess
  - provide instructions for using the program