EcoJulia/SpatialEcology.jl

Representation of coordinates

The example uses :Lat and :Lon, and I don't think this is the best solution.

What about using https://github.com/JuliaGeo/Geodesy.jl objects instead? If the user has standard column names (:latitude and :longitude mandatory, :projection assumed to be WGS84, and :altitude optional), then the spatial coordinates can be manipulated as a single object instead of two columns.
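
To make the "single object instead of two columns" idea concrete, here is a minimal sketch. `GeoPoint` and `geopoints` are hypothetical names used only for illustration; they are a stand-in for a Geodesy.jl point type, not that package's API.

```julia
# Hypothetical stand-in for a Geodesy.jl point type; not that package's API.
struct GeoPoint
    latitude::Float64
    longitude::Float64
    altitude::Union{Float64,Nothing}   # optional
    datum::Symbol                      # assumed :wgs84 unless stated
end

# Convenience constructor: altitude is optional, datum defaults to WGS84.
GeoPoint(lat, lon; altitude = nothing, datum = :wgs84) =
    GeoPoint(lat, lon, altitude, datum)

# Build one point per row from the two mandatory columns.
geopoints(latitude, longitude) =
    [GeoPoint(lat, lon) for (lat, lon) in zip(latitude, longitude)]

pts = geopoints([51.5, 52.0], [-0.1, 0.3])
high = GeoPoint(51.5, -0.1; altitude = 35.0)
```

Each element of `pts` then carries its own datum, so downstream code does not need to know which two columns held the coordinates.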

Maybe we can start with something intermediate?

What I have in mind is a simple 2D (lat/lon) binning of occurrence data. It's not quite sites, but it would make it easy to get data into a form that SpatialEcology can work with.
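
A rough sketch of what such a binning could look like, assuming square cells measured in degrees. The function name `bin_occurrences` and the `cellsize` keyword are made up for this example and are not part of GBIF.jl or SpatialEcology.

```julia
# Hypothetical helper: assign each (lat, lon) point to a square cell of side
# `cellsize` degrees and count the points that fall in each cell.
function bin_occurrences(lats, lons; cellsize = 1.0)
    counts = Dict{Tuple{Int,Int},Int}()
    for (lat, lon) in zip(lats, lons)
        # floor gives the index of the cell whose lower-left corner is
        # (row * cellsize, col * cellsize)
        cell = (floor(Int, lat / cellsize), floor(Int, lon / cellsize))
        counts[cell] = get(counts, cell, 0) + 1
    end
    return counts
end

lats = [51.2, 51.7, 52.1, 51.4]
lons = [-0.3, -0.8, 0.4, -0.1]
bins = bin_occurrences(lats, lons; cellsize = 1.0)
```

The keys of `bins` are integer cell indices, which could then serve as site identifiers for a gridded representation.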

Yup, there is no doubt that the spatial (coordinate) types internally should be explicitly geographical types consistent with the framework being developed at JuliaGeo.

To explain how it is now: the Assemblage object has a SiteFields field that can in principle be any type of spatial representation. It currently supports two types, PointData and GridData, depending on whether the input data is on a grid. If not specified, the constructor will try to guess it based on the regularity of the input coordinates (https://github.com/EcoJulia/SpatialEcology.jl/blob/master/src/Constructor_helperfunctions.jl#L139-L162). In the example the input variables are regular, so the coordinates are inferred to form a GridData object, which consists of a GridTopology (https://github.com/EcoJulia/SpatialEcology.jl/blob/master/src/DataTypes.jl#L16-L24) and Indices into that GridTopology. So :Lon and :Lat are only used in the constructor (the idea is to have a really flexible set of constructors so people needn't worry about it).
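
For readers who don't want to dig through the linked source, the regularity guess can be pictured roughly as below. This is a simplified sketch and not the package's actual implementation: `isregular`, `isgrid` and `gridindices` are hypothetical names, and the check assumes every grid row and column appears at least once in the input.

```julia
# Hypothetical check: an axis is treated as regular when its sorted unique
# values are evenly spaced (within a relative tolerance).
function isregular(v; rtol = 1e-6)
    u = sort(unique(v))
    length(u) < 3 && return true      # one or two values are trivially regular
    steps = diff(u)
    return all(s -> isapprox(s, steps[1]; rtol = rtol), steps)
end

# Coordinates are guessed to lie on a grid when both axes are regular.
isgrid(lons, lats) = isregular(lons) && isregular(lats)

# 1-based cell index of each value along one axis: a stand-in for the
# indices that point into a grid topology (origin plus cell size).
function gridindices(v)
    u = sort(unique(v))
    step = length(u) > 1 ? u[2] - u[1] : one(eltype(u))
    return [round(Int, (x - u[1]) / step) + 1 for x in v]
end

lons = [0.5, 1.5, 2.5, 0.5]
lats = [10.0, 10.0, 11.0, 12.0]
ongrid = isgrid(lons, lats)
xidx = gridindices(lons)
```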

With regards to PointData, I agree with straight out replacing those points with Geodesy points. And I think the GridTopology could quickly be made compatible by including a projection field in the type.

I'd also like to add a SiteFields type consistent with https://github.com/JuliaGeo/GeoInterface.jl for polygon sites (e.g. countries - this is intended to replace Shapefiles, I think).

Not quite sure what you mean by binning - but the creategrid function can be tweaked to bin pointdata into grid data if needed.

Got it. I had some issues navigating the types.

I'd like to maybe have one of the GBIF examples use SpatialEcology for the mapping -- it should be simple to write a method for Assemblage from Occurrences, correct?

It should be - here is some spaghetti code that does the trick; an Assemblage method based on it would be straightforward. There are a few things that make it tricky (and the code ugly):

  1. Assemblage points are focused on co-occurring species, with only one point per locality. Right now I just merge all the points with the same coordinates - it would be better to use a buffer to merge points, with a certain tolerance, based on perhaps the precision value of the occurrences.
  2. I currently store the points as presence/absence (an Assemblage{Bool}) even though there are location counts, as the Assemblage type does not support NA in the occurrence (abundance) matrix. I'm not sure I'd like to use a DataArray, as I don't know how they work as sparse matrices (or if they do at all); not really sure how to deal with this.

```julia
using GBIF, DataFrames

uk_birds_query = Dict(
  "taxonKey"=>5231190,
  "country"=>"GB",
  "hasCoordinate"=>true,
  "year"=>2015)

# SpatialEcology also defines `occurrences` (the occurrences of a single
# species in the data set) - any good idea for a better name to give it there?
uk_birds = GBIF.occurrences(uk_birds_query)
uk_birds.query["limit"] = 200
complete!(uk_birds)

# a function to extract field values, from your dataframe.jl
loc(o::Occurrences, f::Symbol) = map(x -> getfield(x, f) === nothing ? NA : getfield(x, f), o)

# get the relevant fields
long = loc(uk_birds, :longitude)
lat = loc(uk_birds, :latitude)
sites = ((x,y)->"$(x)_$(y)").(long, lat) # an identifier for unique sites
abun = loc(uk_birds, :individualCount)
species = loc(uk_birds, :species)

# construct a DataFrame and consolidate all duplicated point occurrences
occ = DataFrame(sites = sites, abun = abun, species = species)
occ = by(occ, [:sites, :species]) do df
  sum(df[:abun][isfinite.(df[:abun])])
end

# I am going to throw out the abundance information for now, as Assemblage types don't allow for NA abundances
# to keep it in, I should:
#occ = DataFrame(sites = occ[:sites], abun = occ[:x1], species = occ[:species])

# construct a DataFrame in the Phylocom format
occ = DataFrame(sites = occ[:sites], abun = 1, species = occ[:species])

# construct a DataFrame of coordinates
coords = DataFrame(sites = sites, long = long, lat = lat)
unique!(coords, :sites)

using SpatialEcology
birds = Assemblage(occ, coords)

using Plots
plot(birds, aspect_ratio = 1.5, alpha = 0.3)
```
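
On point 1 above (merging nearby points with a tolerance rather than requiring identical coordinates), one cheap approximation is to snap coordinates to a lattice before building the site identifier. The sketch below is only that, an approximation: `siteid` and `tol` are hypothetical names, and two close points that fall on opposite sides of a lattice boundary still end up in different sites, which a true buffer-based merge would avoid.

```julia
# Hypothetical helper: snap a coordinate pair to a lattice of spacing `tol`
# (in degrees) and use the integer lattice indices as the site identifier.
function siteid(lon, lat; tol = 0.01)
    return (round(Int, lon / tol), round(Int, lat / tol))
end

a = siteid(-0.1001, 51.5002)   # two records a few metres apart...
b = siteid(-0.0999, 51.4998)   # ...snap to the same lattice node
c = siteid(-0.1200, 51.5000)   # a record further away gets its own site
```

The tolerance could in principle be taken from the precision value recorded with each occurrence, as suggested above.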

Perhaps we could define stubs for all the types we use in an EcoBase package imported by all the other packages, so SpatialEcology wouldn't need to depend on GBIF.jl to define a constructor that takes Occurrences objects?
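
A toy version of that layout, to show the dependency direction. All three modules are stand-ins written for this example (`EcoBaseSketch` for the proposed EcoBase, `DataSourceSketch` for a package like GBIF.jl, `ConsumerSketch` for SpatialEcology); none of the names are taken from existing packages.

```julia
# Hypothetical shared package: only abstract types and function stubs.
module EcoBaseSketch
    export AbstractOccurrences, coordinates
    abstract type AbstractOccurrences end
    # Stub with no methods; packages that own concrete types add methods.
    function coordinates end
end

# Stand-in for a data package: owns the concrete type, extends the stub.
module DataSourceSketch
    using ..EcoBaseSketch
    import ..EcoBaseSketch: coordinates
    struct Records <: AbstractOccurrences
        lat::Vector{Float64}
        lon::Vector{Float64}
    end
    coordinates(r::Records) = collect(zip(r.lat, r.lon))
end

# Stand-in for the consumer: depends only on the shared stubs.
module ConsumerSketch
    using ..EcoBaseSketch
    nsites(o::AbstractOccurrences) = length(unique(coordinates(o)))
end

recs = DataSourceSketch.Records([51.5, 51.5, 52.0], [-0.1, -0.1, 0.3])
n = ConsumerSketch.nsites(recs)
```

Neither of the two outer modules refers to the other; both only know about the shared abstract type and the `coordinates` stub.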

The constructor should also extract the taxonomic information and site information from the Occurrences object and put them in the .occ.traits and .site.sitestats DataFrames of the Assemblage object.

Oh I agree that SpatialEcology should not depend on GBIF -- it should be the other way around, the way it is for DataFrames in GBIF. All I need is to declare a method on my side to return an object in the correct format for SpatialEcology.

I think in the comments of the code snippet above you make a good point about namespaces. I like the way it's done in R, which forces you to be explicit when calling functions from another namespace. But to answer the question, what about writing a new method for count? Instead of doing occurrences(species, data), the call would be count(species, data).
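
As an illustration of the count suggestion, here is a minimal sketch on a toy type. `ToyAssemblage` is hypothetical and much simpler than the real Assemblage; the point is only that a method can be added to the existing `Base.count` instead of exporting a second `occurrences` function.

```julia
# Hypothetical minimal container: rows are sites, columns are species.
struct ToyAssemblage
    species::Vector{String}
    presence::Matrix{Bool}
end

# Add a method to Base.count rather than exporting a new name; the
# (species, data) argument order follows the suggestion above.
function Base.count(sp::AbstractString, a::ToyAssemblage)
    col = findfirst(==(sp), a.species)
    col === nothing && throw(ArgumentError("unknown species: $sp"))
    return count(a.presence[:, col])   # number of sites where sp is present
end

toy = ToyAssemblage(["robin", "wren"], Bool[1 0; 1 1; 1 0])
nrobin = count("robin", toy)
nwren = count("wren", toy)
```

Because the second argument is a type owned by the package, this does not clash with the existing `count` methods in Base.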