Question: statistical summarization about the NKDE
adhamenaya opened this issue · 10 comments
Hi,
Can I use this package to generate statistical summaries of the NKDE, such as measures of central tendency and dispersion?
Hello! Could you please give me more details about your question? The NKDE is calculated at sampling points along the network, and each sampling point has a specific density value. Are you looking for a statistical summary along the lines of the network? For the whole network? Or are you looking for the uncertainty of the NKDE estimated at each sampling point?
@JeremyGelb Yes, I am looking for a summary over the whole network. I tried to calculate a central point as the density-weighted mean over the entire network, as in the following code:
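```r
library(sf)
# assuming `samples` is the sf POINT layer of sampling points with an NKDE `density` column
coords <- st_coordinates(samples)   # matrix of X and Y coordinates
nkde_values <- samples$density      # NKDE estimated at each sampling point
weighted_mean_x <- weighted.mean(coords[, 1], nkde_values)
weighted_mean_y <- weighted.mean(coords[, 2], nkde_values)
print(paste("Weighted mean (x):", weighted_mean_x))
print(paste("Weighted mean (y):", weighted_mean_y))
```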
where nkde_values contains the NKDE estimated at each sampling point. Do you think this is a valid approach?
@adhamenaya, it is still unclear to me what you are trying to obtain.
Are you calculating the mean of the coordinates of the sampling points, using the estimated densities as weights?
There is a difference between the sampling points and the events. The events are the locations where your real data occur on the network. Sampling points are arbitrary locations along the network where we estimate the density of the events (based on kernel functions that "melt" the density of the events).
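To make the distinction concrete, here is a minimal sketch in R on a hypothetical single straight street, where the network distance between two locations reduces to the absolute difference of their positions. The event positions, sampling points, bandwidth and the `quartic` helper are all illustrative assumptions; on a real network the kernel also has to be split at intersections, which the NKDE handles and this toy example ignores.

```r
# Hypothetical helper: quartic (biweight) kernel scaled by the bandwidth,
# so that it integrates to 1 over [-bw, bw].
quartic <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) < 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

events  <- c(120, 150, 480)        # where the events actually occurred (m)
samples <- seq(0, 600, by = 50)    # arbitrary sampling points along the street (m)
bw      <- 100                     # kernel bandwidth (m)

# Density at each sampling point: average of the kernels centred on the events,
# evaluated at the (here 1D) distance between the sampling point and each event.
dens <- sapply(samples, function(s) sum(quartic(abs(s - events), bw)) / length(events))
```

Sampling points with no event within the bandwidth get a density of 0, and the highest density is found at the sampling point that sits on an event with another event nearby.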
What is the question you are trying to answer with your analysis ? Are you trying to measure the clustering of your events ? Are you looking for the center of your events ?
If you are interested in the center of your events, note that classical methods of point pattern analysis do not work well on a network. The mean center of the events cannot be calculated because we are not in a Euclidean space. However, you could, for example, find the point on the network that minimizes the total network distance to all the events.
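A minimal sketch of that idea in base R, on a hypothetical five-intersection network with the events snapped to vertices. It computes all-pairs shortest-path distances with the Floyd-Warshall algorithm and picks the vertex with the smallest total distance to the events (a network 1-median). Restricting the candidates to vertices is only justified here because the events sit on vertices; the edge list, lengths and event locations are made up, and on a real network the distances would come from a graph or routing library.

```r
# Hypothetical toy network: street segments between 5 intersections (lengths in m).
edges <- data.frame(
  from   = c(1, 2, 3, 2, 4),
  to     = c(2, 3, 4, 5, 5),
  length = c(100, 200, 150, 300, 120)
)
n_v <- 5

# Initialise the distance matrix with direct (undirected) segment lengths.
d <- matrix(Inf, n_v, n_v)
diag(d) <- 0
for (e in seq_len(nrow(edges))) {
  i <- edges$from[e]; j <- edges$to[e]
  d[i, j] <- d[j, i] <- min(d[i, j], edges$length[e])
}

# Floyd-Warshall: allow each vertex k in turn as an intermediate stop.
for (k in seq_len(n_v)) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

# Hypothetical events, each snapped to a vertex.
event_vertex <- c(1, 2, 3, 5)

# Total network distance from each candidate vertex to all the events.
total_dist <- rowSums(d[, event_vertex, drop = FALSE])
network_median <- which.min(total_dist)
```

In this toy network vertex 2 is selected, with a total distance of 600 m to the four events.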
Thank you for your detailed response. The question I'm trying to investigate is how to calculate the distance between two different NKDEs. I was wondering whether an aggregate summary, like central tendency, could help to understand the distance/dissimilarity between two different distributions. Otherwise, what would you suggest to explore the distance between two NKDEs, or between KDEs in general?
I am not sure I understand what the distance between two NKDEs would be. I guess that you are interested in the difference in the spatial pattern of two sets of events on the same network.
If you have two sets of events, you could simply calculate the difference between the two NKDEs and map it. You just need to ensure that the sampling points are the same for both NKDEs.
If the number of events is very different between the two sets, you could scale the NKDEs first to get a more meaningful comparison.
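A minimal sketch of both options, assuming two hypothetical NKDE vectors estimated at exactly the same sampling points. Rescaling each NKDE so that it sums to 1 is only one possible choice (dividing by the number of events is another); `scale_to_share` is an illustrative helper, not a spNetwork function.

```r
# Hypothetical NKDE values estimated at the SAME sampling points for two sets of events.
nkde_a <- c(0.2, 1.5, 3.0, 0.8, 0.0)
nkde_b <- c(1.0, 2.0, 2.0, 4.0, 1.0)

# Option 1: raw difference (positive where set A is denser).
diff_raw <- nkde_a - nkde_b

# Option 2: rescale each NKDE to sum to 1 over the sampling points, which removes
# the effect of the total number of events before taking the difference.
scale_to_share <- function(x) x / sum(x)
diff_scaled <- scale_to_share(nkde_a) - scale_to_share(nkde_b)
```

Positive values of `diff_scaled` mark sampling points where set A carries a larger share of its total density than set B; these values can be attached back to the sampling points and mapped.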
Thank you very much. Actually, my comparison is between two different networks. But yes, I am interested in the differences in the spatial pattern of the same type of event: I have two different event datasets on two different networks. Do you think my question is flawed?
I understand your problem a little better now.
You are interested in the differences between events of the same type but on two different networks.
The NKDE could be used in combination with other spatial methods, like the global Moran's I, to compare the spatial autocorrelation / clustering on both networks.
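A minimal base-R sketch of the global Moran's I, which could be applied, for example, to the NKDE values at the sampling points of each network. The `morans_i` helper and the chain-shaped weight matrix (each sampling point neighbouring the next one along a single street) are illustrative assumptions; in practice the weights would be derived from network distances between sampling points, and a dedicated package such as spdep also provides a significance test.

```r
# Hypothetical helper (not a spNetwork function): global Moran's I
#   I = (n / S0) * sum_ij w_ij (x_i - mean(x)) (x_j - mean(x)) / sum_i (x_i - mean(x))^2
# with S0 = sum_ij w_ij. w is a spatial weight matrix with zeros on the diagonal.
morans_i <- function(x, w) {
  z <- x - mean(x)
  s0 <- sum(w)
  (length(x) / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy weights: 4 sampling points along one street, each neighbouring the next.
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)

i_dispersed <- morans_i(c(1, 0, 1, 0), w)  # alternating values
i_clustered <- morans_i(c(1, 1, 0, 0), w)  # two contiguous blocks
```

Alternating values give I = -1 (dispersion) and two contiguous blocks give I = 1/3 (clustering); the statistic can be computed separately on each network and the two values compared.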
If you work directly with your events instead of the NKDE, you could also use metrics like the distance to the k nearest neighbours. This will give you a good idea of the spatial dispersion of your events on the two networks.
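A minimal sketch of the k-th nearest neighbour distance, assuming a matrix `d` of network (shortest-path) distances between the events has already been computed. The four events on a straight street and the `kth_nn_distance` helper are hypothetical.

```r
# Hypothetical helper: for each event, the network distance to its k-th nearest other event.
# d: symmetric matrix of network distances between events.
kth_nn_distance <- function(d, k) {
  sapply(seq_len(nrow(d)), function(i) sort(d[i, -i])[k])  # drop the event itself
}

# Toy example: 4 events along one straight street (positions in m),
# so the network distance is the absolute difference of positions.
p <- c(0, 100, 250, 600)
d <- abs(outer(p, p, "-"))

nn1 <- kth_nn_distance(d, 1)  # distance to the closest other event
nn2 <- kth_nn_distance(d, 2)  # distance to the second closest other event
```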
In a similar fashion, you could use the G and K statistics (https://jeremygelb.github.io/spNetwork/articles/KNetworkFunctions.html)
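As a rough illustration of what the K function measures, here is a sketch of one common empirical form of the network K function: the total network length times the number of ordered event pairs within network distance r, divided by n(n - 1). The `network_k` helper, the toy events and the 1000 m total network length are assumptions; the package's implementation described at the link above should be preferred for real analyses, since interpreting K usually also requires comparing it with simulated random patterns on the same network.

```r
# Hypothetical helper: empirical network K function at the radii r.
# d: symmetric matrix of network distances between events;
# total_length: total length of the network (same unit as d).
network_k <- function(d, r, total_length) {
  n <- nrow(d)
  off_diag <- d[row(d) != col(d)]  # distances between distinct events (ordered pairs)
  sapply(r, function(radius) total_length * sum(off_diag <= radius) / (n * (n - 1)))
}

# Toy example: 4 events along one street, on a network assumed to be 1000 m long.
p <- c(0, 100, 250, 600)
d <- abs(outer(p, p, "-"))
k_values <- network_k(d, r = c(100, 150), total_length = 1000)
```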
Definitely, this looks interesting. To give more context, I'm trying to characterize the spatial distribution/pattern/dispersion of each POI type, and I want to represent it as a single value.
For example:
Area 1:
Restaurant type: 0.19
Transport type: 0.34
Business type: 0.53
Area 2:
Restaurant type: 0.23
Transport type: 0.72
Business type: 0.62
...
In summary, I'm trying to capture the spatial dispersion of events on each network as a single value, and I will then use these values to calculate the difference/dissimilarity between Area 1 and Area 2...
Thank you so much for your work and replies. Really helpful.
Well, if you want to use only one metric to characterize the dispersion of your POI on a network, I would recommend using the network distance to a specific neighbour (the 1st, 2nd, 3rd, etc.) and reporting the median of this distance over all the POI on the network.
For example, a median of 500 meters for the first neighbour would mean that 50% of your POI have their closest other POI within 500 m. This is a nicely interpretable measure of dispersion. You could also present the 5% and 90% percentiles of the distribution; this would help to compare the variation of this dispersion measure among several networks.
The only remaining question is which neighbour to use (1st, 2nd, 3rd, ...?). It must be the same for all the networks if you want to compare the results. You could try several values and see which one gives the most pertinent results.
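A minimal sketch of this recommendation, assuming a matrix `d` of network distances between the POI of one type on one network. The hypothetical `dispersion_summary` helper returns, for each neighbour rank k, the 5%, 50% (median) and 90% percentiles of the k-th nearest neighbour distance, so that several values of k can be tried before fixing one for all networks. The toy POI positions are illustrative.

```r
# Hypothetical helper (not part of spNetwork): percentiles of the k-th nearest
# neighbour network distance, one row per neighbour rank k.
# d: symmetric matrix of network distances between the POI of one type.
dispersion_summary <- function(d, ks = 1:3, probs = c(0.05, 0.5, 0.9)) {
  out <- t(sapply(ks, function(k) {
    # sort(row)[k + 1] skips the zero distance of each POI to itself
    knn <- apply(d, 1, function(row) sort(row)[k + 1])
    quantile(knn, probs = probs, names = FALSE)
  }))
  dimnames(out) <- list(paste0("k=", ks), paste0("p", probs * 100))
  out
}

# Toy example: 4 POI along one straight street (positions in m).
p <- c(0, 100, 250, 600)
d <- abs(outer(p, p, "-"))
summary_area <- dispersion_summary(d, ks = 1:2)
```

Running the same function with the same k for Area 1 and Area 2 gives one median per POI type and area, and the difference between these medians can serve as the single dissimilarity value described above.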
Closed due to a long period of inactivity.