This project involves the analysis and visualization of spatial data using edge bundling techniques. The primary goal is to uncover patterns and relationships within the data and present them in an interpretable manner. The dataset is loaded from an Excel file containing information related to edge bundling.
- The provided Excel file, "physical-spatial_Edge Bundling.xlsx", is loaded into memory.
- All sheets from the Excel file are read, and the head of each dataframe is printed.
- Whitespace is removed from column names and cells where possible.
- The first sheet, assumed to contain relevant data for edge bundling, is selected.
- A hierarchical clustering dendrogram is created to identify clusters within the data.
- Agglomerative clustering is applied to the matrix, and cluster labels are added to the dataframe.
- The data is prepared for edge bundling visualization.
- A graph is created from the edges dataframe using NetworkX.
- Nodes are colored based on cluster membership.
- An arc diagram is generated to visualize the connections between nodes.
- Intra-cluster and inter-cluster edge weights are analyzed and visualized using bar plots.
- The distribution of edge weights within each cluster is visualized using a boxplot.
- Statistics for each cluster, such as mean and quartiles, are calculated and displayed.
- Features are standardized and reduced to two principal components using PCA.
- KMeans clustering is applied to the PCA-reduced data.
- The clustered data is visualized using a scatter plot.
- Explained variance ratio and cluster centers are printed.
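The last four steps (standardize, project onto two principal components, cluster with KMeans, report variance and centers) can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual code: the `features` array, the choice of three clusters, and the `random_state` and `n_init` values are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix
# (assumption: rows are nodes, columns are numeric features)
rng = np.random.default_rng(0)
features = rng.normal(size=(30, 5))

# Standardize each feature to zero mean and unit variance
scaled = StandardScaler().fit_transform(features)

# Reduce the standardized features to two principal components
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

# Cluster the reduced data; n_clusters=3 is an assumption for this sketch
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(components)

print('Explained variance ratio:', pca.explained_variance_ratio_)
print('Cluster centers:', kmeans.cluster_centers_)
```

A scatter plot of `components[:, 0]` against `components[:, 1]`, colored by `labels`, would then give the clustered view described in the list.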
This comprehensive analysis and visualization project provides insights into spatial data relationships, emphasizing edge bundling techniques. The combination of hierarchical clustering, edge bundling visualization, and advanced analyses contributes to a thorough exploration of the dataset. The results can aid in understanding patterns and structures within complex spatial data.
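The loading and cleaning steps at the start of the workflow could look roughly like the sketch below. The helper names `clean_frame` and `load_first_sheet` are hypothetical, and the demonstration uses a small made-up dataframe rather than the real workbook. Reading `.xlsx` files with pandas also assumes an Excel engine such as `openpyxl` is installed.

```python
import pandas as pd


def clean_frame(df):
    """Strip surrounding whitespace from string column names and string cells."""
    out = df.copy()
    out.columns = [c.strip() if isinstance(c, str) else c for c in out.columns]

    def strip_cell(value):
        # Non-string cells (numbers, NaN, dates) are left untouched
        return value.strip() if isinstance(value, str) else value

    return out.apply(lambda col: col.map(strip_cell))


def load_first_sheet(path):
    """Read every sheet, print each head, and return the cleaned first sheet."""
    sheets = pd.read_excel(path, sheet_name=None)  # dict: sheet name -> DataFrame
    for name, frame in sheets.items():
        print(name)
        print(frame.head())
    first = next(iter(sheets.values()))
    return clean_frame(first)


# Small in-memory demonstration of the cleaning step (made-up values)
raw = pd.DataFrame({' name ': [' A', 'B '], 'value ': [1, 2]})
cleaned = clean_frame(raw)
print(cleaned)
```

With the real file, a call such as `load_first_sheet("physical-spatial_Edge Bundling.xlsx")` would return the cleaned first sheet; how that sheet is turned into the numeric `matrix` used later is not shown here.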
In this section of the code, we perform hierarchical clustering to identify clusters within the dataset. Additionally, we prepare the data for edge bundling visualization using the NetworkX library.
from sklearn.cluster import AgglomerativeClustering
# Determine the number of clusters
n_clusters = 3
# Apply hierarchical clustering to the matrix
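# Ward linkage only supports Euclidean distances, so no metric argument is needed
# (the old `affinity` parameter has been removed from recent scikit-learn releases)
hc = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward')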
cluster_labels = hc.fit_predict(matrix)
# Add cluster labels to the dataframe
matrix_df['cluster'] = cluster_labels
# Display the head of the dataframe with cluster labels
print(matrix_df.head())
- Number of Clusters:
  - The variable `n_clusters` is set to 3, indicating the desired number of clusters.
- Hierarchical Clustering:
  - `AgglomerativeClustering` is employed with Euclidean distance and Ward linkage.
  - Cluster labels are assigned to each row in the matrix.
- Updating Dataframe:
  - Cluster labels are added to the original dataframe (`matrix_df`).
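The dendrogram mentioned in the overview can be built from the same Ward criterion with SciPy. The sketch below is an illustration under assumptions, not the project's code: it uses a made-up `toy` array in place of `matrix`, and the leaf labels `A` to `F` are invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Toy stand-in for `matrix`: six points forming three obvious pairs (made-up values)
toy = np.array([[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [20, 1]], dtype=float)

# Ward linkage on Euclidean distances
Z = linkage(toy, method='ward')

# Cut the tree into at most three flat clusters (SciPy labels start at 1, not 0)
flat_labels = fcluster(Z, t=3, criterion='maxclust')

# Compute the dendrogram layout without drawing it;
# drop no_plot=True to draw the tree with matplotlib instead
tree = dendrogram(Z, labels=list('ABCDEF'), no_plot=True)

print(flat_labels)
print(tree['ivl'])  # leaf labels in left-to-right plotting order
```

Inspecting the merge heights in `Z[:, 2]` is one way to judge whether a fixed choice such as three clusters is reasonable for the data.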
import pandas as pd

# Now we will prepare the data for the edge bundling visualization
# We will create a new dataframe with source, target, and weight for the edges
edges = []
# Visit only the upper triangle so each undirected pair is recorded once
for i in range(len(matrix)):
    for j in range(i + 1, len(matrix)):
        if matrix[i][j] > 0:  # Assuming that a weight of 0 means no edge
            # 'cluster' holds the cluster label of the source node
            edges.append({'source': matrix_df.index[i], 'target': matrix_df.index[j], 'weight': matrix[i][j], 'cluster': cluster_labels[i]})
edges_df = pd.DataFrame(edges)
# Display the head of the edges dataframe
print(edges_df.head())
- Edge Data Preparation:
  - A new dataframe (`edges_df`) is created to store edge information (source, target, weight, and cluster).
  - A nested loop iterates through the matrix, identifying edges based on non-zero weights.
- Display Edge Data:
  - The head of the edges dataframe is printed for inspection.
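The edge table records only the source node's cluster, so comparing intra-cluster and inter-cluster weights (as listed in the overview) also needs the target's cluster. The following sketch shows one way to do that. It uses a small made-up edge table, and the `node_cluster` mapping and the `source_cluster`, `target_cluster`, and `kind` columns are names introduced for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the node-to-cluster labels and the edge table
node_cluster = pd.Series({'A': 0, 'B': 0, 'C': 1, 'D': 1})
edges_df = pd.DataFrame({
    'source': ['A', 'A', 'B', 'C'],
    'target': ['B', 'C', 'D', 'D'],
    'weight': [3.0, 1.0, 2.0, 5.0],
})

# Look up the cluster of both endpoints
edges_df['source_cluster'] = edges_df['source'].map(node_cluster)
edges_df['target_cluster'] = edges_df['target'].map(node_cluster)

# An edge is intra-cluster when both endpoints share a label
same = edges_df['source_cluster'] == edges_df['target_cluster']
edges_df['kind'] = np.where(same, 'intra', 'inter')

# Total weight per kind (suitable for a bar plot)
totals = edges_df.groupby('kind')['weight'].sum()

# Per-cluster summary of intra-cluster weights: count, mean, std, min, quartiles, max
intra_stats = edges_df[same].groupby('source_cluster')['weight'].describe()

print(totals)
print(intra_stats)
```

The `totals` series maps directly onto a bar plot, and the intra-cluster rows of `edges_df` grouped by `source_cluster` are the input a boxplot of per-cluster weights would need.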
import matplotlib.pyplot as plt
import networkx as nx
from itertools import cycle
# Create a graph from the edges dataframe
G = nx.from_pandas_edgelist(edges_df, 'source', 'target', ['weight', 'cluster'])
# Define colors for clusters, cycling through the palette if clusters outnumber colors
cluster_colors = dict(zip(sorted(set(cluster_labels)), cycle(['red', 'green', 'blue'])))
# Assign colors to nodes based on their cluster (node names are matrix_df index labels)
node_colors = [cluster_colors[matrix_df.loc[node, 'cluster']] for node in G.nodes()]
# Draw the graph
plt.figure(figsize=(12, 12))
pos = nx.spring_layout(G, seed=42)  # Use spring layout
weights = nx.get_edge_attributes(G, 'weight')
nx.draw(G, pos, with_labels=True, node_color=node_colors, width=list(weights.values()))
plt.title('Edge Bundling Visualization')
plt.show()
- Graph Creation:
  - A graph (`G`) is created using NetworkX from the edges dataframe.
- Cluster Colors:
  - Colors for clusters are defined cyclically.
- Node Colors:
  - Nodes are assigned colors based on their cluster membership.
- Graph Visualization:
  - The graph is visualized using a spring layout, and edge weights determine the line widths.
- Result Display:
  - The resulting edge bundling visualization is displayed.
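The overview also lists an arc diagram, which the spring-layout drawing does not produce. A minimal sketch of one is given below, assuming nodes are placed at integer positions on a horizontal line and each edge is drawn as a semicircle. The helper names `arc_geometry` and `draw_arc_diagram` are hypothetical, and the nodes, edges, and colors are made-up values.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Arc


def arc_geometry(order, edges):
    """Return (center_x, span, weight) per edge, with nodes at x = 0, 1, 2, ..."""
    x = {node: i for i, node in enumerate(order)}
    return [((x[s] + x[t]) / 2, abs(x[s] - x[t]), w) for s, t, w in edges]


def draw_arc_diagram(ax, order, edges, colors):
    """Draw nodes on a horizontal line and each edge as a semicircle above it."""
    for cx, span, w in arc_geometry(order, edges):
        # A circle of diameter `span`, restricted to its upper half
        ax.add_patch(Arc((cx, 0), width=span, height=span,
                         theta1=0, theta2=180, linewidth=w))
    ax.scatter(range(len(order)), [0] * len(order), c=colors, zorder=3)
    for i, node in enumerate(order):
        ax.annotate(node, (i, -0.15), ha='center', va='top')
    ax.set_xlim(-0.5, len(order) - 0.5)
    ax.set_ylim(-0.5, len(order) / 2)
    ax.set_aspect('equal')
    ax.axis('off')


# Made-up example: four nodes, ordered so that cluster members sit together
order = ['A', 'B', 'C', 'D']
edges = [('A', 'B', 3.0), ('A', 'C', 1.0), ('C', 'D', 2.0)]
colors = ['red', 'red', 'green', 'green']

fig, ax = plt.subplots(figsize=(6, 3))
draw_arc_diagram(ax, order, edges, colors)
# Call plt.show() or fig.savefig(...) to view the result
```

Sorting the nodes by cluster label before drawing keeps intra-cluster arcs short, so long arcs stand out as inter-cluster connections.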
This section of the code integrates hierarchical clustering and edge bundling to provide a visual representation of clusters and their connections within the spatial dataset. The resulting graph enhances the understanding of spatial relationships and patterns.
Last updated on: 2024-05-12