Welcome to the iMessage Embedder project! This tool extracts your iMessages from a Mac and converts them into 'embeddings': numerical vector representations of text, in which messages with similar meaning end up close together. With these embeddings you can query your message history and group messages by pattern and theme.
Note: This only works on macOS.
Here are the steps you need to follow to get this tool up and running:
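Important: As of macOS Mojave, you will need to grant your Terminal "Full Disk Access". This allows Python to read your iMessage database. Please follow the steps below to grant this access (on macOS Ventura and later, the same setting lives under System Settings > Privacy & Security > Full Disk Access):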
- Open your System Preferences.
- Navigate to Security & Privacy.
- Select the Privacy tab.
- Scroll down in the list and click on Full Disk Access.
- Click the lock in the bottom left to allow changes. You'll be prompted to enter your password.
- Click the '+' button to add an application. You should locate your Terminal application, usually found in /Applications/Utilities/.
- Close the System Preferences.
- Quit and reopen your Terminal.
- After granting Full Disk Access to your Terminal, rerun the script.
- Note: If you're using the VSCode embedded terminal, you'll need to grant access to the VSCode app, not the Terminal.
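To confirm that the access grant took effect before running the longer scripts, a quick read-only check like the one below can help. It is a minimal sketch, not part of this project: the function name `can_read_messages_db` is hypothetical, and it assumes the Messages database sits at its usual macOS location, `~/Library/Messages/chat.db`.

```python
import sqlite3
from pathlib import Path

# Usual location of the Messages database on macOS (assumption: default setup).
DEFAULT_DB = Path.home() / "Library" / "Messages" / "chat.db"


def can_read_messages_db(db_path: Path = DEFAULT_DB) -> bool:
    """Return True if the database can be opened and queried read-only."""
    # Build a file: URI so SQLite opens the file read-only and never creates it.
    uri = Path(db_path).absolute().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            # Touch the schema table to force SQLite to actually read the file.
            conn.execute("SELECT name FROM sqlite_master LIMIT 1").fetchall()
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        # Raised when the file is missing or the OS denies access.
        return False


if __name__ == "__main__":
    if can_read_messages_db():
        print("The Messages database is readable.")
    else:
        print("Cannot read the Messages database. Check Full Disk Access.")
```

If this reports a failure on a Mac that has iMessage history, the application running Python most likely still lacks Full Disk Access.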
Run the following command in your Terminal to install the required Python packages:
pip install -r requirements.txt
Execute the following command:
python src/embed_messages.py
This might take a few minutes, so hang tight and let the script do its work.
Optional: If you'd like to stitch together and embed full conversation threads, use this command:
python src/embed_conversations.py
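To give a feel for what "stitching" a thread can mean, here is a minimal sketch. It is an illustration under stated assumptions and not necessarily how `src/embed_conversations.py` works: the `Message` type, the `stitch_conversations` function, and the one-hour gap rule are all hypothetical. Messages are grouped by chat, ordered by time, and split into a new thread whenever the gap between two consecutive messages is too long.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Message(NamedTuple):
    chat_id: str      # hypothetical identifier of the chat the message belongs to
    timestamp: float  # seconds since some fixed epoch
    sender: str
    text: str


def _join(messages: List[Message]) -> str:
    """Render one thread as a single block of text, one line per message."""
    return "\n".join(f"{m.sender}: {m.text}" for m in messages)


def stitch_conversations(messages: List[Message],
                         max_gap_seconds: float = 3600.0) -> List[str]:
    """Group messages into threads per chat, splitting on long silences."""
    by_chat = defaultdict(list)
    for m in messages:
        by_chat[m.chat_id].append(m)

    threads = []
    for chat_messages in by_chat.values():
        ordered = sorted(chat_messages, key=lambda m: m.timestamp)
        current = [ordered[0]]
        for prev, nxt in zip(ordered, ordered[1:]):
            # A long pause is treated as the start of a new conversation.
            if nxt.timestamp - prev.timestamp > max_gap_seconds:
                threads.append(_join(current))
                current = []
            current.append(nxt)
        threads.append(_join(current))
    return threads
```

Each returned string could then be embedded as one document, so that a reply like "sounds good" carries the context of the messages around it.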
Now that you have your iMessage embeddings, here are a few fun and interactive things you can do:
Query your embedded messages using the following command:
python src/query.py
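Conceptually, querying embeddings means turning the query text into a vector and ranking the stored message vectors by how similar they are to it. The sketch below shows that ranking step with cosine similarity on toy vectors. It is only an illustration: the function names are hypothetical, the vectors are made up, and in practice a vector store such as Chroma would typically perform this search for you.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          items: List[Tuple[str, Sequence[float]]],
          k: int = 3) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
    stored = [
        ("want to grab lunch?", [1.0, 0.0]),
        ("dinner at 7?", [0.9, 0.1]),
        ("did you file your taxes?", [0.0, 1.0]),
    ]
    for score, text in top_k([1.0, 0.0], stored, k=2):
        print(f"{score:.3f}  {text}")
```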
Group your messages based on patterns and themes:
python src/cluster.py
This clustering process is designed to discover patterns and structure within your iMessage history. Here's a brief overview:
- Clustering: Messages and their embeddings are loaded from Chroma. The embeddings are then passed through dimensionality reduction and clustered.
- Cluster Analysis: Each cluster is analyzed individually. Keyword extraction (using TF-IDF vectorization) pinpoints the most significant words in the cluster, and topic modeling (using LDA) identifies its key themes.
- Cluster Representatives: A representative message or set of messages is identified for each cluster, typically the one(s) closest to the geometric center of the cluster. This representative gives an overview of what the messages in the cluster look like.
- Visualization: We've made an effort to visualize this data so you can grasp the structure at a glance. Different visualizations are offered depending on how much data has been embedded. Four labeling options are available: no labels, representative labels, the top 10 representatives per cluster (recommended), or every data point labeled (NOT recommended, but kinda fun).
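Two of the steps above, picking a cluster representative and extracting keywords, can be sketched in plain Python. This is a simplified illustration and not the code in `src/cluster.py`: the function names are hypothetical, each cluster is treated as a single TF-IDF document (an assumption), and the dimensionality reduction, clustering, and LDA steps are left out.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def representative(messages: Sequence[str],
                   vectors: Sequence[Sequence[float]]) -> str:
    """Return the message whose embedding is nearest the cluster centroid."""
    center = centroid(vectors)
    best = min(range(len(vectors)), key=lambda i: math.dist(vectors[i], center))
    return messages[best]


def top_keywords(clusters: Dict[int, List[str]],
                 cluster_id: int, k: int = 3) -> List[str]:
    """Rank words in one cluster by TF-IDF, one 'document' per cluster."""
    # Word counts per cluster, using naive lowercase whitespace tokenization.
    docs = {
        cid: Counter(word for msg in msgs for word in msg.lower().split())
        for cid, msgs in clusters.items()
    }
    n_docs = len(docs)
    counts = docs[cluster_id]
    total = sum(counts.values())

    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for c in docs.values() if word in c)
        # Smoothed inverse document frequency, so shared words score lower.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[word] = (count / total) * idf
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Words that appear in many clusters get a lower score, which is why TF-IDF tends to surface the terms that distinguish one cluster from the rest.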
Powered by Chroma 🚀