The MacOS Agent is a straightforward, lightweight solution based on a Large Language Model (LLM) that leverages Dify, an AI application development platform. This agent enables users, even children, to control MacOS with ease using natural language commands, making it as simple as conversing with a tech expert.
While it may sound similar to Siri, the MacOS Agent offers enhanced capabilities, particularly through its support for multiple rounds of conversation, allowing users to maintain context and continuity in their tasks. For instance, you can ask the Agent to provide some text and then request it to convert that text into an Excel or Word file.
Here are some use cases I tried:
- what is the memory usage
- unused memory
- disk usage
- my disk capacity
- how many spaces left on my computer
- time since boot
- last boot time
- what is the CPU temperature
- list ports opened
- lan ip
- list devices in local LAN
- take a screenshot
- open a new text file
- create text file on desktop and open it
- create a markdown file on desktop with a GTD style TODO list and open it
- move all files on desktop to a temp dir
- how many files older than 10 days in ~/Desktop
- copy them to a new dir named "10-days-old" in that dir
- list files older than 10 days in ~/Desktop
- create an Excel file contains the file name and create time
- check ~/Desktop/macos-agent-playground.html and explain what it does
- give me a shell script that can watch an app's memory usage by app name when it reaches a threshold then restart it
- give me an Automator app that can watch an app's memory usage by app name when it reaches a threshold then restart it
- remind me to clock in after 5 seconds
- wait 5 seconds then send me a message with "Timeup"
- send me a message "Call someone" at 18:54
- display an alert "Call someone" at 19:01
- alert me about "Do something" at 18:58
- run `top` command # for testing timeout control
- find all processes which name is "top"
- find all processes which name is "top" and kill them
- restart app XXX
- open system settings
- turn dark mode on/off
- what's my internet IP
- show me the price of BTC/Gold
- ask Siri for what is the weather like tomorrow
- ask Siri for what ...
- 9.11 and 9.9——which is bigger? run code to compute the result
- If a banana weighs 0.5 pounds, I have 7 pounds of bananas and 9 oranges, how many fruits do I have in total? run code to compute the result
Notes: The outcome of this scenario depends on the performance of the LLM
- run a http server on ~/Desktop in the background
  Tips: to quit: `quit http server on port 8000`
- create a html file named "macos-agent-playground.html" that having 2 iframe pages. which iframe "desktop-page" is 70% width and iframe "agent-page" is 30%; both using frameborder=1, style="width: 100%; height: 100%; min-height: 700px". iframe "desktop-page" url is "http://localhost:8000/", with a "refresh" button at top that can reload the url. iframe "agent-page" url is "${chat app Embed on website using iframe url}".
- Explain what is Tic-Tac-Toe game, I want you to create a Tic-Tac-Toe game that human can play VS AI using HTML. Create a dir name "Tic-Tac-Toe-game" and put code files in it.
- I need to create a flowchart like `Start -> Process A -> Condition -> Process B -> End`, flow direction is from up to down. Help me create this diagram in a format that can be opened in Draw.io and open it with draw.io.app
- macos-agent-brief-demo: macos-agent-brief-demo-20240716-compressed.mp4
- macos-agent-file-management-demo: macos-agent-file-management-demo-20240716-compressed.mp4
- macos-agent-code-playground-demo: macos-agent-code-playground-demo-20240716-compressed.mp4
- macos-agent-create-diagram-demo: macos-agent-create-diagram-demo-20240719-compressed.mp4
The MacOS Agent operates through a series of steps:
- Run the `macOS Agent Server`: This server returns a system prompt for the LLM, including the Agent's role profile, environment information, and knowledge base.
- Set up the `LLM:get_script` node: This node uses the system prompt to have the LLM act as a "macOS Agent," tasked with achieving user goals using AppleScript.
- Send User Input: The user's goal is sent to the `LLM:get_script` node to receive suggestions, including executable AppleScript.
- Execute AppleScript: The LLM output is sent to the `macOS Agent Server`, which extracts and runs the AppleScript, returning the execution result.
- Formulate Response: The execution result is combined with the user's goal and LLM output into a `reply_prompt` for a comprehensive response.
- Respond to User: An `LLM:reply` node uses the `reply_prompt` to respond to the user.
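The extract, execute, and reply steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `macos_agent_server.py`: the function names, the assumption that the LLM wraps its script in an `applescript`-tagged fenced block, and the wording of the reply prompt are all hypothetical.

```python
import re
import subprocess
from typing import List

# Assumption: the LLM returns AppleScript inside a fenced block tagged "applescript".
FENCE = "`" * 3
SCRIPT_PATTERN = re.compile(FENCE + r"applescript\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_applescript(llm_output: str) -> List[str]:
    """Return the body of every applescript-tagged fenced block in the LLM output."""
    return [body.strip() for body in SCRIPT_PATTERN.findall(llm_output)]


def run_applescript(script: str, timeout: int = 60) -> str:
    """Run one script with macOS's osascript; return its output or an error text."""
    try:
        done = subprocess.run(
            ["osascript", "-e", script],
            capture_output=True,
            text=True,
            timeout=timeout,  # plays the role of script_timeout
        )
    except subprocess.TimeoutExpired:
        return f"Script timed out after {timeout} seconds"
    return done.stdout.strip() if done.returncode == 0 else done.stderr.strip()


def build_reply_prompt(goal: str, llm_output: str, result: str) -> str:
    """Combine goal, LLM suggestion, and execution result (hypothetical wording)."""
    return (
        f"User goal:\n{goal}\n\n"
        f"Assistant suggestion:\n{llm_output}\n\n"
        f"Execution result:\n{result}\n"
    )
```

`run_applescript` only works on a Mac, since it shells out to `osascript`; the other two helpers are plain string handling and can be tried anywhere.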
The agent is compatible with locally hosted instances of the Dify platform (cloud-hosted instances have not been tested).
- Clone the Repository
- Start the Agent Server
- Import Chatbot Configuration
- Configure the Chatbot
- Publish the Chatbot
git clone https://github.com/rainchen/MacOS-Agent.git
File list:
- README.md: This documentation file
- macos_agent_server.py: Script to run the `macOS Agent Server`
- MacOS Agent.yml: Configuration file for importing into Dify as a Chatbot app
- knowledge.md: File for extending the Agent's knowledge
- test.sh: Script for running test cases to verify agent server functionality
No additional installations are required as the code is designed to work with MacOS's built-in Python version and standard libraries.
python macos_agent_server.py --port 8088 --apikey "a-secret-key" --debug
Arguments:
- `--port`: Port number for the server
- `--apikey`: API key for authorization
- `--debug`: Optional; enables detailed logging

Note: Ensure the server is only run on a Mac you have control rights to, and never expose the `--apikey` publicly.
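For readers curious how such flags are typically handled, here is a minimal `argparse` sketch. It is not taken from `macos_agent_server.py`: the default port, the help strings, and the `key_matches` helper are assumptions made for illustration.

```python
import argparse
import hmac


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three documented flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="macOS Agent Server (sketch)")
    # Assumption: 8088 as default simply mirrors the example command.
    parser.add_argument("--port", type=int, default=8088,
                        help="Port number for the server")
    parser.add_argument("--apikey", required=True,
                        help="API key for authorization")
    parser.add_argument("--debug", action="store_true",
                        help="Enable detailed logging")
    return parser


def key_matches(provided: str, expected: str) -> bool:
    """Compare keys in constant time (hypothetical helper, not the server's API)."""
    return hmac.compare_digest(provided.encode(), expected.encode())
```

Parsing the example command line would then look like `build_parser().parse_args(["--port", "8088", "--apikey", "a-secret-key", "--debug"])`.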
Navigate to the Dify Studio homepage, click "Import DSL file" and select the "MacOS Agent.yml" file from the cloned repository.
Configure the `Code:config` node with details such as the agent API endpoint, API key, and script timeout. Also, set the LLM models for the `LLM:get_script` and `LLM:reply` nodes.
Here is an example config for the `Code:config` node:
"agent_api_endpoint": "http://host.docker.internal:8088",
"agent_api_key": "a-secret-key",
"script_timeout": 60
Options explained:
- `agent_api_endpoint`: when Dify is deployed using docker-compose, the port is the same as `--port`, e.g. `:8080`, `http://host.docker.internal:8088`
- `agent_api_key`: same as the `--apikey` used in "Start the Agent Server", e.g. `a-secret-key`
- `script_timeout`: controls the maximum execution time of a script; 60 seconds is recommended
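As a rough sketch, the config could be written as a Python function returning a dict, assuming the `Code:config` node follows the usual shape of a Dify Python code node (a `main` function whose return value becomes the node's output). The `check_config` helper is hypothetical and only illustrates the rule that the endpoint port must match the server's `--port`.

```python
from urllib.parse import urlparse


def main() -> dict:
    # Values mirror the example config; adjust them to your own setup.
    return {
        "agent_api_endpoint": "http://host.docker.internal:8088",
        "agent_api_key": "a-secret-key",
        "script_timeout": 60,
    }


def check_config(config: dict, server_port: int) -> list:
    """Return a list of human-readable problems (hypothetical sanity check)."""
    problems = []
    if urlparse(config["agent_api_endpoint"]).port != server_port:
        problems.append("endpoint port differs from the server's --port")
    if not config["agent_api_key"]:
        problems.append("agent_api_key is empty")
    if config["script_timeout"] <= 0:
        problems.append("script_timeout must be positive")
    return problems
```

Calling `check_config(main(), 8088)` returns an empty list for the values shown, while a mismatched port is reported as a problem.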
Click "Publish" and then "Update" to make the chatbot live.
After publishing, click [Run App] to open the Chatbot web view, input your goals, and refer to the "Use Cases" section for guidance.
It is recommended to use "Embed on website" and install the Dify Chatbot Chrome Extension so that you can activate the Agent on any page.
Edit the `knowledge.md` file to add more instructions in the same Markdown format and restart the server.
Run the `test.sh` script to verify the server's functionality after making any code changes.
sh test.sh --api http://localhost:8088 --apikey a-secret-key
Certain actions are restricted, such as deleting or removing files, shutting down the computer, or stopping the macOS Agent Server process.
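The text above does not say how these restrictions are enforced. One possible approach, shown here purely as an illustration, is a deny-list check applied to a script before it runs. The patterns and the function name are hypothetical and far from exhaustive.

```python
import re

# Hypothetical deny-list; illustrative only and easy to bypass.
BLOCKED_PATTERNS = [
    r"\brm\s+-",            # shell deletion, e.g. do shell script "rm -rf ..."
    r"\bdelete\b",          # AppleScript / Finder delete command
    r"\bshut\s*down\b",     # shutting down the computer
    r"macos_agent_server",  # touching the agent server process itself
]


def find_blocked_action(script: str):
    """Return the first deny-list pattern the script matches, or None."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, script, flags=re.IGNORECASE):
            return pattern
    return None
```

A pattern filter like this is a coarse safety net at best, so it would complement rather than replace instructions given to the LLM in the system prompt.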
- Chatbot management and UI: Dify, a powerful and convenient AI application development platform.
- Code Generation: 90% of the project's code was generated by AI (deepseek-coder LLM).
- Document Polishing: The documentation was refined with the help of AI (deepseek-chat LLM).
This project is licensed under the MIT License.