Using data to cool data centers
Data centers consume 2–3% of worlds power¹. 30–50% of this power goes into keeping it cool². A system of different mechanisms works together to bring heat out from a datacenter and discard it into the atmosphere. These mechanims are controlled by their own local control systems. In this post, we detail how to control a system of systems more efficiently.
Problem
Why are they inefficient ?
- Local controls
- Tacit knowledge
- Complex interaction
- Difficult to model
Approach
Can we design a better control system ?
- Data based modelling
- Fixed point optimisation
- Reinforcement learning on data model
- Reinforcement learning directly on system
- Continuous control
Let us try this on a simple simulator ?
- Environment
- Red balls are hot, blue balls are cold
- Physics engine simulates motion of balls
- Reward is given when all servers have cooled down
- Time penalty for taking too long
- Pymunk engine
- Trpo agent
- Results
Solution
Modelling a real data center
- Sensory data from a real DC. Glance into data, simple EDA.
- Part based models
- Time delay in action
- LSTMs to simulate individual parts
- Each part connected to another
- Part connection graph
- State Machine composed of these parts is our simulator
- Controls, latent variables,
- Accuracy of simulation
- Model sanity check
- By product, predictive maintenance
Simple optimisation on data model
- Better setpoints according to weather
- Reacting with a chiller instead of PAHUs
RL policies
- Action space of controls
- Agent
- Rewards
- Results
Taking to production
System design
- Client side push
- Time series database
- Log cuts for model training
- Model updates using dependency tree
- What is a policy and how to deploy one ?
- Monitoring
- Fallback and safety mechanisms
References
dcool
data center cooling with reinforcement learning