To get started: Clone this repository using

    git clone --recursive http://github.com/alecjacobson/computer-graphics-kinematics.git
In this assignment we'll consider animating shapes rigged to an internal skeleton. The skeleton is a (graphical) user interface (UI) metaphor. A skeleton is a tree of rigid bones, not unlike the anatomical bones in a human or animal.
Each "bone" in the skeleton is really a UI widget for visualizing and controlling a 3D rigid transformation. A common visualization of 3D bone in computer graphics is a long, pointed pyramid shape. This reveals the twisting rotation as well as the tree hierarchy: the bone points toward its children.
Unlike anatomy where the brain triggers muscles to flex and pull the passive bones around, the bones of a skeleton rig will define the pose of a shape.
For each bone, we will consider three "states".
The "Canonical Bone" of length
The bone is endowed with an orientation or frame. This helps define a canonical twisting direction. We will define twisting as rotating about the $x$-axis in the canonical frame.
For example, in this assignment, bending is accomplished by rotating about the $z$-axis in the canonical frame.
Composing a twist, bend and another twist spans all possible 3D rotations.
We call the three angles composed into a rotation this way, Euler angles (not to be confused with the homophonous Oiler angles).
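As a concrete (non-assignment) illustration, here's a minimal sketch of composing twist-bend-twist Euler angles into a single rotation with `Eigen::AngleAxis` (listed in the hints at the end of this document); the function name and the degrees convention are assumptions:

```cpp
#include <Eigen/Geometry>
#include <cmath>

// Compose twist-bend-twist (x-z-x) Euler angles, given in degrees, into one
// 3x3 rotation. Reading right to left: twist by θ1 about x, bend by θ2
// about z, then twist again by θ3 about x.
Eigen::Matrix3d twist_bend_twist(
  const double theta1, const double theta2, const double theta3)
{
  const double deg2rad = M_PI/180.0;
  return ( Eigen::AngleAxisd(theta3*deg2rad, Eigen::Vector3d::UnitX())
         * Eigen::AngleAxisd(theta2*deg2rad, Eigen::Vector3d::UnitZ())
         * Eigen::AngleAxisd(theta1*deg2rad, Eigen::Vector3d::UnitX())
         ).toRotationMatrix();
}
```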
To assemble a skeleton inside our shape we will map each bone from its
[canonical bone][1.canonicalbone] to its position and orientation in the
undeformed model. Maintaining the rigidity of the bone, this means for each bone
there's a rigid transformation $\hat{\T}_i ∈ \R^{3×4}$ placing its tail and tip at their rest positions in the model.
We use the convention that the "canonical tail" (the origin $(0,0,0)$) is mapped to
the "rest tail" inside the model. This means that the translation part of the
matrix $\hat{\T}_i$ is simply the rest tail's position, $\hat{\s}_i$. The bone's
rotation is chosen so that the "canonical tip" $(\ell,0,0)$ maps to the "rest tip" $\hat{\d}_i$.
Typically the "rest tail" of a bone is coincident with the "rest tip" of its parent (if it exists): $$ \hat{\d}_{p} = \hat{\s}. $$
This still leaves any amount of twisting of the bone. In the canonical frame, we can think of this as pre-twisting the bone along the canonical $x$-axis.
Each rest transformation $\hat{\T}_i$ thus places its corresponding bone inside the undeformed model.
The final state to consider is when a bone is posed. That is, mapped to a new position and orientation from its rest state.
In general, each rest bone undergoes a rigid transformation $\T_i ∈ \R^{3×4}$, composed of a rotation and a translation, mapping it from its rest position and orientation to its posed position and orientation.
One way to determine the rigid pose transformations $\T_i$ of each bone in a skeleton is to compose relative rotations up the skeletal tree. For each bone, (reading the effect of transformations right to left) we first undo the map from canonical to rest (i.e., via inverting $\hat{\T}_i$), then rotate by the relative rotation, then redo the canonical-to-rest map ($\hat{\T}_i$), and finally apply the pose transformation of the parent bone ($\T_{p_i}$).
Question: Does using relative rotations ensure that bone tails stay coincident with parent tips?
Hint: What do you get if you multiply $\T_i$ and $\hat{\s}_i$?
As a base case, the root transformation can be defined to be the identity (no transformation) or the rigid transformation placing the object/character generally into a larger scene.
This has the great advantage that if the entire model is rotated or translated
at the root, the relative transformations still apply correctly. This property
holds locally, too. If bone $i$'s ancestors are posed differently, bone $i$ and its descendants follow along without any change to their own relative transformations.
It is convenient to express the relative rotations of each bone in the
[canonical frame][1.canonicalbone]. We can utilize canonical twist-bend-twist
rotations (three Euler angles, $θ_{i1},θ_{i2},θ_{i3}$), so that the pose
transformation of bone $i$ is defined recursively via its parent's pose
transformation:
$$
\T_i =
\T_{p_i}
\hat{\T}_i
\left(\begin{array}{cccc} \Rot_x(θ_{i3}) & \begin{array}{c}0\\0\\0\\\end{array} \\ 0\ 0\ 0 & 1\end{array}\right)
\left(\begin{array}{cccc} \Rot_z(θ_{i2}) & \begin{array}{c}0\\0\\0\\\end{array} \\ 0\ 0\ 0 & 1\end{array}\right)
\left(\begin{array}{cccc} \Rot_x(θ_{i1}) & \begin{array}{c}0\\0\\0\\\end{array} \\ 0\ 0\ 0 & 1\end{array}\right)
\left.\hat{\T}_i\right.^{-1}
$$
where the matrix $\Rot_x(θ) ∈ \R^{3×3}$ is the rotation by $θ$ around the $x$-axis (and analogously for $\Rot_z$).
When implementing a skeleton, it is tempting to use a traditional tree data structure where each node (bone) contains a list of pointers to its children (other bones). However, by the right-to-left reading of the forward kinematics formulae above, it is more convenient to use a data structure where each node (bone) has a pointer to its (unique) parent (other bone). This is ridiculously named a Spaghetti Stack.
Question: What abstract data structure is good for ensuring a parent's transformation $\T_{p_i}$ is computed before its child's $\T_i$?

Hint: 🥞
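To make the recursion concrete, here's a minimal sketch of evaluating the formula above under the parent-pointer convention; the `Bone` struct is hypothetical, and the bone list is assumed to be ordered so parents precede their children (compare `igl::forward_kinematics`, listed at the end of this document):

```cpp
#include <Eigen/Geometry>
#include <vector>

// Hypothetical bone record for illustration only.
struct Bone
{
  int parent;                // index of parent bone, or -1 for a root
  Eigen::Affine3d rest;      // rest transformation  \hat{T}_i
  Eigen::Matrix3d rotation;  // relative rotation  Rx(θi3) Rz(θi2) Rx(θi1)
};

// Compute the pose transformation T_i of every bone, assuming `bones` is
// ordered so that every parent appears before its children.
std::vector<Eigen::Affine3d> pose_transforms(const std::vector<Bone> & bones)
{
  std::vector<Eigen::Affine3d> T(bones.size());
  for(int i = 0; i < (int)bones.size(); i++)
  {
    const Bone & b = bones[i];
    // A root composes with the identity (or a global placement).
    const Eigen::Affine3d Tp =
      b.parent < 0 ? Eigen::Affine3d::Identity() : T[b.parent];
    Eigen::Affine3d R = Eigen::Affine3d::Identity();
    R.linear() = b.rotation;
    // T_i = T_{p_i} * \hat{T}_i * R_i * \hat{T}_i^{-1}
    T[i] = Tp * b.rest * R * b.rest.inverse();
  }
  return T;
}
```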
To create a long animation, specifying three Euler angles for every bone at every frame manually would be too difficult. The standard way to determine the relative bone transformations for each frame is to interpolate values specified at a small number of "key" times during the animation. Linear interpolation will lead to a choppy, robotic animation (try this first!). Instead, Catmull-Rom interpolation will produce a smooth animation. Fancier interpolation such as the Kochanek-Bartels method (called TCB in the book) allows better control of easing between key poses.
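As a sketch of the interpolation step, here's a uniform Catmull-Rom evaluation for a single Euler-angle channel; the `Key` struct and the clamping at the first/last keys are illustrative assumptions, not a prescribed interface:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical key frame: a time and one Euler-angle value.
struct Key { double time; double value; };

// Evaluate the Catmull-Rom spline through `keys` (sorted by time) at time t.
double catmull_rom(const std::vector<Key> & keys, const double t)
{
  const int n = (int)keys.size();
  if(t <= keys.front().time) { return keys.front().value; }
  if(t >= keys.back().time)  { return keys.back().value;  }
  int i = 0;
  while(keys[i+1].time < t) { i++; }   // find segment [i, i+1] containing t
  const double u = (t - keys[i].time)/(keys[i+1].time - keys[i].time);
  // Neighboring values; clamp indices at the ends of the animation.
  const double p0 = keys[std::max(i-1, 0)].value;
  const double p1 = keys[i].value;
  const double p2 = keys[i+1].value;
  const double p3 = keys[std::min(i+2, n-1)].value;
  // Standard uniform Catmull-Rom basis.
  return 0.5*( 2.0*p1 + (-p0+p2)*u + (2.0*p0-5.0*p1+4.0*p2-p3)*u*u
             + (-p0+3.0*p1-3.0*p2+p3)*u*u*u );
}
```

Replacing the last expression with simple linear interpolation of `p1` and `p2` reproduces the choppy behavior described above.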
In this assignment, we will interpolate Euler angles directly. This works well
when only a single angle is changing at a time. However, Euler angles do not
provide easy movement in every rotational direction. Euler angles model rotations as twist-bend-twist. For our canonical bones, bending around the $z$-axis is easy, but bending around the $y$-axis requires an awkward combination of twists and bends.
So, for more complex interpolation of rotations, a different representation such as unit quaternions would be needed. This is outside the scope of this assignment.
In the [forward kinematics][forwardkinematics] model, the final position of the tip of a finger is determined by setting (manually or via interpolation) the relative transformations of each joint in the finger, the hand, the elbow, the shoulder, ... This indirect control makes it difficult to achieve basic poses. Instead, we can treat the problem of setting relative rotations of internal bones (shoulder, elbow, hand, ...) as an optimization problem where we try to minimize the distance between the tip of the finger and where we want it to be.
Stated mathematically, for a skeleton with $m$ bones, suppose we know the desired position $\hat{\x}_b ∈ \R³$ of the tip of some bone $b$; then we can ask for the best vector of angles $\a$ to bring that tip as close as possible to the target. Using forward kinematics, we can express the current tip position $\x_b$ as a function of all the Euler angles:
$$
\x_b(\a) = \T_b \left(\begin{array}{c}\hat{\d}_b\\1\end{array}\right)
$$
where $\T_b$ depends on $θ_{b1},θ_{b2},θ_{b3}$ and, recursively, on the Euler angles of $b$'s ancestors.
We can design arbitrarily complex energies to satisfy our interaction needs. In
this assignment, we consider that there is a list of constrained end effectors
$b_1, b_2, …, b_k$ and a corresponding list of desired positions $\hat{\x}_{b_1}, …, \hat{\x}_{b_k}$.
So, over all choices of $\a$, we'd like to minimize the sum of squared distances:
$$ \min_{\a} \quad \underbrace{ ∑\limits_{i=1}^k ‖\x_{b_i}(\a) - \hat{\x}_{b_i}‖² }_{E(\x_b(\a))} $$
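Evaluating this energy is then a single squared distance computation; a minimal sketch, assuming the positions are stacked into long vectors:

```cpp
#include <Eigen/Core>

// Least-squares IK energy E: `tips` stacks the current end-effector positions
// x_{b_i}(a); `goals` stacks the desired positions \hat{x}_{b_i}.
double ik_energy(const Eigen::VectorXd & tips, const Eigen::VectorXd & goals)
{
  return (tips - goals).squaredNorm();
}
```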
We will further constrain our problem by imposing upper and lower bounds on our angles $\a$, corresponding to joint limits (e.g., preventing an elbow from bending backwards).
So our full optimization problem becomes
$$
\min_{\a^{\text{min}} ≤ \a ≤
\a^{\text{max}}}
\quad E(\x_b(\a))
$$
where $\a^{\text{min}}$ and $\a^{\text{max}}$ stack the lower and upper bounds on each of the Euler angles.
This type of minimization is non-trivial. Our energy is a quadratic sum of squares in $\x_b$, but $\x_b$ is a non-linear (trigonometric) function of the angles $\a$. In turn, $E$ is a non-linear, non-convex function of $\a$.
We're faced with a bound-constrained non-linear optimization problem. To solve it, we will construct an initial guess and then iteratively improve the guess by moving in a direction that decreases $E$.
The idea behind gradient descent is intuitive: if you want to get to the bottom of a canyon, look at the ground and walk in the direction that goes downhill.
So, we iteratively take a step of size $σ$ in the negative gradient direction of our objective function $E(\x(\a))$:
$$ \a ← \a - σ \left(\frac{dE(\x(\a))}{d\a}\right)^{\transpose}. $$
Applying the chain rule, this iteration becomes
$$ \a ← \a - σ \left(\frac{d\x(\a)}{d\a}\right)^{\transpose} \frac{dE(\x)}{d\x} $$
where $\frac{d\x(\a)}{d\a}$ stacks the derivatives of every end-effector position with respect to every angle. The change in tip positions with respect to the angles, $\J = \frac{d\x(\a)}{d\a}$, is called the kinematic Jacobian. Written in terms of $\J$, our iteration is
$$ \a ← \a - σ \left.\J\right.^{\transpose} \frac{dE(\x)}{d\x}. $$
Question: Can we take an arbitrarily large step $σ \gg 0$?

Hint: What if we just need to change $\a$ by a small, non-zero amount? What would choosing $σ=1,000,000$ do to $\a$? What would that in turn do to $E(\x(\a))$?
For sufficiently small $σ$, each step will decrease the objective energy $E$. If the gradient of $E$ is zero, then we have reached a stationary point and the iterations stop making progress: hopefully, a local minimum.
To ensure that our bounds are obeyed, after each step we need to project onto our constraints by snapping each value to its respective bound if necessary:
We'll refer to this as a projection filter acting on the entire vector $\a$.
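A sketch of such a filter with Eigen, where `amin` and `amax` stack the per-angle bounds:

```cpp
#include <Eigen/Core>

// Snap each angle to its respective bound if it lies outside [amin, amax].
Eigen::VectorXd project_to_bounds(
  const Eigen::VectorXd & a,
  const Eigen::VectorXd & amin,
  const Eigen::VectorXd & amax)
{
  return a.cwiseMax(amin).cwiseMin(amax);
}
```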
The local gradient of a function can be very different from the "best" descent direction. The choice of $σ$ reflects how much we "trust" this direction. Unfortunately, if $σ$ is too large, our iterations may diverge. If $σ$ is too small, we will have to do many iterations.

In order to find a better descent direction, let's assume we knew more about $E$. That is, suppose we also knew its second derivatives: $\frac{d²E}{d\x²}$.

Given an initial guess $\x⁰$, we're looking to find a change $∆\x$ so that $E(\x⁰ + ∆\x)$ is a stationary point.
Starting with our equilibrium equation,
$$ \frac{dE(\x)}{d\x} = \0 $$
we substitute in $\x = \x⁰ + ∆\x$:
$$ \frac{dE(\x⁰+∆\x)}{d∆\x} = \0. $$
Plugging in a Taylor series expansion
$$ E(\x⁰+∆\x) \approx E(\x⁰) + \frac{d E(\x⁰)}{d\x} ∆\x + \frac{d²E(\x⁰)}{d\x²}\frac{(∆\x)²}{2} + … $$
and dropping higher order terms ($…$), we get:
$$ \frac{d}{d∆\x}\left(E(\x⁰) + \frac{d E(\x⁰)}{d\x} ∆\x + \underbrace{\frac{d²E(\x⁰)}{d\x²}}_{\H}\frac{(∆\x)²}{2}\right) = \0, $$
where we call $\H ∈ \R^{|\x| × |\x|}$ the Hessian matrix.

Applying the differentiation by $∆\x$ we get a system of equations:
$$ \frac{d E(\x⁰)}{d\x} + \frac{d²E(\x⁰)}{d\x²} ∆\x = \0. $$
Solving for the change $∆\x$ we get:
$$ ∆\x = -\left.\H\right.^{-1} \frac{d E(\x⁰)}{d\x}. $$
So a raw Newton's method update would be:
$$ \x ← \x - \left.\H\right.^{-1} \frac{d E(\x)}{d\x}. $$
If our Taylor series approximation was perfect (no higher order terms in $…$; in other words, $E$ was quadratic), then Newton's method would be perfect: a single update immediately takes us to the minimum.

Newton's method is problematic for a number of reasons:
- We built our step purely based on the equations for a stationary point. Nothing says we won't get sent toward a maximum or saddle-point.
- $\H$ is often difficult or expensive to compute.
- $\H$ may be singular.
- Inverting $\H$ (even if possible) is often slow.
- Our system is built off a local approximation of $E$, so the descent direction may still point in the wrong direction.

Since we're approximating $E$ at every iteration anyway, we'll skirt many of these issues by considering various approximations of the Hessian matrix $\H$. We'll never actually compute $\H$.

The simplest approximation of $\H$ is the identity matrix $\I$. Plugging this into our truncated Taylor series expansion above, our approximation would read:
$$ E(\x⁰) + \frac{d E(\x⁰)}{d\x} ∆\x + \I \frac{(∆\x)²}{2}, $$
and our step reduces to good ol' gradient descent:
$$ \x ← \x - \frac{d E(\x)}{d\x}. $$

Given that we have already computed first derivatives in the Jacobian $\J = \frac{d\x(\a)}{d\a}$, an even better approximation for the Hessian $\H$ than the identity $\I$ would be to use $\J^\transpose\J$. The resulting update becomes:
$$ \a ← \a - (\left.\J\right.^\transpose\J)^{-1} \left.\J\right.^\transpose \frac{dE(\x)}{d\x}. $$

Unlike $\H$, $\J^\transpose\J$ is easy to compute if we're already computing $\J$. It is guaranteed to be positive semi-definite and it is possible to invert or reliably pseudo-invert ($\J^+$ acting in place of $(\left.\J\right.^\transpose\J)^{-1} \left.\J\right.^\transpose$).

The descent directions are often significantly better than gradient descent. As a result this method, called Gauss-Newton, requires many fewer iterations to converge.
It still may try to descend in bad directions. In particular, for inverse kinematics, this Gauss-Newton method performs poorly if the desired positions are not reachable: for example, when over-extending an arm. First the solution locks in place and then diverges. This happens when our Hessian approximation $\J^\transpose\J$ starts misbehaving.

A good fix is to blend between the gradient descent and Gauss-Newton search directions. That is, blend between $\I$ and $\J^\transpose\J$. This is called the Levenberg-Marquardt algorithm.
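Here's a sketch of one such blended update, assuming the Jacobian `J` and the energy gradient `dEdx` are computed elsewhere (e.g., as described below); `lambda = 0` gives the Gauss-Newton step, while a large `lambda` approaches (scaled) gradient descent:

```cpp
#include <Eigen/Dense>

// One Levenberg-Marquardt-style update of the angles.
// J     : kinematic Jacobian dx/da at the current angles
// dEdx  : gradient of E with respect to tip positions
// lambda: blend between Gauss-Newton (0) and gradient descent (large)
Eigen::VectorXd lm_step(
  const Eigen::VectorXd & a,
  const Eigen::MatrixXd & J,
  const Eigen::VectorXd & dEdx,
  const double lambda)
{
  const int n = (int)a.size();
  // Blended Hessian approximation  JᵀJ + λI
  const Eigen::MatrixXd H =
    J.transpose()*J + lambda*Eigen::MatrixXd::Identity(n, n);
  // Solve  H Δa = Jᵀ dE/dx ; LDLᵀ suits this symmetric system.
  const Eigen::VectorXd da = H.ldlt().solve(J.transpose()*dEdx);
  return a - da;
}
```

After each such step, the projection filter above should be re-applied to keep the angles within bounds.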
But how do we compute the kinematic Jacobian $\J$? Let's start from the definition in terms of a limit:
$$
\J_{i,j} = \lim_{h → 0} \frac{\x_i(\a+h δ_j) - \x_i(\a)}{h},
$$
where $δ_j ∈ \R^{3m}$ is a vector of zeros with a one at entry $j$.
We can numerically approximate this limit by fixing $h$ to a small value (e.g., $h = 10^{-7}$); this is the finite differencing approximation.
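A sketch of filling $\J$ this way, where `x_of_a` stands for any function stacking the end-effector positions for given angles (an assumed callback, not a prescribed interface):

```cpp
#include <Eigen/Core>
#include <functional>

// Forward finite-difference approximation of the kinematic Jacobian dx/da.
Eigen::MatrixXd finite_difference_jacobian(
  const std::function<Eigen::VectorXd(const Eigen::VectorXd &)> & x_of_a,
  const Eigen::VectorXd & a,
  const double h = 1e-7)
{
  const Eigen::VectorXd x0 = x_of_a(a);
  Eigen::MatrixXd J(x0.size(), a.size());
  for(int j = 0; j < (int)a.size(); j++)
  {
    Eigen::VectorXd aj = a;
    aj(j) += h;                        // perturb one angle:  a + h δ_j
    J.col(j) = (x_of_a(aj) - x0)/h;    // difference quotient fills column j
  }
  return J;
}
```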
For inverse kinematics, we will need to compute $\J$ afresh for each new set of angles $\a$.
Forward differencing requires $O(m)$ evaluations but doesn't require us to change our code for function evaluation at all: we just evaluate it. If we're willing to sprinkle some special types on top of our code (but otherwise leave in all the sub-routine calls, if statements, for loops, etc.), we could use automatic differentiation to compute $\J$.

There are two extremes when it comes to autodiff: forward mode and backward mode.

Forward mode works like finite differencing, except the perturbation to the differentiation variable is symbolic and derivatives are tracked through each basic operation (`+`, `-`, `sin`, etc.): the total computational cost to construct $\J$ is again $O(m²)$.

Backward mode works by pushing each function call and basic operation onto a list. Derivatives for all variables are then computed as we pop backward through the evaluation: identical to how we read right-to-left on our recursive kinematics formula. This means we compute derivatives with respect to all variables $\a$ in a single backward evaluation. The total cost is only $O(m)$ to fill $\J$.
Whether we're using gradient descent, Newton's method or Gauss-Newton, we are generally attempting to improve our guess by iteratively moving in a descent direction $∆\a$, followed by projecting onto our constraints.
Despite our best efforts, this step is not guaranteed to actually decrease our energy $E$. While there exists an optimal step size $σ$ along the descent direction, finding it exactly is usually not worth the effort; a simple backtracking strategy (start with a large step and repeatedly halve it until $E$ actually decreases) works well in practice.
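A minimal backtracking sketch, assuming an energy callback `E` and a fixed descent direction `da`; the starting step and halving factor are illustrative choices:

```cpp
#include <Eigen/Core>
#include <functional>

// Shrink the step size until the energy actually decreases
// (returns 0 if no decreasing step was found).
double backtracking_line_search(
  const std::function<double(const Eigen::VectorXd &)> & E,
  const Eigen::VectorXd & a,
  const Eigen::VectorXd & da,
  double sigma = 1.0,
  const int max_halvings = 50)
{
  const double E0 = E(a);
  for(int iter = 0; iter < max_halvings; iter++)
  {
    if(E(a - sigma*da) < E0) { return sigma; }
    sigma *= 0.5;
  }
  return 0.0;
}
```

For the bounded problem, each trial point `a - sigma*da` should first be passed through the projection filter from above.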
Depending on the configuration, it may or may not be possible to exactly satisfy the constraints (i.e., drive the energy $E$ to zero).
So far we have only discussed bones floating and moving around in space. Ultimately, we would like to deform interesting models: for example, animals and characters. Unlike robots or mechanical objects, animals tend to deform smoothly, even near joints: an elbow does not tear into two rigid parts when bent. Instead, the skin around the elbow stretches and smoothly warps. Skin closer to the forearm deforms more like the rigid rotation and translation of the forearm, and likewise the skin near the upper arm deforms like the rigid upper arm bone. In between, we see a smooth transition or blend.
To approximate this smooth blending quickly on the computer, we begin with a 3D triangle mesh in its "rest" position. The "rest bones" are embedded inside of this model. Each vertex $\hat{\v}_i$ of the mesh is assigned a weight $w_{i,j}$ for each bone $j$, describing how strongly that bone's motion should influence the vertex.
Smoothly varying weights produce a smooth deformation. In contrast, piecewise-constant weights lead to a piecewise-rigid deformation.
The "pose" position
$$ \v_i = \sum\limits^{m}{j=1} w{i,j} \T_j \left(\begin{array}{c}\hat{\v}_i\\1\end{array}\right). $$
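Spelled out with Eigen types, a direct (if unoptimized) sketch of this sum follows; `igl::lbs`, listed at the end of this document, is the library route:

```cpp
#include <Eigen/Geometry>
#include <vector>

// Linear blend skinning: each posed vertex is the weighted combination of its
// rest position transformed by every bone's pose transformation T_j.
// V : #V by 3 rest positions \hat{v}_i ;  W : #V by m weights w_{i,j}
Eigen::MatrixXd linear_blend_skinning(
  const Eigen::MatrixXd & V,
  const Eigen::MatrixXd & W,
  const std::vector<Eigen::Affine3d> & T)
{
  Eigen::MatrixXd U(V.rows(), 3);
  for(int i = 0; i < (int)V.rows(); i++)
  {
    const Eigen::Vector3d vi = V.row(i).transpose();
    Eigen::Vector3d v = Eigen::Vector3d::Zero();
    for(int j = 0; j < (int)T.size(); j++)
    {
      // T_j acts on \hat{v}_i as a point (rotation plus translation).
      v += W(i, j)*(T[j]*vi);
    }
    U.row(i) = v.transpose();
  }
  return U;
}
```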
Question: What happens to per-vertex normals after applying a skinning deformation?
Hint: 🤯
Linear blend skinning has many defects. Good "rigging artists" can mitigate these by carefully painting (yes, painting) weight functions and positioning the [rest bones][2.restbone]. However, some of the linear blend skinning defects are theoretical: most notably, problems that occur by averaging rotations as matrices.
Question: What transformation matrix do you get if you compute $½ \Rot_x(90°) + ½ \Rot_x(-90°)$?

Hint: It's not $\Rot_x(0°)$.
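You can verify the hint numerically with a tiny, self-contained check (again using `Eigen::AngleAxis` from the list below):

```cpp
#include <Eigen/Geometry>
#include <cmath>
#include <iostream>

// Average two opposite 90° rotations about the x-axis as matrices.
int main()
{
  const Eigen::Matrix3d A =
    Eigen::AngleAxisd( M_PI/2.0, Eigen::Vector3d::UnitX()).toRotationMatrix();
  const Eigen::Matrix3d B =
    Eigen::AngleAxisd(-M_PI/2.0, Eigen::Vector3d::UnitX()).toRotationMatrix();
  // Prints a matrix that flattens y and z to zero: not a rotation at all,
  // let alone Rot_x(0°). This collapse is behind the "candy wrapper" artifact.
  std::cout << 0.5*A + 0.5*B << std::endl;
  return 0;
}
```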
- `Eigen::Affine3d`
- `Eigen::AngleAxis`
- `#include <Eigen/Geometry>`
- C++ lambda functions and capturing
- `#include <functional>`
- `igl::lbs`
- `igl::forward_kinematics`