Chapter 12 Control III – model-predictive & non-linear#

12.1 Introduction#

Model Predictive Control (MPC) builds on the optimal control concepts from previous chapters (e.g., LQR in Chapter 11) by incorporating predictions, optimization, and constraints in real-time. This chapter provides a concise overview, focusing on core definitions, descriptions, and simple code examples. We’ll connect MPC to reinforcement learning (RL) in Chapter 13, where MPC-like planning inspires model-based RL methods.

12.2 MPC Definition and Description#

MPC is a feedback control strategy that:

  • Uses a dynamic model to predict the system’s future states over a finite prediction horizon (e.g., \( N \) steps).

  • Solves an optimization problem at each time step to find the optimal sequence of control inputs that minimizes a cost function (e.g., quadratic error to a setpoint) while respecting constraints (e.g., input limits, state bounds).

  • Applies only the first control input from the sequence.

  • Repeats the process at the next time step with updated measurements (receding horizon).

Key advantages:

  • Handles multi-input multi-output (MIMO) systems, constraints, and non-linearities explicitly.

  • Bridges classical control (e.g., PID, LQR) with optimization-based methods.

For a discrete-time system \( \mathbf{y}_{k+1} = \mathbf{A} \mathbf{y}_k + \mathbf{B} \mathbf{u}_k \), the MPC optimization problem is typically:

\[\min_{\mathbf{u}_{k:k+N-1}} \sum_{i=1}^{N} (\mathbf{y}_{k+i} - \mathbf{y}_{\text{ref}})^T \mathbf{Q} (\mathbf{y}_{k+i} - \mathbf{y}_{\text{ref}}) + \mathbf{u}_{k+i-1}^T \mathbf{R} \mathbf{u}_{k+i-1}\]

Subject to dynamics, \( \mathbf{y}_{k+i} = \mathbf{A} \mathbf{y}_{k+i-1} + \mathbf{B} \mathbf{u}_{k+i-1} \), and constraints (e.g., \( |\mathbf{u}| \leq \mathbf{u}_{\max} \)).

12.3 Linear MPC Example

We’ll simulate a simple double-integrator system (e.g., position control): \( \dot{\mathbf{y}} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \mathbf{y} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u \), discretized with time step \( \Delta t = 0.1 \).

Use scipy.optimize.minimize for the quadratic cost (approximates QP for linear cases).

import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

# System matrices (discretized double integrator)
dt = 0.1
A = np.array([[1, dt], [0, 1]])
B = np.array([[0.5*dt**2], [dt]])
n_states = 2
n_inputs = 1

# MPC parameters
N = 10  # Prediction horizon
Q = np.eye(n_states)  # State cost
R = 0.1 * np.eye(n_inputs)  # Input cost
y_ref = np.array([1.0, 0.0])  # Setpoint: position=1, velocity=0
u_min, u_max = -1.0, 1.0  # Input constraints

def mpc_cost(u_seq, y0):
    """Quadratic cost function for optimization."""
    cost = 0.0
    y = y0.reshape(-1, 1)  # Ensure column vector
    for i in range(N):
        u = u_seq[i*n_inputs:(i+1)*n_inputs].reshape(-1, 1)
        y = A @ y + B @ u
        state_err = y - y_ref.reshape(-1, 1)
        cost += (state_err.T @ Q @ state_err).item() + (u.T @ R @ u).item()
    return cost

# Simulation
T = 100  # Time steps
y_traj = np.zeros((n_states, T+1))
u_traj = np.zeros((n_inputs, T))
y = np.array([0.0, 0.0])  # Initial state
y_traj[:, 0] = y

for t in range(T):
    # Optimize: initial guess (warm-start: shift previous solution or zeros)
    u_guess = np.zeros(N * n_inputs)
    bounds = [(u_min, u_max)] * (N * n_inputs)
    res = minimize(mpc_cost, u_guess, args=(y,), bounds=bounds)
    u_seq_opt = res.x
    u = u_seq_opt[:n_inputs]  # Apply first input
    y = (A @ y.reshape(-1, 1) + B @ u.reshape(-1, 1)).flatten()
    y_traj[:, t+1] = y
    u_traj[:, t] = u

# Plot
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(y_traj[0], label='Position')
plt.plot([0, T], [y_ref[0], y_ref[0]], 'r--', label='Ref')
plt.legend(); plt.title('State Trajectory')
plt.subplot(1, 2, 2)
plt.plot(u_traj[0], label='Input')
plt.plot([0, T], [u_max, u_max], 'r--'); plt.plot([0, T], [u_min, u_min], 'r--')
plt.legend(); plt.title('Control Input')
plt.show()
../_images/494dab6cb9326c9ead1897d3d3a2d174637b797cec5d003d6cb71fbb66b7982d.png

This code solves the MPC problem at each step, applying constraints via bounds. Warm-starting can be added by shifting u_seq_opt from the previous step as u_guess (e.g., u_guess = np.concatenate((u_seq_opt[n_inputs:], np.zeros(n_inputs)))).

12.4 Nonlinear MPC and Successive Linearization#

For nonlinear systems (e.g., \( \mathbf{y}_{k+1} = \mathbf{f}(\mathbf{y}_k, \mathbf{u}_k) \)), the optimization becomes non-convex. Use scipy.optimize.minimize with methods like ‘SLSQP’ for constraints.

  • Successive linearization: At each MPC step, linearize the nonlinear model around the current state/trajectory, solve the linear MPC, and iterate (e.g., 2-5 times) for refinement.

  • Warm-start: Use the previous optimal sequence as the initial guess to speed up convergence.

  • Constraint handling: Include state/input constraints directly in the optimizer (e.g., via constraints arg in minimize).

Example: nonlinear MPC

Consider the nonlinear pendulum: \( \frac{d^2 \theta}{dt^2} = -\frac{g}{l} \sin(\theta) + \frac{1}{ml^2} u \)):

import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

# Nonlinear pendulum parameters
g, l, m = 9.81, 1.0, 1.0
dt = 0.05
y_ref = np.array([np.pi, 0.0])  # Inverted position, zero velocity
N = 10  # Prediction horizon
Q = np.diag([10.0, 1.0])  # State weights: more on angle
R = 0.01  # Input weight (small to allow larger torques)
u_min, u_max = -10.0, 10.0  # Input constraints

def nonlinear_dynamics(y, u):
    theta, omega = y
    dtheta_dt = omega
    d2theta_dt2 = - (g/l) * np.sin(theta) + (1/(m*l**2)) * u
    return np.array([theta + dt * dtheta_dt, omega + dt * d2theta_dt2])

def nl_mpc_cost(u_seq, y0):
    cost = 0.0
    y = y0.copy()
    for i in range(N):
        u = u_seq[i]
        y = nonlinear_dynamics(y, u)
        cost += (y - y_ref) @ Q @ (y - y_ref) + R * u**2
    return cost

# Simulation
T = 200  # Time steps (longer for stabilization)
y_traj = np.zeros((2, T+1))
u_traj = np.zeros(T)
y = np.array([np.pi - 0.2, 0.0])  # Initial state: slightly off inverted
y_traj[:, 0] = y
prev_u_seq = np.zeros(N)  # For warm-start

for t in range(T):
    u_guess = prev_u_seq
    bounds = [(u_min, u_max)] * N
    res = minimize(nl_mpc_cost, u_guess, args=(y,), method='SLSQP', bounds=bounds)
    u_seq_opt = res.x
    u = u_seq_opt[0]
    y = nonlinear_dynamics(y, u)
    y_traj[:, t+1] = y
    u_traj[t] = u
    # Warm-start: shift sequence and append zero
    prev_u_seq = np.concatenate((u_seq_opt[1:], [0.0]))

# Plot
plt.figure(figsize=(12, 6))
plt.subplot(2, 1, 1)
plt.plot(y_traj[0], label='Theta')
plt.plot([0, T], [y_ref[0], y_ref[0]], 'r--', label='Ref')
plt.legend(); plt.title('Angle Trajectory')
plt.subplot(2, 1, 2)
plt.plot(u_traj, label='Input Torque')
plt.plot([0, T], [u_max, u_max], 'r--'); plt.plot([0, T], [u_min, u_min], 'r--')
plt.legend(); plt.title('Control Input')
plt.show()
../_images/86d1a231475df556e7087f8bc76d5da75a47bf78f1fa12abfe42d3a0daae9a08.png

(Optional) For Jacobians, use sympy to compute linearizations symbolically:

import sympy as sp

theta, omega, u = sp.symbols('theta omega u')
f = sp.Matrix([omega, - (g/l) * sp.sin(theta) + (1/(m*l**2)) * u])
A_sym = f.jacobian([theta, omega])
B_sym = f.jacobian([u])
# Lambdify for numerical eval: A = sp.lambdify((theta, omega, u), A_sym)

12.5 (Optional) Intro to Data-Driven MPC: SINDy / DMDc#

For systems without known models, use data-driven methods:

SINDy (Sparse Identification of Nonlinear Dynamics): Fits sparse nonlinear ODEs from data via regression. DMDc (Dynamic Mode Decomposition with Control): Extends DMD for controlled systems, estimating A/B matrices.

Use these to learn models for MPC (e.g., via scipy regression). Skip for now; see literature for integration with MPC.

12.6 Connection to Reinforcement Learning (Chapter 13)#

MPC is a form of model-based optimal control with finite horizons and explicit models/constraints. RL (Chapter 13) extends this to:

Infinite horizons via value/policy functions. Learning models/rewards from data (model-free vs. model-based RL, where MPC inspires planning in model-based RL like Dyna or AlphaZero). Handling uncertainty (e.g., stochastic MDPs) with safety (similar to MPC constraints via reward shaping).

In Chapter 14’s orbit station-keeping, RL agents can be viewed as “learning MPC” policies for complex, nonlinear dynamics. For further reading: “Model Predictive Control” by Rawlings et al.; connect to RL via “Reinforcement Learning: An Introduction” by Sutton & Barto.