Linear regression in PyTorch¶
Last updated: December 1, 2020
References
- https://pytorch.org/
- https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/linear_regression/main.py
- https://en.wikipedia.org/wiki/Linear_regression
- https://en.wikipedia.org/wiki/Backpropagation
- https://en.wikipedia.org/wiki/Stochastic_gradient_descent
Linear regression
Given a data set $\{y_i, x_i\}_{i=1}^n$ of $n$ statistical units, we have the following relationship:
$$ y_i = \theta_1 x_i + \theta_0 + \varepsilon_i, \qquad i=1,\ldots, n, $$
where:
- $y_i$ is the observed value (dependent variable)
- $x_i$ is the independent variable
- $\theta_1$ and $\theta_0$ are model parameters
- $\varepsilon_i$ is the measurement error
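For a 1-D problem like this, the least-squares fit also has a closed-form solution; the sketch below (hypothetical data, using `np.polyfit` purely for reference) shows what that looks like, whereas the rest of this notebook fits $\theta_1$ and $\theta_0$ iteratively with gradient descent in PyTorch.
import numpy as np

# Hypothetical tiny dataset, for illustration only
x = np.array([-0.4, -0.1, 0.2, 0.5])
y = 2.0 * x + 0.4 + np.array([0.05, -0.02, 0.01, -0.04])  # noisy line

# np.polyfit(x, y, 1) returns the least-squares slope and intercept
theta_1_hat, theta_0_hat = np.polyfit(x, y, 1)
print(theta_1_hat, theta_0_hat)  # close to the true values 2.0 and 0.4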
1. Setup environment¶
Load packages / modules¶
In [1]:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim
In [2]:
mpl.style.use('seaborn-poster')
mpl.rcParams['mathtext.fontset'] = 'cm'
mpl.rcParams['figure.figsize'] = 5 * np.array([1.618033988749895, 1])
In [3]:
# Reproducibility
seed = 234
np.random.seed(seed)
torch.random.manual_seed(seed);
Generate training dataset¶
In [4]:
n = 500
theta_1 = 2.0
theta_0 = 0.4
eps_std = 0.2
In [5]:
x_train = np.random.rand(n,1)-0.5
eps = np.random.normal(scale=eps_std, size=(n,1))
y_train = x_train*theta_1 + theta_0 + eps
In [6]:
fig, ax = plt.subplots()
ax.plot(x_train, y_train, 'k.', label='observations')
ax.plot(x_train, x_train*theta_1 + theta_0, 'tab:gray', lw=5,
        label=r'analytical ($\theta_1={:.2f}$, $\theta_0={:.2f}$)'.format(theta_1, theta_0))
ax.set(xlabel='$x$', ylabel='$y$', xlim=(-0.52,0.52), ylim=(-1.0, 1.5));
fig.legend(loc='right', bbox_to_anchor=(1.5, 0.6));
2. Define the model, loss function and optimizer¶
Hyperparameters¶
In [7]:
n_features = 1       # input/output dimension of the linear layer (1 weight + 1 bias)
num_epochs = 150     # number of training iterations (full-batch: one iteration per epoch)
learning_rate = 0.2
Model¶
$$ \mathcal{F}(x;\ \theta): x \rightarrow y $$
Linear 1-D model:
$$ \mathcal{F}(x;\ \theta) = w x + b = \hat{y} $$
where:
- $y$ is the ground truth
- $\hat{y}$ is the predicted output
- $\theta$ are the trainable parameters, with
- $w$ is the learnable weight, i.e. $w \equiv \theta_1$
- $b$ is the learnable bias, i.e. $b \equiv \theta_0$
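To make the mapping concrete, here is a minimal sketch (hypothetical tensors, chosen only for illustration) of the affine transform that a 1-D linear layer computes; the actual model is built with `nn.Linear` in the next cell.
# What nn.Linear(1, 1) computes: y_hat = x @ W.T + b, with a (1, 1) weight
# matrix W and a (1,)-shaped bias b
x = torch.tensor([[0.5], [-0.3]])   # batch of 2 samples, 1 feature each
W = torch.tensor([[2.0]])           # weight, playing the role of theta_1
b = torch.tensor([0.4])             # bias, playing the role of theta_0
y_hat = x @ W.T + b
print(y_hat)                        # values: [[1.4], [-0.2]]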
In [8]:
# Linear regression model
model = nn.Linear(in_features=n_features, out_features=n_features)
model
Out[8]:
Linear(in_features=1, out_features=1, bias=True)
In [9]:
for i, (name, param) in enumerate(model.named_parameters()):
    print(r"{:6s} (theta_{}): {:.4f}".format(name, 1-i, param.item()))
weight (theta_1): 0.8553
bias   (theta_0): 0.1633
Loss function¶
$L^1$-norm:
$$ \mathcal{L}(\hat{Y}, Y;\ \theta) = \textrm{mean} \left( \{l_1,\dots,l_N\}^\top \right), \quad l_n = \left| \hat{y}_n - y_n \right| $$
where:
- $X \in \mathbb{R}^N$ is the input matrix
- $Y \in \mathbb{R}^N$ is the ground truth matrix
- $\hat{Y} \in \mathbb{R}^N$ is the prediction matrix
- $N$ is the batch size (for now, batch size = number of examples)
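As a quick sanity check (a minimal sketch with made-up tensors), `nn.L1Loss` with its default reduction is just the mean absolute error:
# Hypothetical predictions and targets, for illustration only
y_hat = torch.tensor([[0.9], [0.1], [0.5]])
y = torch.tensor([[1.0], [0.0], [0.5]])
print(nn.L1Loss()(y_hat, y))     # tensor(0.0667)
print((y_hat - y).abs().mean())  # same value, computed by hand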
In [10]:
# Loss and optimizer
criterion = nn.L1Loss() # L^1-norm, aka. mean absolute error loss
criterion
Out[10]:
L1Loss()
Optimizer¶
Stochastic gradient descent:
$$ \theta^{n+1} = \theta^n - \eta \nabla \mathcal{L}(\theta^n) $$
where:
- $\eta$ is the learning rate (i.e. step size)
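For intuition, `optimizer.step()` applies exactly this update rule to every parameter. The sketch below (a hypothetical scalar parameter, not part of the model above) performs one update by hand. Note that since the training loop below feeds the full dataset at every step, the procedure is really plain full-batch gradient descent; "stochastic" refers to the mini-batch case.
# Hypothetical scalar parameter, updated manually with one SGD step
theta = torch.tensor([0.8553], requires_grad=True)
loss = (theta - 2.0).abs().sum()   # toy L^1-style loss, minimum at 2.0
loss.backward()                    # theta.grad = d(loss)/d(theta) = -1 here
eta = 0.2                          # learning rate
with torch.no_grad():
    theta -= eta * theta.grad      # theta^{n+1} = theta^n - eta * grad
theta.grad.zero_()                 # reset the gradient for the next step
print(theta)                       # roughly tensor([1.0553])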
In [11]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) # Stochastic gradient-descent
optimizer
Out[11]:
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.2
    momentum: 0
    nesterov: False
    weight_decay: 0
)
3. Train the model¶
$$ \mathcal{F}^* = \arg \min_{\mathcal{F}}\ \mathcal{L}(\hat{Y}, Y;\ \theta) $$
Algorithm:
- Convert data to torch tensor:
torch.from_numpy(...)
- Forward pass:
y_hat = model(x)
- Calculate loss:
loss = criterion(y_hat, y)
- Reset gradients:
optimizer.zero_grad()
- Compute gradients:
loss.backward()
- Update weights:
optimizer.step()
In [12]:
history = dict(loss=[], parameters=[], grads=[])
for epoch in range(num_epochs):
    # 1. Convert numpy arrays to torch tensors
    x = torch.from_numpy(x_train.astype('float32'))
    y = torch.from_numpy(y_train.astype('float32'))
    # 2. Forward pass
    y_hat = model(x)  # prediction step
    # 3. Calculate loss
    loss = criterion(y_hat, y)
    # 4. Backward propagation
    optimizer.zero_grad()  # reset gradients to zero
    loss.backward()        # backprop: calculate gradients w.r.t. the loss
    # 5. Update weights
    optimizer.step()       # apply the parameter update
    # 6. Log
    history['loss'].append(loss.item())
    history['parameters'].append([param.detach().item() for param in model.parameters()])
    history['grads'].append([param.grad.detach().item() for param in model.parameters()])
    if (epoch % 10) == 0 or epoch == (num_epochs - 1):
        print('[Epoch {:3d}]: loss={:.4f}, w={:.2f}, b={:.2f}'.format(
            epoch, history['loss'][-1], *history['parameters'][-1]))
[Epoch   0]: loss=0.3869, w=0.89, b=0.26
[Epoch  10]: loss=0.2384, w=1.31, b=0.42
[Epoch  20]: loss=0.1822, w=1.63, b=0.40
[Epoch  30]: loss=0.1639, w=1.81, b=0.40
[Epoch  40]: loss=0.1591, w=1.90, b=0.40
[Epoch  50]: loss=0.1579, w=1.94, b=0.40
[Epoch  60]: loss=0.1575, w=1.97, b=0.40
[Epoch  70]: loss=0.1573, w=1.99, b=0.40
[Epoch  80]: loss=0.1572, w=2.00, b=0.40
[Epoch  90]: loss=0.1572, w=2.01, b=0.40
[Epoch 100]: loss=0.1571, w=2.02, b=0.40
[Epoch 110]: loss=0.1571, w=2.03, b=0.40
[Epoch 120]: loss=0.1571, w=2.03, b=0.40
[Epoch 130]: loss=0.1571, w=2.04, b=0.40
[Epoch 140]: loss=0.1571, w=2.04, b=0.40
[Epoch 149]: loss=0.1571, w=2.04, b=0.40
Training loss history¶
In [13]:
fig, axes = plt.subplots(1, 2, figsize=(mpl.rcParams['figure.figsize'][0]*2, mpl.rcParams['figure.figsize'][1]))
axes[0].plot(history['loss'])
axes[0].set(xlabel='epoch', ylabel='$L^1$ loss',
            title='Training loss history');
axes[1].plot(history['grads'])
axes[1].set(xlabel='epoch', ylabel='gradients',
            title='Training gradient history');
axes[1].legend([r'$\nabla w \equiv \nabla \theta_1$',
                r'$\nabla b \equiv \nabla \theta_0$'])
Out[13]:
<matplotlib.legend.Legend at 0x7f89b09c0ac0>
Optimization over the loss landscape¶
In [14]:
# L^1 norm: numpy version
l1_norm = lambda y_hat, y: np.abs(y_hat - y).mean(axis=0)
# Loss landscape
theta_1_h, theta_0_h = np.meshgrid(np.linspace(0, 4),
                                   np.linspace(-0.2, 0.8))
loss_surface = l1_norm(x_train[...,None]*theta_1_h[None] + theta_0_h[None],
                       y_train[...,None])
In [15]:
fig, ax = plt.subplots()
im = ax.contourf(theta_1_h, theta_0_h, loss_surface)
ax.plot(*history['parameters'][0], 'c.', ms=15, label='start', zorder=10)
ax.plot(theta_1, theta_0, 'r.', ms=20, label='optima', zorder=10)
ax.plot(np.array(history['parameters'])[:,0],
        np.array(history['parameters'])[:,1],
        '.-', c='k', label='path')
ax.set(xlabel=r'$\theta_1$ ($w$, weight)',
       ylabel=r'$\theta_0$ ($b$, bias)',
       title='$L^1$ loss landscape')
fig.colorbar(im, ax=ax)
fig.legend(loc='right', bbox_to_anchor=(1.15, 0.7));
In [16]:
# Predict
predict = model(torch.from_numpy(x_train.astype('float32'))).detach().numpy()
# Plot the graph
fig, ax = plt.subplots()
ax.plot(x_train, y_train, 'k.', label='observations')
ax.plot(x_train, x_train*theta_1 + theta_0, 'tab:gray', lw=5,
        label=r'analytical: ($\theta_1={:.2f}$, $\theta_0={:.2f}$)'.format(theta_1, theta_0))
ax.plot(x_train, x_train*history['parameters'][0][0] + history['parameters'][0][1], 'tab:blue',
        label=r'prediction [epoch 0]: ($\theta_1={:.2f}$, $\theta_0={:.2f}$)'.format(*history['parameters'][0]))
ax.plot(x_train, predict, 'tab:red',
        label=r'prediction [trained]: ($\theta_1={:.2f}$, $\theta_0={:.2f}$)'.format(*history['parameters'][-1]))
ax.set(xlabel='$x$', ylabel='$y$',
       title='Training accuracy',
       xlim=(-0.52,0.52), ylim=(-1.0, 1.5))
fig.legend(loc='right', bbox_to_anchor=(1.6, 0.6));
5. Save the trained model¶
In [17]:
# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')
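For longer trainings it is common to also checkpoint the optimizer state and some bookkeeping alongside the model weights; a minimal sketch under that assumption (the file name `checkpoint.pt` and the dictionary keys are just illustrative):
# Hypothetical richer checkpoint: model weights + optimizer state + bookkeeping
full_checkpoint = {
    'epoch': num_epochs,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': history['loss'][-1],
}
torch.save(full_checkpoint, 'checkpoint.pt')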
6. Load pretrained model¶
In [18]:
checkpoint = torch.load('model.ckpt')
checkpoint
Out[18]:
OrderedDict([('weight', tensor([[2.0430]])), ('bias', tensor([0.4025]))])
In [19]:
model.load_state_dict(checkpoint)
Out[19]:
<All keys matched successfully>
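To reuse the restored parameters for inference in a fresh session, one would typically rebuild the model, load the state dict, switch to eval mode, and disable gradient tracking; a minimal sketch under those assumptions (the input value is hypothetical):
# Hypothetical inference-only reload into a fresh model instance
restored = nn.Linear(in_features=1, out_features=1)
restored.load_state_dict(torch.load('model.ckpt'))
restored.eval()                      # a no-op for nn.Linear, but good practice
with torch.no_grad():                # gradients are not needed for prediction
    x_new = torch.tensor([[0.25]])   # hypothetical new input
    print(restored(x_new))           # roughly tensor([[0.9133]])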