# Gradient Descent Visualization with ggplot2

Gradient descent is a useful method for solving optimization problems, and in this post I would like to visualize how it works.
The problem I will focus on is linear regression, using the mtcars dataset.

## 1. Outline

The hypothesis is a linear model: $$h_\theta(x) = \theta_0+\theta_1 x$$
To fit the model, we need to define a cost function: $$J(\theta_0, \theta_1) = \frac{1}{2m}\sum\limits_{i=1}^m(h_\theta(x^i)-y^i)^2$$
The parameters that minimize $$J$$ give the best fit.
To find that minimum, we will use the gradient descent algorithm, which is defined as:
$$\theta_j := \theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$$
This is iterative: we start with some initial parameters and keep updating them until we reach a minimum.
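For this cost function, the partial derivatives work out to:

$$\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1) = \frac{1}{m}\sum\limits_{i=1}^m(h_\theta(x^i)-y^i)$$

$$\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1) = \frac{1}{m}\sum\limits_{i=1}^m(h_\theta(x^i)-y^i)\,x^i$$

These two expressions are exactly what appears inside the update loop further below.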

## 2. Data

In this example, `wt` (weight) will be the input and `mpg` (miles per gallon) the output.

```r
library(dplyr)
mtcars %>% select(wt, mpg) %>% tbl_df()
## # A tibble: 32 x 2
##       wt   mpg
##  * <dbl> <dbl>
##  1  2.62  21
##  2  2.88  21
##  3  2.32  22.8
##  4  3.22  21.4
##  5  3.44  18.7
##  6  3.46  18.1
##  7  3.57  14.3
##  8  3.19  24.4
##  9  3.15  22.8
## 10  3.44  19.2
## # ... with 22 more rows
```

A quick look:

```r
library(ggplot2)
mtcars %>% ggplot(aes(wt, mpg)) +
  geom_point()
```

## 3. Gradient Descent

In the code chunk below, I define some initial values, such as the learning rate, and then run gradient descent. At every iteration, I store the current values of theta0, theta1, and the cost in a data frame named `records`.

```r
m <- nrow(mtcars)

# variables
x1 <- rep(1, m)
x2 <- mtcars$wt
y <- mtcars$mpg

# learning rate
alpha <- 0.01

# a blank dataframe to record simultaneous updates
records <- data_frame(iter = as.integer(),
                      theta0 = as.numeric(),
                      theta1 = as.numeric(),
                      cost = as.numeric())

# initial theta
theta <- c(-5, 0)

for (i in 1:10000){
  # predictions
  yhat <- theta[1]*x1 + theta[2]*x2

  # cost
  cost <- sum((y - yhat)^2)/(2*m)

  # simultaneous update of theta
  theta[1] <- theta[1] - alpha*(1/m)*sum((yhat - y)*x1)
  theta[2] <- theta[2] - alpha*(1/m)*sum((yhat - y)*x2)

  # record this iteration
  records[i, ] <- c(i, theta[1], theta[2], cost)
}
```

### 3.1 Parameters

After 10,000 iterations, here are the parameters:

```r
theta
## [1] 37.264942 -5.338675
```
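As a sanity check, the same coefficients can be obtained analytically with R's built-in `lm()`; gradient descent should land very close to them:

```r
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)
## (Intercept)          wt 
##   37.285126   -5.344472
```

The small remaining gap to the gradient descent estimates is simply because 10,000 iterations at this learning rate have not fully converged yet.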

```r
# visualize regression line
mtcars %>% ggplot(aes(wt, mpg)) +
  geom_point() +
  geom_abline(intercept = theta[1],
              slope = theta[2])
```

Visualize cost:

```r
# visualize cost
records %>% ggplot(aes(iter, cost)) +
  geom_point(size = 0.4)
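The steady decrease in cost depends on the learning rate: if alpha is too large, each update overshoots the minimum and the cost grows instead of shrinking. A minimal sketch of this failure mode (the value `alpha <- 1` is my own choice for illustration):

```r
m <- nrow(mtcars)
x2 <- mtcars$wt
y <- mtcars$mpg

alpha <- 1          # deliberately too large
theta <- c(-5, 0)
costs <- numeric(20)

for (i in 1:20){
  yhat <- theta[1] + theta[2]*x2
  costs[i] <- sum((y - yhat)^2)/(2*m)
  theta[1] <- theta[1] - alpha*(1/m)*sum(yhat - y)
  theta[2] <- theta[2] - alpha*(1/m)*sum((yhat - y)*x2)
}

# the cost explodes instead of decreasing
costs[20] > costs[1]
```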

```r
# visualize updates in parameters
grid <- data_frame(theta0 = seq(-10, 50, length.out = 100),
                   theta1 = list(seq(-10, 10, length.out = 100)),
                   cost = NA) %>%
  tidyr::unnest()

for (i in 1:nrow(grid)){
  theta0 <- grid$theta0[i]
  theta1 <- grid$theta1[i]
  yhat <- theta0*x1 + theta1*x2
  grid$cost[i] <- sum((y - yhat)^2)/(2*m)
}
```

```r
grid %>% ggplot(aes(x = theta0, y = theta1, z = cost)) +
  geom_raster(aes(fill = cost)) +
  geom_point(data = records, aes(x = theta0, y = theta1),
             color = "white", alpha = 0.1) +
  scale_fill_gradient(low = "#56B1F7", high = "#132B43", guide = "colourbar")
```
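As an alternative to the raster, the same cost surface can be drawn with contour lines. The sketch below rebuilds the grid with `expand.grid()` and fills in the cost with `mapply()` instead of the row-by-row loop; the structure is my own, but the cost formula is the same as above:

```r
library(ggplot2)

m <- nrow(mtcars)
grid2 <- expand.grid(theta0 = seq(-10, 50, length.out = 100),
                     theta1 = seq(-10, 10, length.out = 100))

# cost at every (theta0, theta1) pair on the grid
grid2$cost <- mapply(function(t0, t1) {
  yhat <- t0 + t1*mtcars$wt
  sum((mtcars$mpg - yhat)^2)/(2*m)
}, grid2$theta0, grid2$theta1)

ggplot(grid2, aes(theta0, theta1, z = cost)) +
  geom_contour(bins = 30)
```

The bowl shape of the contours is what makes this problem easy for gradient descent: the cost function is convex, so there is a single minimum to descend into.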

Now let's start from some other parameters.
Set $$(\theta_0, \theta_1) = (20, 8)$$ and run gradient descent again:
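One convenient way to rerun from different starting points is to wrap the update loop in a small helper; the function name `run_gd` and its signature are my own, but the updates are the same as above:

```r
run_gd <- function(theta, alpha = 0.01, iters = 10000) {
  x2 <- mtcars$wt
  y  <- mtcars$mpg
  m  <- length(y)
  for (i in seq_len(iters)) {
    yhat <- theta[1] + theta[2]*x2
    theta[1] <- theta[1] - alpha*(1/m)*sum(yhat - y)
    theta[2] <- theta[2] - alpha*(1/m)*sum((yhat - y)*x2)
  }
  theta
}

# should approach the least-squares solution from any reasonable start
run_gd(c(20, 8))
```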