Numpy vs. Python Lists

<2018-02-05 Mon>

While I was learning Machine Learning, I made up my mind not to use any third-party libraries (including numpy and Pandas) until I understood what was happening in the background and felt comfortable using them.

Writing everything from scratch was a bit harder. Even the dot product had to be written by hand, since Python doesn't have it built in.

Finally, I felt comfortable and could clearly understand what happens in the background (because Machine Learning is all about math).

Implementing Stochastic Gradient Descent

Stochastic gradient descent scans through the training examples one at a time and takes a small gradient descent step with respect to the cost of just that single training example.

The cost function measures how well the hypothesis is doing on a single example.

\[{cost( \Theta, (x^{(i)}, y^{(i)})) = \dfrac{1}{2}(h_\Theta(x^{(i)}) - y^{(i)})^2}\]

\[{J_{train}(\Theta) = \dfrac{1}{m} \sum_{i=1}^{m} cost( \Theta, (x^{(i)}, y^{(i)}))}\]
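
For the linear hypothesis \(h_\Theta(x) = \Theta^T x\) used below, differentiating the per-example cost gives the gradient, and with it the update rule that both implementations apply for every training example (\(\alpha\) is the learning rate):

\[{\dfrac{\partial}{\partial \Theta_j} cost(\Theta, (x^{(i)}, y^{(i)})) = (h_\Theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}}\]

\[{\Theta_j := \Theta_j - \alpha\, (h_\Theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}}\]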

A dumb Stochastic Gradient Descent Implementation

Using Lists:

def dot(v, w):
    # Dot product of two equal-length lists.
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def predict_multi(x_i, beta):
    # Linear hypothesis: h_beta(x_i) = beta . x_i
    return dot(x_i, beta)

def error_multi(beta, x, y):
    # Residual of a single example: y - h_beta(x)
    return y - predict_multi(x, beta)

def stochastic_gradient_descent(x, y, theta, learning_rate=0.001):
    for _ in range(1500):  # passes over the training data
        for x_i, y_i in zip(x, y):  # one example at a time
            # Step each parameter along the gradient of this example's cost.
            for i, (x_ii, theta_i) in enumerate(zip(x_i, theta)):
                theta[i] = theta_i - learning_rate * (-error_multi(theta, x_i, y_i)) * x_ii
    return theta
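
As a quick sanity check, the list version can be run on a tiny made-up dataset (the data and variable names below are purely illustrative; the benchmark that follows uses the actual dataset):

# Hypothetical toy data: y = 1 * x1 + 2 * x2, so theta should approach [1, 2].
toy_x = [[1, 2], [2, 1], [3, 4], [4, 3]]
toy_y = [5, 4, 11, 10]

theta = stochastic_gradient_descent(toy_x, toy_y, [0, 0])
print(theta)  # roughly [1, 2] after 1500 passes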

So, the benchmark results:

%timeit stochastic_gradient_descent(multi_data, daily_minutes_good, theta)
1.85 s ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Using Numpy:

import numpy as np

def stochastic_gradient_descent_np(x, y, theta, learning_rate=0.001):
    for _ in range(1500):  # passes over the training data
        for x_i, y_i in zip(x, y):  # one example at a time
            # np.sum(x_i * theta) is the prediction; the whole parameter
            # vector is updated in one vectorized expression.
            theta = theta - learning_rate * (np.sum(x_i * theta) - y_i) * x_i
    return theta

That's it, boom. The Numpy version of stochastic gradient descent updates the whole parameter vector in a single vectorized expression instead of looping over the parameters one at a time.
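
To make the vectorization concrete: the elementwise product followed by np.sum is just the dot product, so the entire inner parameter loop from the list version collapses into one array expression. A small sketch of the equivalence (the values are made up):

import numpy as np

x_i = np.array([1.0, 2.0, 3.0])
theta = np.array([0.5, -0.2, 0.1])

# All three compute the same prediction h_theta(x_i):
print(np.sum(x_i * theta))  # elementwise multiply, then sum
print(np.dot(x_i, theta))   # built-in dot product
print(x_i @ theta)          # matrix-multiply operator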

Benchmark using Numpy:

%timeit stochastic_gradient_descent_np(np_multi_data, np_daily, np_theta)
1.96 s ± 7.53 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Using Numpy makes the implementation much simpler, though it isn't faster here: the loop over training examples still runs in Python, and for arrays this small the per-call Numpy overhead roughly cancels out the gain from vectorizing the parameter update.
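
Where Numpy does pay off is when the Python-level loops disappear entirely. As a minimal sketch (not part of the benchmark above, and a batch-style step rather than the stochastic one), the gradient over all examples can be computed in one shot:

import numpy as np

def batch_gradient_step(X, y, theta, learning_rate=0.001):
    # X is an (m, n) array of examples, y an (m,) array of targets.
    errors = X @ theta - y              # all m prediction errors at once
    gradient = X.T @ errors / len(y)    # average gradient over the whole batch
    return theta - learning_rate * gradient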