Visualizing Deep Learning

by Zach Geis
I personally find Deep Learning and Neural Networks fascinating. The goal of this visualization is to demonstrate the actual operations behind a Neural Network: as you adjust the iteration slider, the page displays every operation that goes into training and executing the network.

This network takes an input of 3 numbers, each either 1 or 0. The goal of the network is to learn the exclusive OR (XOR) of the first two numbers: the output should be 1 when exactly one of them is 1, and 0 when they are both 0 or both 1. The third number has no effect on the result.
For example, 0-0-0, 1-1-0, and 0-0-1 should return 0, while 1-0-1 and 0-1-1 should return 1.
The Final Error field and the Error graph display how far the network's outputs are from these targets; the goal is an error close to 0.
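For concreteness, here is that truth table written out as data, matching the Inputs and Target Output matrices shown in the interactive section below (the constant names are illustrative, not taken from the project):

```typescript
// XOR of the first two inputs; the third input is carried along but ignored.
const inputs: number[][] = [
  [0, 0, 1],
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 1],
];

const targets: number[][] = [[0], [1], [1], [0]];
```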

The default iteration count is 10 to prevent the page from executing too many calculations on first load. You should be able to get accurate results by setting the iteration count to 10000.

I've always found that interactive models like this help me learn best. I hope you find this helpful as well.

Code for the network can be found here: MultiLayer.ts
For a simpler network see here: SingleLayer.ts
All of the network math is contained within the project as well and can be found here: MatrixUtil.ts

The network structure was inspired by the following blog post: Basic Python Network

Brief Function Guide:

Sigmoid:

s(x) = \frac{1}{1 + e^{-x}}

Sigmoid's First Derivative:

s'(x) = \frac{ds(x)}{dx} = s(x)(1 - s(x))
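In code, both functions are one-liners. A minimal TypeScript sketch (the project's real versions live in MatrixUtil.ts; these names are illustrative):

```typescript
// Sigmoid: s(x) = 1 / (1 + e^(-x)), squashes any input into (0, 1).
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

// First derivative: s'(x) = s(x) * (1 - s(x)).
// The visualization applies this to the raw weighted sums (S' in the steps below).
function sigmoidDerivative(x: number): number {
  const s = sigmoid(x);
  return s * (1 - s);
}
```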

Matrix Element-Wise Operations

This applies to all basic arithmetic operations:

\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} + \begin{pmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{pmatrix} = \begin{pmatrix} x_{11} + y_{11} & x_{12} + y_{12} \\ x_{21} + y_{21} & x_{22} + y_{22} \end{pmatrix}
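A sketch of how such an element-wise operation can be implemented in TypeScript (`Matrix` and `elementWise` are illustrative names; the project's version lives in MatrixUtil.ts):

```typescript
type Matrix = number[][];

// Combines matching elements of two same-shaped matrices with a binary operation.
function elementWise(a: Matrix, b: Matrix, op: (x: number, y: number) => number): Matrix {
  return a.map((row, i) => row.map((x, j) => op(x, b[i][j])));
}

// Element-wise addition, as in the example above:
elementWise([[1, 2], [3, 4]], [[5, 6], [7, 8]], (x, y) => x + y);
// -> [[6, 8], [10, 12]]
```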

Matrix Multiplication

\begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{pmatrix} \cdot \begin{pmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \\ y_{31} & y_{32} \end{pmatrix} = \begin{pmatrix} x_{11} y_{11} + x_{12} y_{21} + x_{13} y_{31} & x_{11} y_{12} + x_{12} y_{22} + x_{13} y_{32} \\ x_{21} y_{11} + x_{22} y_{21} + x_{23} y_{31} & x_{21} y_{12} + x_{22} y_{22} + x_{23} y_{32} \end{pmatrix}
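A corresponding sketch, using the same illustrative `Matrix` alias: each result cell is the dot product of a row of the left matrix with a column of the right one.

```typescript
type Matrix = number[][];

// Matrix product: requires a's column count to equal b's row count.
function multiply(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((acc, x, k) => acc + x * b[k][j], 0))
  );
}
```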

Matrix Scalar

a \cdot \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} = \begin{pmatrix} a x_{11} & a x_{12} \\ a x_{21} & a x_{22} \end{pmatrix}
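Scalar multiplication is the simplest of the bunch; a sketch:

```typescript
type Matrix = number[][];

// Multiplies every element of the matrix by a single number.
function scalar(a: number, m: Matrix): Matrix {
  return m.map(row => row.map(x => a * x));
}
```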

Matrix Transpose

T\left( \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{pmatrix} \right) = \begin{pmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \end{pmatrix}
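A sketch of the transpose, which flips rows and columns:

```typescript
type Matrix = number[][];

// result[j][i] = m[i][j]; a rows-by-cols matrix becomes cols-by-rows.
function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map(row => row[j]));
}
```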

Matrix Element-Wise Apply

f\left( \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} \right) = \begin{pmatrix} f(x_{11}) & f(x_{12}) \\ f(x_{21}) & f(x_{22}) \end{pmatrix}
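And a sketch of element-wise apply, which is how S and S' below are applied to whole matrices:

```typescript
type Matrix = number[][];

// Applies a unary function to every element of the matrix.
function apply(m: Matrix, f: (x: number) => number): Matrix {
  return m.map(row => row.map(f));
}
```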

Interactive Network:

Hidden Neuron Count: 4
Learning Rate: 0.1
Iteration Count: 10
Final Error: 0.5067021977843195
Iteration (of 10): 10
Forward Pass:
L1 Weighted Sum
-0.08756   0.819148  -0.91416   0.092082
 0.486604  0.266100  -1.72065   0.046920
-0.34682   0.766748  -0.45994   0.965229
 0.227347  0.213700  -1.26644   0.920067
=
Inputs
0  0  1
0  1  1
1  0  1
1  1  1
·
L1 Weights
-0.25925  -0.05240   0.454217   0.873146
 0.574170 -0.55304  -0.80649   -0.04516
-0.08756   0.819148  -0.91416   0.092082
L1 Output
0.478122  0.694055  0.286148  0.523004
0.619306  0.566135  0.151786  0.511727
0.414153  0.682817  0.386998  0.724167
0.556593  0.553222  0.219866  0.715055
= S(
L1 Weighted Sum
-0.08756   0.819148  -0.91416   0.092082
 0.486604  0.266100  -1.72065   0.046920
-0.34682   0.766748  -0.45994   0.965229
 0.227347  0.213700  -1.26644   0.920067
)
L2 Weighted Sum
 0.065925
 0.196956
-0.25857
-0.10310
=
Inputs (L1 Output)
0.478122  0.694055  0.286148  0.523004
0.619306  0.566135  0.151786  0.511727
0.414153  0.682817  0.386998  0.724167
0.556593  0.553222  0.219866  0.715055
·
L2 Weights
 0.685648
 0.673746
-0.81662
-0.94806
L2 Output (Final Result)
0.516475
0.549080
0.435713
0.474246
= S(
L2 Weighted Sum
 0.065925
 0.196956
-0.25857
-0.10310
)
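In code, the forward pass above is just two matrix multiplications, each followed by an element-wise sigmoid. A sketch reusing the helpers sketched in the Brief Function Guide (MultiLayer.ts is the authoritative version; `l1Weights` and `l2Weights` are assumed to be randomly initialized 3×4 and 4×1 matrices):

```typescript
// Forward pass: each layer's output = S(inputs · weights).
const l1WeightedSum = multiply(inputs, l1Weights);   // 4x3 · 3x4 -> 4x4
const l1Output = apply(l1WeightedSum, sigmoid);

const l2WeightedSum = multiply(l1Output, l2Weights); // 4x4 · 4x1 -> 4x1
const l2Output = apply(l2WeightedSum, sigmoid);      // final result
```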
Back Propagation Pass:
L2 Error
-0.51647
 0.450919
 0.564286
-0.47424
=
Target Output
0
1
1
0
-
L2 Output
0.516475
0.549080
0.435713
0.474246
L2 Gradients
0.249728
0.247591
0.245867
0.249336
= S'(
L2 Weighted Sum
 0.065925
 0.196956
-0.25857
-0.10310
)
L2 Delta
-0.12897
 0.111643
 0.138739
-0.11824
=
L2 Gradients
0.249728
0.247591
0.245867
0.249336
x
L2 Error
-0.51647
 0.450919
 0.564286
-0.47424
L1 Error
-0.08843  -0.08689   0.105327   0.122279
 0.076548  0.075219  -0.09117  -0.10584
 0.095126  0.093475  -0.11329  -0.13153
-0.08107  -0.07966   0.096563   0.112105
=
L2 Delta
-0.12897
 0.111643
 0.138739
-0.11824
· T(
L2 Weights
 0.685648
 0.673746
-0.81662
-0.94806
)
L1 Gradients
0.249521  0.212342  0.204267  0.249470
0.235766  0.245626  0.128747  0.249862
0.242630  0.216577  0.237230  0.199748
0.246797  0.247167  0.171525  0.203750
= S'(
L1 Weighted Sum
-0.08756   0.819148  -0.91416   0.092082
 0.486604  0.266100  -1.72065   0.046920
-0.34682   0.766748  -0.45994   0.965229
 0.227347  0.213700  -1.26644   0.920067
)
L1 Delta
-0.02206  -0.01845   0.021514   0.030505
 0.018047  0.018475  -0.01173  -0.02644
 0.023080  0.020244  -0.02687  -0.02627
-0.02000  -0.01969   0.016563   0.022841
=
L1 Gradients
0.249521  0.212342  0.204267  0.249470
0.235766  0.245626  0.128747  0.249862
0.242630  0.216577  0.237230  0.199748
0.246797  0.247167  0.171525  0.203750
x
L1 Error
-0.08843  -0.08689   0.105327   0.122279
 0.076548  0.075219  -0.09117  -0.10584
 0.095126  0.093475  -0.11329  -0.13153
-0.08107  -0.07966   0.096563   0.112105
L1 Weight Updates
 0.000307  0.000055  -0.00103  -0.00034
-0.00019  -0.00012   0.000482  -0.00036
-0.00009   0.000057  -0.00005   0.000062
= 0.1 · (T(
Inputs
0  0  1
0  1  1
1  0  1
1  1  1
) ·
L1 Delta
-0.02206  -0.01845   0.021514   0.030505
 0.018047  0.018475  -0.01173  -0.02644
 0.023080  0.020244  -0.02687  -0.02627
-0.02000  -0.01969   0.016563   0.022841
)
L2 Weight Updates
-0.00008
 0.000300
 0.000773
 0.000559
= 0.1 · (T(
L1 Output
0.478122  0.694055  0.286148  0.523004
0.619306  0.566135  0.151786  0.511727
0.414153  0.682817  0.386998  0.724167
0.556593  0.553222  0.219866  0.715055
) ·
L2 Delta
-0.12897
 0.111643
 0.138739
-0.11824
)
New L1 Weights
-0.25894  -0.05234   0.453185   0.872803
 0.573974 -0.55316  -0.80601   -0.04552
-0.08766   0.819205  -0.91421   0.092145
=
L1 Weights
-0.25925  -0.05240   0.454217   0.873146
 0.574170 -0.55304  -0.80649   -0.04516
-0.08756   0.819148  -0.91416   0.092082
+
L1 Weight Updates
 0.000307  0.000055  -0.00103  -0.00034
-0.00019  -0.00012   0.000482  -0.00036
-0.00009   0.000057  -0.00005   0.000062
New L2 Weights
 0.685560
 0.674046
-0.81585
-0.94750
=
L2 Weights
 0.685648
 0.673746
-0.81662
-0.94806
+
L2 Weight Updates
-0.00008
 0.000300
 0.000773
 0.000559
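Expressed with the same sketched helpers, the whole backpropagation pass above comes out to a few lines: compute the output error, scale it by the sigmoid gradients, push it back through the transposed weights, and nudge each weight matrix by the learning rate times the resulting updates. Again a sketch under the same naming assumptions, not the literal MultiLayer.ts code:

```typescript
const learningRate = 0.1;

// Output layer: error, gradients, and delta (element-wise product).
const l2Error = elementWise(targets, l2Output, (t, o) => t - o);
const l2Gradients = apply(l2WeightedSum, sigmoidDerivative);
const l2Delta = elementWise(l2Gradients, l2Error, (g, e) => g * e);

// Hidden layer: the error flows backwards through T(L2 Weights).
const l1Error = multiply(l2Delta, transpose(l2Weights));
const l1Gradients = apply(l1WeightedSum, sigmoidDerivative);
const l1Delta = elementWise(l1Gradients, l1Error, (g, e) => g * e);

// Weight updates: learning rate · (T(layer input) · layer delta).
const l1WeightUpdates = scalar(learningRate, multiply(transpose(inputs), l1Delta));
const l2WeightUpdates = scalar(learningRate, multiply(transpose(l1Output), l2Delta));

// Apply the updates; one full iteration is a forward pass plus this pass.
const newL1Weights = elementWise(l1Weights, l1WeightUpdates, (w, u) => w + u);
const newL2Weights = elementWise(l2Weights, l2WeightUpdates, (w, u) => w + u);
```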