1
00:00:01,080 --> 00:00:08,280
And welcome back to 7.5, which is about pooling. This is the next sequence of layers in our CNN. So far

2
00:00:08,490 --> 00:00:15,240
we've dealt with the convolutional layer, the convolution part and ReLU. Now let's look at pooling,

3
00:00:15,420 --> 00:00:22,660
also known as subsampling. Pooling, as I just said, also called subsampling or downsampling, is a

4
00:00:22,660 --> 00:00:27,170
simple process where we reduce the size or dimensionality of the feature map.

5
00:00:27,280 --> 00:00:31,690
The purpose of this reduction is to reduce the number of parameters that we need to train whilst retaining

6
00:00:31,690 --> 00:00:36,670
most of the important features and information in the image.

7
00:00:36,870 --> 00:00:39,100
There are basically three types of pooling we can apply.

8
00:00:39,100 --> 00:00:43,800
There are actually some more, but let's take a look at these three main types that are used.

9
00:00:43,870 --> 00:00:46,250
So here's an example of max pooling.

10
00:00:46,300 --> 00:00:52,900
Imagine this is the ReLU output. This output here was produced from the ReLU

11
00:00:52,940 --> 00:00:53,510
layer.

12
00:00:53,800 --> 00:00:57,430
So you can imagine the values that are zeros here were actually negative values.

13
00:00:57,820 --> 00:01:03,790
So max pooling basically uses a two by two kernel here. We can define the size as anything we want, just

14
00:01:03,790 --> 00:01:09,520
like we did with the stride and the kernels we used in the convolutional layer. And basically, using

15
00:01:09,520 --> 00:01:10,530
a two by two,

16
00:01:10,600 --> 00:01:15,250
it splits the input up into two by two, two by two, two by two, two by two grids.

17
00:01:15,580 --> 00:01:24,190
So what max pooling does is take the maximum value out of each two by two area, for example 167, 241 and 235, and put them

18
00:01:24,190 --> 00:01:25,380
into this block here.
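As a rough sketch of the max pooling step just described, the snippet below applies a two by two window with a stride of two to a small grid. The function name `max_pool_2d` and the grid values are invented for illustration (only 167, 241 and 235 echo numbers mentioned in the lecture); the slide's actual matrix is not available here.

```python
def max_pool_2d(feature_map, size=2, stride=2):
    """Max pooling with no padding over a 2D list of numbers."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect every value inside the current size x size window.
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 ReLU output; zeros stand in for clipped negative values.
relu_output = [
    [12, 0, 167, 30],
    [0, 45, 0, 8],
    [241, 0, 0, 235],
    [17, 90, 64, 0],
]

pooled = max_pool_2d(relu_output)
print(pooled)  # [[45, 167], [241, 235]]
```

Each two by two area contributes exactly one number, so the four by four grid becomes a two by two grid.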

19
00:01:25,750 --> 00:01:29,270
So this is what we call downsampling or subsampling.

20
00:01:29,320 --> 00:01:35,440
Basically we have sort of compressed the image here and retained the most important features.

21
00:01:36,680 --> 00:01:37,470
Actually,

22
00:01:37,470 --> 00:01:40,160
let's go back to the previous slide. Previously

23
00:01:40,210 --> 00:01:42,810
we mentioned average and sum pooling.

24
00:01:42,850 --> 00:01:48,850
Now, as you can imagine, average pooling would just simply be the average of these values here, here,

25
00:01:49,120 --> 00:01:53,130
here, here, and sum pooling would just be the sum of these values.
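To make the comparison concrete, here is a minimal sketch in which the only thing that changes between the three pooling types is the function applied to each window. The helper names `pool_2d` and `mean` and the grid values are assumptions made for this example, not taken from the lecture.

```python
def pool_2d(feature_map, reducer, size=2, stride=2):
    """Apply `reducer` to each size x size window (no padding)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            reducer([
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ])
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map, values invented for illustration.
grid = [
    [4, 0, 8, 2],
    [0, 4, 2, 0],
    [6, 2, 0, 0],
    [0, 0, 10, 2],
]

max_pooled = pool_2d(grid, max)       # [[4, 8], [6, 10]]
average_pooled = pool_2d(grid, mean)  # [[2.0, 3.0], [2.0, 3.0]]
sum_pooled = pool_2d(grid, sum)       # [[8, 12], [8, 12]]
```

Note how the average and sum outputs look identical across the two rows here, while the max output still distinguishes them; that is one informal way to see why max pooling keeps the strongest activations.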

26
00:01:53,460 --> 00:01:55,090
So those are also ways we can do pooling.

27
00:01:55,090 --> 00:02:01,900
However, in the majority of convolutional neural nets we use max pooling.

28
00:02:01,940 --> 00:02:04,740
So this is our CNN so far, just to do a recap.

29
00:02:04,880 --> 00:02:10,370
We have an input image with our kernel being slid across this image, producing multiple

30
00:02:10,370 --> 00:02:11,380
different filters here.

31
00:02:11,450 --> 00:02:15,920
All of them are the same size as the input image, and that's because of zero padding.

32
00:02:16,250 --> 00:02:22,430
Then we have the ReLU output, which is basically the same size of matrix as this, except all the negative

33
00:02:22,430 --> 00:02:23,850
values are turned into zeros.

34
00:02:24,230 --> 00:02:30,470
And then we have the subsampling, or pooling layer, also called downsampling, which basically reduces this image,

35
00:02:30,530 --> 00:02:37,220
sorry, this matrix, by half, to 14 by 14, because as you can see, using a two by two, we have a four by four

36
00:02:37,360 --> 00:02:41,570
and we get a two by two. And that's still 12 filters.

37
00:02:41,750 --> 00:02:44,540
However, they have now been downsampled.

38
00:02:44,540 --> 00:02:45,880
So let's move on now.

39
00:02:46,310 --> 00:02:52,100
So let's talk a bit more about pooling. Typically pooling is done using two by two windows with a stride

40
00:02:52,100 --> 00:02:54,540
of two and no padding applied.

41
00:02:54,560 --> 00:02:58,280
That's how we actually get this four by four here.

42
00:02:58,280 --> 00:03:01,920
It takes a two by two, jumps two, takes a two by two, jumps two, and so on.

43
00:03:04,060 --> 00:03:08,170
So for smaller input images or larger images we can use larger pools

44
00:03:09,020 --> 00:03:14,530
or smaller pools, whichever you want to do. And using the above settings, pooling has the effect of reducing

45
00:03:14,530 --> 00:03:16,890
dimensionality, width and height.

46
00:03:16,930 --> 00:03:18,330
Those are the only two dimensions we have.

47
00:03:18,340 --> 00:03:22,150
We reduce the width and height of the previous layer by half,

48
00:03:22,330 --> 00:03:26,950
thus removing three quarters, or 75 percent, of the activations seen in the previous layer.
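The arithmetic behind the halving and the 75 percent figure can be checked with a short sketch. The helper name `pooled_size` is made up for this example, and the 28 by 28 input size is an assumption inferred from the 14 by 14 result mentioned earlier.

```python
def pooled_size(input_size, window=2, stride=2):
    """Output length along one dimension for pooling with no padding."""
    return (input_size - window) // stride + 1


# Assumed feature-map size before pooling (inferred, not stated on the slide).
height, width = 28, 28

out_h = pooled_size(height)
out_w = pooled_size(width)

# Fraction of activations that do not survive the pooling step.
removed = 1 - (out_h * out_w) / (height * width)

print(out_h, out_w, removed)  # 14 14 0.75
```

Halving both width and height leaves a quarter of the original activations, which is where the 75 percent reduction comes from. The number of filters is untouched, since pooling is applied to each feature map separately.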

49
00:03:31,290 --> 00:03:32,940
So, moving on.

50
00:03:32,940 --> 00:03:39,470
This makes our model more invariant to small or minor transformations or distortions in the input image,

51
00:03:39,570 --> 00:03:45,000
since we're now averaging or taking the max output from a small area of an image. What this actually means

52
00:03:45,000 --> 00:03:51,020
is that instead of looking at specific pixels here in an image, because we're actually

53
00:03:51,050 --> 00:03:57,480
downsampling and looking at a max in an area, we sort of add some sort of invariance, or spatial invariance,

54
00:03:57,480 --> 00:03:58,150
so to say.

55
00:03:58,170 --> 00:04:04,800
So our filters aren't super specific to certain areas, and remember they're being slid across the image.

56
00:04:04,800 --> 00:04:10,410
So imagine this filter being slid across the image looking for a specific edge or whatever; pooling can actually

57
00:04:11,310 --> 00:04:13,160
add some invariance now to it.

58
00:04:13,170 --> 00:04:20,490
So this actually increases, basically, the ability of our convolutional model to generalize to information

59
00:04:20,490 --> 00:04:21,790
it has never seen before.
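A small sketch of that invariance, under invented values: a single strong activation is moved by one pixel. As long as it stays inside the same two by two window, the max pooled output does not change; once it crosses into a neighbouring window, the output does change, so the invariance only covers small shifts. The function `max_pool_2d` and the three grids are hypothetical.

```python
def max_pool_2d(feature_map, size=2, stride=2):
    """Max pooling with no padding over a 2D list of numbers."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


# One strong activation (9) at row 1, column 1.
original = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Same activation shifted one pixel up and left; still in the top-left window.
shifted_within_window = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Same activation shifted one pixel right; now in the top-right window.
shifted_across_window = [
    [0, 0, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

same = max_pool_2d(original) == max_pool_2d(shifted_within_window)
different = max_pool_2d(original) != max_pool_2d(shifted_across_window)
print(same, different)  # True True
```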

60
00:04:23,540 --> 00:04:29,450
So now let's move on to what is kind of the final layer. There are some layers in between that we'll discuss

61
00:04:29,450 --> 00:04:30,250
later on.

62
00:04:30,410 --> 00:04:35,990
But for now, this is the core of the CNN, and this is the last layer, the fully connected

63
00:04:36,030 --> 00:04:36,730
FC layer.