์ž๋ฐ”์Šคํฌ๋ฆฝํŠธ๋ฅผ ํ™œ์„ฑํ™” ํ•ด์ฃผ์„ธ์š”

Multi-Layer Perceptron

[TIL] 영카 X 멋쟁이사자처럼 (AI Engineer Training Bootcamp, Cohort 2) Week 4


๋“ค์–ด๊ฐ€๋ฉฐ


ย ย ย 4์ฃผ์ฐจ ๊ฐ•์˜์— ์ ‘์–ด๋“ค์—ˆ๋‹ค. ์ด์ œ๊นŒ์ง€๋Š” ์ฃผ๋กœ Machine Learning ์˜ ๋Œ€ํ•œ ๋‚ด์šฉ๋“ค์— ๋Œ€ํ•ด ๋‹ค๋ฃจ์—ˆ๋‹ค. ๊ฐ•์˜์˜ ์ง„๋„๊ฐ€ ๋งค์šฐ ๋น ๋ฅด๋‹ค. TIL๊ณผ ์‹ค์Šต๊ณผ์ œ๋“ค์„ ํ•ด๋ณด๋ฉด์„œ ๊ฐ์„ ์žก๊ณ ์žˆ๊ธดํ•œ๋ฐ ๋ฒ…์ฐฌ ๊ฒƒ ๊ฐ™๋‹ค. ํ”ผ์–ด๊ทธ๋ฃน์„ ํ•˜๋ฉด์„œ ์„œ๋กœ ์ง„๋„์ฒดํฌ ํ•˜๋Š”๊ฒƒ์ด ๋„์›€์ด ๋˜๋Š” ๊ฒƒ ๊ฐ™๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋ถ€ํŠธ ์บ ํ”„ ์ „ ๊ธฐ์ˆ˜์ค‘์— AI ์—”์ง€๋‹ˆ์–ด๋กœ ์ปค๋ฆฌ์–ด ์ „ํ™˜ํ•˜์‹  ๋ถ„์˜ 1์‹œ๊ฐ„ ์ •๋„ ๊ฐ•์—ฐ์„ ํ•ด์ฃผ์…จ๋Š”๋ฐ, ๊ทธ๋ถ„๋„ ์ฒ˜์Œ์—๋Š” ๋ฉ˜๋ถ•์ƒํƒœ์˜€๋‹ค๊ณ  ๊ทธ๋žฌ๋‹ค. ํ•˜์ง€๋งŒ ์ดํ•ด ์•ˆ๋˜๋Š” ๊ฒƒ์— ๋„ˆ๋ฌด ์ง‘์ฐฉํ•˜์ง€ ๋ง๊ณ  ์ตœ๋Œ€ํ•œ ๊ฐ•์˜์™€ ๊ณผ์ œ๋ฅผ ๋”ฐ๋ผ๊ฐ€ ๊ฒฐ๊ตญ์—” ์ตœ์šฐ์ˆ˜ ์ˆ˜๊ฐ•์ƒ์ด ๋˜์…จ๋‹ค๊ณ  ํ•˜์˜€๋‹ค. ๊ทธ๋ ‡๊ฒŒ ์œ„๋กœ์•„๋‹Œ ์œ„๋กœ๋ฅผ ๋ฐ›๊ณ  ์ด์ œ๋ถ€ํ„ฐ๋Š” Deep Learning ์— ๋Œ€ํ•œ ๊ฐ•์˜๊ฐ€ ์‹œ์ž‘๋˜๋Š”๋ฐ ๋‹ค์‹œ ํ•œ๋ฒˆ ํž˜์„ ๋‚ด์•ผ๊ฒ ๋‹ค.

4์ฃผ์ฐจ


  1. Multi-Layer Perceptron
  2. Deep Learning
  3. Forward Pass
  4. Activation Function
  5. Loss Function
  6. (Practice) PyTorch Tutorial
  7. (Practice) MLP MNIST Classification

Multi-Layer Perceptron


  • Perceptron?

    • An algorithm that multiplies a multi-dimensional input vector by weights (w) to produce an output value: a dot product between vectors, plus a bias
  • An extension of the linear model, designed to solve problems that existing linear models struggle with (the XOR gate problem)

  • MLP structure

    • Parameters: Weight, Bias
    • Activation Function: introduces non-linearity into the input-output relationship -> lets the model solve more complex problems
    • Loss Function
  • How an MLP operates: Forward Pass (using the parameters and the activation function) -> compute the loss -> Backward Pass

  • MLP ์˜ layer๊ฐ€ ๋Š˜์–ด๋‚˜๋ฉด Parameter ์ˆ˜๊ฐ€ ์—„์ฒญ๋‚˜๊ฒŒ ๋Š˜์–ด๋‚˜๊ฒŒ ๋จ

Forward Pass


  • ์ž…๋ ฅ์ด ์ฃผ์–ด์กŒ์„ ๋•Œ parameter์™€ activation function์„ ํ†ตํ•ด ์ถœ๋ ฅ์€ ์ถ”๋ก ํ•˜๋Š” ๊ณผ์ •

  • Batch Training: during training or inference, process a group of data points together rather than one at a time

    • A group of data points is called a batch.
    • Training on several data points at once lets us raise performance more efficiently.
    • The computation extends to a matrix-matrix product.
  • Matrix Multiplication? Stacking the input vectors as rows of a matrix turns the per-sample dot products into a single matrix product (see the sketch after this list)

  • Mini-Batch Training:

    • Training on all of the data at once runs into memory and overfitting problems.
    • For efficient training, the data is randomly sampled into mini-batches.
    • 1 epoch => one pass of training over several mini-batches
    • Repeating epochs raises the model's performance.
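The snippet below is a minimal sketch of a batched forward pass (the batch size of 32 and the layer sizes are my own assumptions): stacking a mini-batch into a matrix turns the whole pass into one matrix multiplication.

```python
import torch
import torch.nn as nn

# A minimal sketch: a forward pass over a mini-batch is a single matrix
# multiplication, (batch, in_features) @ (in_features, out_features).
batch = torch.randn(32, 784)    # mini-batch of 32 samples, 784 features each
layer = nn.Linear(784, 256)     # computes x @ W^T + b for the whole batch

out = torch.relu(layer(batch))  # linear layer, then the activation
print(out.shape)                # torch.Size([32, 256])
```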

Activation Function


  • Why? Each layer's computation is linear; applying an activation function with a non-linear property gives the model a richer expressive power
  • Types of activation functions (compared in the sketch after this list)
    • Sigmoid: its derivative is simple to compute, but the region where the derivative is (near) zero is very wide -> Vanishing Gradient Problem
    • tanh: also suffers from the Vanishing Gradient Problem
    • ReLU: a max operation on the input value (max(0, x)); very fast to compute and the most widely used
    • Leaky ReLU: fixes the problem that ReLU's gradient dies for negative inputs
  • Softmax Function
    • Used at the model's output
    • Helps interpret which class the model predicted

Loss Function


  • ๋ชจ๋ธ์˜ output์ด ์–ผ๋งˆ๋‚˜ ํ‹€๋ ธ๋Š”์ง€๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ์ฒ™๋„, ๋‚˜์ค‘์— backward pass์—์„œ ๋ชจ๋ธ์˜ parameter๋ฅผ ์ˆ˜์ •ํ•˜๋Š”๋ฐ ์‚ฌ์šฉ
  • ์ฃผ๋กœ regression task๋Š” MSE loss function์„ ์‚ฌ์šฉ
  • ์ฃผ๋กœ classification task๋Š” cross-entropy loss function์„ ์‚ฌ์šฉ
