CNN, Autoencoder, Upsampling, Semantic Segmentation

[TIL] SOCAR X LIKELION (AI Engineer Training Boot Camp, 2nd Cohort) Week 7


Introduction


The course has reached week 7. This week's lectures covered the representative CNN models and the building blocks they rely on (Residual Block, Bottleneck Block, Autoencoder…), followed by image object detection and semantic segmentation. Because so much material is packed into a single week, the explanations were not very detailed, so studying on my own seems essential. After next week's lectures (week 8), which are the last, hackathon team formation begins. Teams of 4 to 5 people carry out a project on an AI-related topic, and the boot camp ends with a final presentation. I need to finish the lectures well so that I can be useful to my hackathon team.

Week 7


  1. Representative CNN models - AlexNet, VGGNet, ResNet
  2. Object Detection - how can we use CNNs?
  3. Autoencoder and Upsampling
  4. How is Semantic Segmentation done?
  5. (Lab) Working with an Autoencoder
  6. (Lab) Using a pretrained CNN model

Representative CNN models - AlexNet, VGGNet, ResNet


AlexNet

  • The model that laid the groundwork for CNNs

  • Trained a CNN on GPUs, used ReLU as the activation function, applied data augmentation, and introduced dropout to CNNs

  • Data augmentation?

    • A way to enlarge the dataset by transforming the data while keeping its labels unchanged
    • Helps prevent overfitting and improves generalization
    • Examples: vertical and horizontal flips, noise injection, random crop, blur
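
The transforms listed above can be sketched in a few lines. This is only a minimal illustration that treats an image as a nested list of pixel values; the function names (`hflip`, `vflip`, `random_crop`, `add_noise`, `augment`) and the label `"cat"` are made up for this example and are not from the lecture.

```python
import random

def hflip(img):
    """Mirror left-right: reverse every row."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror top-bottom: reverse the order of the rows."""
    return [list(row) for row in img[::-1]]

def random_crop(img, size, rng):
    """Cut a size x size window at a random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def add_noise(img, sigma, rng):
    """Add Gaussian noise to every pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]

def augment(img, label, rng):
    """Each transform changes the image but keeps the label."""
    variants = [img, hflip(img), vflip(img), add_noise(img, 0.1, rng)]
    return [(v, label) for v in variants]

rng = random.Random(0)
image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]
samples = augment(image, "cat", rng)
crop = random_crop(image, 2, rng)
print(len(samples))     # 4 training samples from 1 image
print(hflip(image)[0])  # [2, 1, 0]
```

In practice a library would apply these transforms on the fly during training rather than storing every variant.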

VGGNet

  • Many layers -> 19 conv and fc layers in total: 144 million parameters

  • Conv filter size is fixed at 3x3

    • Advantage: a stack of several 3x3 conv layers covers the same receptive field as one larger filter with fewer parameters (two 3x3 layers see a 5x5 region, three see 7x7).
    • More ReLUs (non-linearities) fit in, which improves expressive power
  • Conv layers with 3x3 filters, stride=1, padding=1 are the basic building block
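
The parameter saving from stacking 3x3 filters can be checked with simple arithmetic. The sketch below assumes 64 input and output channels and ignores biases; both choices are only for illustration. It also confirms that a 3x3 conv with stride=1, padding=1 keeps the feature-map size unchanged.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in one conv layer (bias ignored)."""
    return kernel * kernel * c_in * c_out

def conv_out_size(size, kernel, stride, padding):
    """Output width/height of a conv layer."""
    return (size + 2 * padding - kernel) // stride + 1

def stacked_receptive_field(kernel, n_layers):
    """Receptive field of n stacked stride-1 conv layers."""
    return 1 + n_layers * (kernel - 1)

C = 64  # assumed channel count, for illustration only
three_3x3 = 3 * conv_params(3, C, C)  # three stacked 3x3 layers
one_7x7 = conv_params(7, C, C)        # one 7x7 layer, same receptive field

print(stacked_receptive_field(3, 3))  # 7
print(three_3x3, one_7x7)             # 110592 200704
print(conv_out_size(224, 3, 1, 1))    # 224
```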

ResNet

  • Adding more conv layers does not necessarily improve performance.

    • Gradient vanishing / exploding
    • A problem caused by the chain of connections between layers becoming too long
  • Residual blocks made it possible to train networks with many conv layers successfully

    • Uses skip connections

    Residual Block

    • Residual: the difference between the output and the input. If the block should produce H(x) for input x, its layers learn F(x) = H(x) - x.
    • With a skip connection, a layer can receive the features of an earlier layer directly.


      [Figure: Residual Block]
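
A residual block computes y = F(x) + x, so the layers inside only have to learn the residual F(x). The sketch below is a toy version that uses small fully connected layers in place of conv layers; that substitution, and the names `linear` and `residual_block`, are assumptions made to keep the example short.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(v, weights):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    """y = relu(F(x) + x) with F(x) = W2 * relu(W1 * x)."""
    fx = linear(relu(linear(x, w1)), w2)
    # Skip connection: add the block input to the output of F.
    return relu([f + a for f, a in zip(fx, x)])

x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(residual_block(x, zeros, zeros))        # [1.0, 2.0]
print(residual_block(x, identity, identity))  # [2.0, 4.0]
```

With all weights at zero the block simply passes its input through, which is one intuition for why stacking many such blocks does not make training harder.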

    Bottleneck Block

    • A structure that uses 1x1 convolutions to reduce the number of learnable parameters and the amount of computation; mainly used in ResNet-50 and deeper.

    • 1x1 convolution: cheap to compute, so it is used to increase or decrease the number of channels in a feature map.

    • Bottleneck block steps

      1. Reduce the feature-map channels with a 1x1 conv
      2. Apply a 3x3 conv to the reduced channels to extract spatial features
      3. Restore the original number of feature-map channels with a 1x1 conv


        [Figure: Bottleneck Block]
  • Lightweight models: MobileNet, SqueezeNet, ShuffleNet, …
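
To see why the three bottleneck steps (1x1 reduce, 3x3, 1x1 restore) save parameters, the weight counts can be compared directly. The channel sizes below (256 reduced to 64) are an assumed example and biases are ignored; the comparison is against a plain block of two 3x3 convs at full width.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in one conv layer (bias ignored)."""
    return kernel * kernel * c_in * c_out

def bottleneck_params(channels, reduced):
    """1x1 reduce -> 3x3 on reduced channels -> 1x1 restore."""
    return (conv_params(1, channels, reduced)
            + conv_params(3, reduced, reduced)
            + conv_params(1, reduced, channels))

def plain_params(channels):
    """Two 3x3 convs that keep the full channel count."""
    return 2 * conv_params(3, channels, channels)

bottleneck = bottleneck_params(256, 64)
plain = plain_params(256)
print(bottleneck)  # 69632
print(plain)       # 1179648
```

Under these assumed sizes the bottleneck needs roughly 17 times fewer weights than the plain block.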

Object Detection - how can we use CNNs?


  • The task of finding where objects are and what kind they are

    • Location: expressed as a bounding box (x, y, w, h)
    • Kind: classification

Measuring performance

  • IoU (Intersection over Union): the area of the intersection of the ground-truth region and the predicted region, divided by the area of their union

  • Precision & Recall

    • True Positive (TP): the object is there and is correctly predicted as present
    • False Positive (FP): the object is not there but is wrongly predicted as present
    • False Negative (FN): the object is there but is wrongly predicted as absent
    • True Negative (TN): the object is not there and is correctly predicted as absent
    • Precision: TP / (TP + FP) -> of the things found, how many are actually there?
    • Recall: TP / (TP + FN) -> of the things actually there, how many were found?
  • Precision-Recall curve (PR curve)

    • Precision and recall as a function of the confidence threshold
    • Confidence: how sure the algorithm is about a prediction -> predictions whose confidence is at or below the threshold are ignored
    • Lowering the confidence threshold gives more detections, so precision tends to drop and recall rises
    • Raising the confidence threshold gives fewer detections, so precision tends to rise and recall drops
  • Average Precision(AP)

    • A metric that takes precision and recall into account at the same time
    • The area under the PR curve
  • mean Average Precision(mAP)

    • The mean of the per-class APs
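
The metrics above can be sketched as follows. This is a simplified illustration, not a benchmark-exact implementation: boxes are assumed to be (x, y, w, h) with (x, y) the top-left corner, AP is computed as a plain rectangle sum under the PR curve with no interpolation, and each detection is assumed to be already marked as a true or false positive (commonly decided by an IoU threshold such as 0.5).

```python
def iou(box_a, box_b):
    """IoU of two (x, y, w, h) boxes; (x, y) is assumed to be the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(detections, n_ground_truth):
    """detections: list of (confidence, is_true_positive) pairs."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    # Walking down the ranking is the same as lowering the threshold.
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / n_ground_truth
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

overlap = iou((0, 0, 2, 2), (1, 1, 2, 2))
p, r = precision_recall(tp=8, fp=2, fn=4)
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], n_ground_truth=2)
print(round(overlap, 3))         # 0.143
print(round(p, 3), round(r, 3))  # 0.8 0.667
print(round(ap, 3))              # 0.833
```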

Pretrained CNN models

  • Training on images from scratch takes a very long time -> take a pretrained model and fine-tune it for the task (with a small learning rate).

    • R-CNN -> Fast R-CNN -> Faster R-CNN
    • SPPNet
    • YOLO
    • SSD

Autoencoder and Upsampling


  • Semantic Segmentation

    • Classification of every pixel (pixelwise classification)
    • Running a CNN once per pixel is inefficient
    • We want results for many pixels in a single pass.
  • Encoder-Decoder architecture

    • Encoder: a network that extracts meaningful latent features (a compressed representation of the input data) from the input
    • Decoder: a network that generates the desired output from those features
  • Autoencoder

    • A neural network whose output has the same structure as its input
    • Can extract features of the input data without labels
    • Feature dimensionality reduction -> keep only the important information
  • Upsampling in the decoder

    • Without learnable parameters: bilinear interpolation, nearest neighbor
    • With learnable parameters: transposed convolution (deconvolution)
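
The two kinds of upsampling can be contrasted in a small sketch. Nearest-neighbor upsampling has nothing to learn, while a transposed convolution spreads each input value through a kernel whose weights would be learned. The 1-D transposed convolution, the function names, and the fixed kernel values below are simplifications chosen for illustration.

```python
def upsample_nearest(img, scale):
    """Repeat every pixel `scale` times along both axes (no parameters)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out

def transposed_conv1d(x, kernel, stride):
    """Each input value adds a scaled copy of the kernel to the output.

    In a real network the kernel weights are learned; here they are fixed.
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        for k, w in enumerate(kernel):
            out[i * stride + k] += v * w
    return out

up = upsample_nearest([[1, 2], [3, 4]], 2)
print(up)
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]

tc = transposed_conv1d([1.0, 2.0, 3.0], [0.5, 1.0, 0.5], stride=2)
print(tc)
# [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 1.5]
```

With the kernel [0.5, 1.0, 0.5] the transposed convolution fills in the midpoints between neighboring inputs, so linear interpolation is one of the upsamplings it could learn.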

How is Semantic Segmentation done?


  • Performance metric: mean IoU. IoU is computed over pixels instead of bounding boxes -> the mean of the per-class IoUs

  • Commonly used datasets: PASCAL VOC 2012, MS COCO, Cityscapes

  • FCN(Fully Convolutional Network)

    • Built only from conv layers, with no fc layers -> not constrained by the input image size
    • Uses VGG16 as the backbone network, with VGG16's fc layers converted into conv layers
    • Upsampling from only the features of VGG16's fc layers yields just coarse information
    • Details are added using information from intermediate conv feature maps
  • DeconvNet

    • Idea: stack as many deconvolution layers as convolution layers, symmetrically
  • U-Net

    • skip connection
    • Passes the encoder's conv feature maps directly to the decoder layers of matching size
  • DeepLab V3

    • Atrous convolution(Dilated convolution)
    • A convolution that spreads out the conv filter by inserting gaps between its weights
    • Has a wide receptive field -> can reduce the loss of spatial information
    • Atrous Spatial Pyramid Pooling (ASPP)
      • Uses several filters with different atrous rates to capture multi-scale information
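
Atrous convolution can be illustrated in one dimension: the same three weights cover a wider span of the input as the rate grows. This is only a sketch under simplifying assumptions; the `aspp_like` helper is hypothetical, uses no padding, and reuses one kernel for every rate, whereas a real ASPP module has separate learned filters per branch.

```python
def dilated_conv1d(x, kernel, rate):
    """1-D convolution (no padding) with `rate - 1` gaps between kernel taps."""
    span = (len(kernel) - 1) * rate + 1  # input positions covered by one output
    return [
        sum(w * x[start + k * rate] for k, w in enumerate(kernel))
        for start in range(len(x) - span + 1)
    ]

def receptive_field(kernel_size, rate):
    return (kernel_size - 1) * rate + 1

def aspp_like(x, kernel, rates):
    """Hypothetical ASPP-style helper: one branch per atrous rate."""
    return {rate: dilated_conv1d(x, kernel, rate) for rate in rates}

signal = [0, 1, 2, 3, 4, 5, 6]
kernel = [1, 1, 1]

print(dilated_conv1d(signal, kernel, rate=1))  # [3, 6, 9, 12, 15]
print(dilated_conv1d(signal, kernel, rate=2))  # [6, 9, 12]
print(receptive_field(3, 1), receptive_field(3, 2), receptive_field(3, 3))  # 3 5 7
branches = aspp_like(signal, kernel, rates=[1, 2, 3])
```

The number of weights stays at three for every rate; only the spacing between them changes.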

Author
shin alli
Backend developer (Python, Django, AWS)