์ž๋ฐ”์Šคํฌ๋ฆฝํŠธ๋ฅผ ํ™œ์„ฑํ™” ํ•ด์ฃผ์„ธ์š”

Supervised Learning - Linear Model, Decision Tree, Ensemble

[TIL] ์˜์นด X ๋ฉ‹์Ÿ์ด์‚ฌ์ž์ฒ˜๋Ÿผ (AI Engineer Bootcamp, Cohort 2) Week 2


Introduction


ย ย ย 2์ฃผ์ฐจ ๊ฐ•์˜์— ์ ‘์–ด๋“ค์—ˆ๋‹ค. ์ด๋ฒˆ ๊ฐ•์˜๋Š” Machine Learning์ค‘ output์ด ์ฃผ์–ด์ง€๋Š” Supervised Learning์— ๋Œ€ํ•ด์„œ, ๊ทธ๋ฆฌ๊ณ  ๊ธฐ๋ณธ์ด ๋˜๋Š” Linear Model์— ๋Œ€ํ•œ ๋‚ด์šฉ์„ ๊ธฐ๋ฐ˜์œผ๋กœ Decision Tree, Ensemble ๊ธฐ๋ฒ•์— ๋Œ€ํ•œ ๋‚ด์šฉ ์ด์—ˆ๋‹ค. ์ด์ œ ๋ณธ๊ฒฉ์ ์œผ๋กœ ๊ฐ•์˜๊ฐ€ ์‹œ์ž‘๋œ ๊ฒƒ์ธ๋ฐ, ์—„์ฒญ ๊นŠ๊ฒŒ ๋“ค์–ด๊ฐ€์ง€๋Š” ์•Š์ง€๋งŒ ์•„์ง ์šฉ์–ด๋‚˜ ๊ฐœ๋…์— ์ต์ˆ™์น˜ ์•Š๋‹ค๋ณด๋‹ˆ ์‰ฝ์ง€ ์•Š์€ ๋‚ด์šฉ์ด์—ˆ๋‹ค. ์žก์„ค์€ ๊ทธ๋งŒํ•˜๊ณ  ๊ฐ•์˜ ๋‚ด์šฉ ์ •๋ฆฌํ•ด๋ณด์ž.

Week 2


  1. Supervised Learning
  2. Linear Model
  3. Decision Tree
  4. Ensemble
  5. (Hands-on) Logistic Regression
  6. (Hands-on) Decision Tree

Supervised Learning


  • ์ง€๋„ ํ•™์Šต, ์ •๋‹ต ๋ ˆ์ด๋ธ”์ด ์žˆ๋Š” ํ•™์Šต๋ฐฉ๋ฒ•, input x์— ๋Œ€ํ•ด์„œ output ์„ ์˜ˆ์ธกํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.
  • Train ๋‹จ๊ณ„์—์„œ ๋ชจ๋ธ์„ fitting ํ•˜๊ณ , Test ๋‹จ๊ณ„์—์„œ fitting ํ•œ ๋ชจ๋ธ inference(์ถ”๋ก )๋ฅผ ์ง„ํ–‰ํ•œ๋‹ค.

    y = f(x) (infer the output y for input x)

Various examples of Supervised Learning

  • Linear Model: the most basic model, the foundation of other, more complex models.

    • Linear Regression

    • Logistic Regression: an algorithm that finds a decision boundary; used to classify data.

    • Support Vector Machine (SVM): an algorithm that finds a decision boundary (the maximum margin separator)

      • Maximum Margin Separator: the separator (line/hyperplane) with the largest margin to the data on both sides
      • Support Vector: the data points lying closest to the maximum margin separator

  • Naive Bayes Classification: a probability-based model built on Bayes' rule; used in applications that need to be light and fast rather than maximally accurate (e.g., spam filtering, sentence sentiment analysis, recommender systems)

  • Gaussian Process: a probability-based model that assumes the data {x, f(x)} follow a multivariate Gaussian distribution; gives a confidence estimate for each prediction

  • K-Nearest Neighbors (KNN): (see the sketch after this list)

    • Nonparametric approach (the number of parameters grows as the training data grows)
    • Stores all the training data; when a new input arrives, derives the output from the k nearest stored points.
    • Curse of dimensionality: when the input features are high-dimensional, a large amount of training data is needed.

  • Decision Tree: explainable; similar to how humans reason; prone to overfitting.

  • Random Forest: an ensemble of many Decision Trees, much like the collective intelligence of a crowd; reduces variance

  • Neural Network

    ์–ด๋–ค ์ƒํ™ฉ์—๋„ ์ž˜๋งž๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์—†๋‹ค. ๋ฐ์ดํ„ฐ์˜ ํ˜•ํƒœ๋‚˜ ๊ฐœ์ˆ˜ ์ปดํ“จํŒ… ์ž์›๋“ฑ์„ ๊ณ ๋ คํ•˜์—ฌ ์ ์ ˆํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜์—ฌ์•ผ ํ•œ๋‹ค.

Linear Model


  • The model that forms the basis of other, more complex models.
  • A simple model -> good Generalization
  • Extensibility: can be extended into more complex models

Linear Regression

  • y = wx + b
  • Uses the MSE (mean squared error) function as the loss
  • Finds the w and b that minimize the loss
  • Gradient Descent: the loss is a quadratic function, so the goal is to find the point where its slope is 0 (see the sketch below)
  • Can be solved with the closed-form solution or with gradient descent
  • General linear model: fits nonlinear data (e.g., by transforming the input features)
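
A small NumPy sketch of fitting y = wx + b with gradient descent on the MSE loss; the data and learning rate below are made up for illustration.

```python
import numpy as np

# Toy data: y = 2x + 1 plus noise (made up for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2 * x + 1 + rng.normal(0, 1, 100)

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    y_hat = w * x + b
    # Gradients of the MSE loss mean((y_hat - y)**2) w.r.t. w and b
    grad_w = 2 * np.mean((y_hat - y) * x)
    grad_b = 2 * np.mean(y_hat - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should approach w = 2, b = 1
```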

Linear Classification

  • The goal is to find a linear decision boundary
  • Classification (0 or 1) using a threshold function
  • Perceptron (a single-layer neural network); a sketch follows below
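
A minimal sketch of the classic perceptron update with a hard threshold function; the input format (NumPy arrays with 0/1 labels) and the hyperparameters are assumptions for illustration.

```python
import numpy as np

def perceptron_train(X, y, epochs=20, lr=0.1):
    # Learn w, b for the boundary w.x + b = 0; labels y are 0 or 1
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b >= 0 else 0  # hard threshold function
            # Nudge the boundary only when the point is misclassified
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b
```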

Logistic Regression

  • Uses the logistic function to predict the probability that the class label is 1

  • Also called a soft threshold, or sigmoid (introduces nonlinearity)

  • Uses log loss instead of MSE loss (see the sketch after this list)

    • because of the nonlinearity:

    • with MSE, the loss function would no longer be quadratic (convex), making it hard to optimize with gradient descent
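
Since the hands-on covered Logistic Regression, here is a minimal scikit-learn sketch; the built-in breast cancer dataset is a stand-in, not the data used in class.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=5000)  # fit by minimizing log loss
clf.fit(X_tr, y_tr)
print(clf.predict_proba(X_te[:3]))  # P(label=0) and P(label=1) per sample
print(clf.score(X_te, y_te))        # accuracy
```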

Decision Tree


  • Predictor: an input feature
  • Hierarchically partitions the predictor space into multiple regions
  • Explainable and easy to visualize (see the sketch below)
  • Vulnerable to overfitting; small changes in the input data can change predictions a lot -> Ensemble methods
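
A quick scikit-learn illustration of the "explainable" point: a fitted tree can be printed as human-readable if/else rules. The iris dataset here is a stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# The fitted tree is a set of if/else rules we can read directly
print(export_text(tree, feature_names=data.feature_names))
```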

Regression Tree

  • Partitions the entire predictor space into non-overlapping boxes

  • Returns the mean of the training data belonging to a box as the prediction

  • The goal is to minimize the Sum of Squared Errors (SSE)

  • Top-down, greedy: starting from the root node, finds the split that minimizes SSE at each step (Recursive Binary Splitting)

  • Stopping criterion: limits the amount of data in each leaf node to prevent overfitting (e.g., stop splitting once a leaf holds 5 or fewer observations)

  • Pruning a Tree (grow a large tree first, then prune it)

    • Cost complexity pruning (weakest link pruning); see the sketch below
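
A sketch mapping the stopping criterion and cost complexity pruning onto scikit-learn's DecisionTreeRegressor parameters; the toy data and hyperparameter values are made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression data (made up): a noisy sine wave
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)

# min_samples_leaf acts as the stopping criterion; ccp_alpha performs
# cost complexity (weakest link) pruning after the tree is grown
tree = DecisionTreeRegressor(min_samples_leaf=5, ccp_alpha=0.005)
tree.fit(X, y)
print(tree.get_n_leaves())  # fewer leaves than an unpruned tree would have
```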

Classification Tree

  • ๊ฐ€์žฅ ๋งŽ์ด ๋“ฑ์žฅํ•˜๋Š” class๊ฐ€ ์˜ˆ์ธก class๊ฐ€ ๋œ๋‹ค.
  • Classification error rate: ๊ฐ€์žฅ ๋น„์œจ์ด ๋†’์€ ํด๋ž˜์Šค์˜ ๋น„์œจ๋งŒ ๊ณ ๋ ค
  • Gini index, Entropy: ํ•œ ํด๋ž˜์Šค๋กœ ๋ถ„ํฌ๊ฐ€ ์น˜์šฐ์น˜๊ฒŒ ๋˜๋ฉด ์ž‘์€๊ฐ’, ๊ณจ๊ณ ๋ฃจ ๋ถ„ํฌ๋˜๋ฉด ํฐ๊ฐ’ -> ์ž‘์€๊ฐ’ ์ผ์ˆ˜๋ก ์ข‹๋‹ค. (node impurity๋ผ๊ณ ๋„ ๋ถ€๋ฅธ๋‹ค)
  • Pruning์—์„œ๋Š” ์ตœ์ข…์˜ˆ์ธก ์ •ํ™•๋„๋ฅผ ์œ„ํ•ด Classification error rate๋ฅผ ์ฃผ๋กœ ์‚ฌ์šฉ

Ensemble Methods


  • ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๊ฐ„๋‹จํ•œ ‘building block’ ๋ชจ๋ธ๋“ค์„ ๊ฒฐํ•ฉํ•ด ํ•˜๋‚˜์˜ ๊ฐ•๋ ฅํ•œ ๋ชจ๋ธ์„ ๋งŒ๋“œ๋Š” ๋ฒ•
  • Decision Tree -> Bagging, Random Forest, Boosting

Bagging

  • Bootstrap aggregation
  • Reduces variance
  • Averages the results of many different trees for the final prediction -> low variance
  • Bootstrapping: creating B bootstrapped datasets by random sampling with replacement from the original dataset
  • Out-of-Bag error estimation: computes a validation error from the data not drawn into each bootstrap sample
  • Variable Importance Measures: measures how much SSE decreases thanks to splits on each predictor, averaged over all trees; helps gauge how much each predictor contributes to the prediction
  • Random Forest
    • To decorrelate the trees (reduce the correlation between the trees built on the bootstrapped datasets), each time a split is made, m of the p predictors are chosen at random and only those are considered for the split (sketch below)

Boosting

  • Unlike Bagging, trees are trained sequentially, each using the information from previously trained trees

  • Reduces bias

  • Uses the entire dataset, and trains repeatedly while focusing on the data points that were predicted incorrectly

  • 3 hyperparameters (mapped to code in the sketch below)

    • Number of trees B: too large can overfit
    • Shrinkage parameter: controls the learning speed; typically 0.01 or 0.001
    • Number of splits d: controls the complexity of each boosted tree; since boosting reduces bias, d does not need to be large

  • Examples: Gradient Boosting, AdaBoost
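
A Gradient Boosting sketch mapping the three hyperparameters onto scikit-learn arguments; the dataset is a stand-in and the values are illustrative, not tuned.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# B = n_estimators, shrinkage = learning_rate, d = max_depth
# (max_depth=1 builds depth-1 stumps, i.e. one split per tree)
gb = GradientBoostingClassifier(n_estimators=500, learning_rate=0.01,
                                max_depth=1, random_state=0)
gb.fit(X_tr, y_tr)
print(gb.score(X_te, y_te))  # held-out accuracy
```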

Author: shin alli, Backend developer (Python, Django, AWS)