์ž๋ฐ”์Šคํฌ๋ฆฝํŠธ๋ฅผ ํ™œ์„ฑํ™” ํ•ด์ฃผ์„ธ์š”

Unsupervised Learning - Dimensionallity Reduction, Clustering

[TIL] ์˜์นด X ๋ฉ‹์Ÿ์ด์‚ฌ์ž์ฒ˜๋Ÿผ (AI ์—”์ง€๋‹ˆ์–ด ์œก์„ฑ ๋ถ€ํŠธ ์บ ํ”„ 2๊ธฐ) 3์ฃผ์ฐจ

 ·  โ˜• 3 min read

๋“ค์–ด๊ฐ€๋ฉฐ


ย ย ย 3์ฃผ์ฐจ ๊ฐ•์˜์— ์ ‘์–ด๋“ค์—ˆ๋‹ค. ์ด๋ฒˆ ๊ฐ•์˜๋Š” Machine Learning์ค‘ output์ด ์ฃผ์–ด์ง€์ง€ ์•Š๋Š” Unsupervised Learning์— ๋Œ€ํ•œ ๋‚ด์šฉ์ด๋‹ค. ์ƒ์†Œํ•œ ์ˆ˜์‹๋“ค๊ณผ ์ฒ˜์Œ๋“ฃ๋Š” ์šฉ์–ด๋“ค ๋•Œ๋ฌธ์— ์ดํ•ดํ•˜๊ธฐ๊ฐ€ ํž˜๋“ค์ง€๋งŒ, ๋„ˆ๋ฌด ์ง‘์ฐฉํ•˜์ง€๋ง๊ณ  ์ตœ๋Œ€ํ•œ ์ˆฒ์„ ๋ณด๋ ค๊ณ  ๋…ธ๋ ฅ์ค‘์ด๋‹ค. ๊ทธ๋ž˜๋„ ์‹ค์Šต์—์„œ ์‹ค์ œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์ง„ํ–‰ํ•˜๋‹ค๋ณด๋‹ˆ ์ข€ ๋” ์ดํ•ดํ•˜๊ธฐ๊ฐ€ ์ˆ˜์›”ํ•˜์˜€๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ด๋ฒˆ ๋ถ€ํŠธ์บ ํ”„์—์„œ๋Š” Peer Group์„ ์ •ํ•ด์ฃผ๋Š”๋ฐ, ์•„๋ฌด๋ž˜๋„ ์˜จ๋ผ์ธ์œผ๋กœ ์ง„ํ–‰ํ•˜๋‹ค๋ณด๋‹ˆ ์„œ๋กœ์˜ ์ง„๋„๋ฅผ ์ฒดํฌํ•˜๊ณ , Study์— ์žˆ์–ด ์„œ๋กœ์—๊ฒŒ ๋„์›€์„ ์ฃผ๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ธ๊ฒƒ ๊ฐ™์•˜๋‹ค. ์•„๋ฌด๋ž˜๋„ ์ทจ์ค€์ƒ์ด๋‚˜ ํ•™์ƒ๋ถ„๋“ค์ด ๋งŽ์ด ์ฐธ์—ฌํ•œ Group๋“ค์ด ๋งŽ์ด ๋ณด์˜€๋‹ค. ๋‹คํ–‰ํžˆ๋„ ์ง์žฅ์ธ๋ถ„๋“ค์ด ๋งŽ์€ Group์ด ์žˆ์–ด ๊ทธ๊ณณ์— ์ฐธ์—ฌํ•˜์—ฌ ๋งŒ๋‚จ์„ ์ง„ํ–‰ํ•˜์˜€๊ณ , ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์˜ ์‚ฌ๋žŒ๋“ค์„ ๋งŒ๋‚˜๋ณด๊ณ  ๊ฐ™์ด Study ํ•˜๋Š” ๊ธฐํšŒ๋ฅผ ๊ฐ€์ง€๊ฒŒ๋˜์–ด ์ข‹์•˜๋˜ ๊ฒƒ ๊ฐ™๋‹ค.

3์ฃผ์ฐจ


  1. Unsupervised Learning
  2. Dimensionallity Reduction
  3. Clustering
  4. (์‹ค์Šต) Data Analysis with Pandas

Unsupervised Learning


  • ์ •๋‹ต label์ด ์—†๋Š” ํŠธ๋ ˆ์ด๋‹์…‹์ด ์ฃผ์–ด์ง
  • output์„ ์˜ˆ์ธก์„ ํ•˜๋Š”๊ฒƒ์ด ๋ชฉํ‘œ๊ฐ€ ์•„๋‹ˆ๋ผ, input feature์—์„œ ์˜๋ฏธ์žˆ๋Š” ํŒจํ„ด ์ฐพ๊ธฐ๊ฐ€ ๋ชฉ์ ์ด๋‹ค.
  • ์‹œ๊ฐํ™”, ์ „์ฒ˜๋ฆฌ, ์ฐจ์›์ถ•์†Œ๋“ฑ์˜ ๋ฐ์ดํ„ฐ ๋ถ„์„ ,์ด์ƒ ํƒ์ง€๋“ฑ์˜ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉ
  • Dimensionallity Reduction, Clustering

Dimensionallity Reduction


  • High-dimensional Data: ์ถ”์ฒœ์‹œ์Šคํ…œ(users * movies), ์ด๋ฏธ์ง€, ๋™์˜์ƒ, ์œ ์ „์ž ๋ถ„์„

  • Curse of dimensionality: ๋ฐ์ดํ„ฐ๊ฐ€ ๊ณ ์ฐจ์›์ผ์ˆ˜๋ก,๊ฐ™์€ ์„ฑ๋Šฅ์˜ ๋ชจ๋ธ ํ•™์Šต์„ ์œ„ํ•ด ๋งŽ์€ ๋ฐ์ดํ„ฐ๊ฐ€ ํ•„์š”

  • ๋ถˆํ•„์š”ํ•˜๊ฒŒ ์ค‘๋ณต๋˜๋Š” ๋ณ€์ˆ˜๋‚˜ ์˜๋ฏธ์—†๋Š” ๋ณ€์ˆ˜๋ฅผ ์ค„์ด์ž

    1. PCA(Principal Compo): ๋ฐ์ดํ„ฐ variance๋ฅผ ๋ณด์กดํ•˜๋ฉด์„œ ์ฐจ์›์ถ•์†Œ

      • ๋ฐ์ดํ„ฐ์˜ ๋ถ„์‚ฐ์„ ๊ฐ€์žฅ ์ž˜ ์„ค๋ช…ํ•ด์ฃผ๋Š” ์ถ•์„ ์ฐพ๋Š”๋‹ค.
      • projection ์ดํ›„ variance๋ฅผ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ์ถ• โ†’ Convariance matrix? ๋ฅผ ์ตœ๋Œ€๋กœํ•˜๋Š”
      • PVE?
      • ใ„ด Scree plot์—์„œ โ€œelobw pointโ€๋ฅผ ์ฐพ๊ฑฐ๋‚˜, ๋ฏธ๋ฆฌ ์ •ํ•œ ํฌ๊ธฐ์˜ ๋ถ„์‚ฐ์„ ์„ค๋ช…ํ•˜๋Š” ๊ฐ€์žฅ ์ž‘์€ components๋ฅผ ์‚ฌ์šฉ
      • ํ•œ๊ณ„์ : classificaion์— ๋„์›€์ด ๋˜์ง€์•Š์„ ์ˆ˜ ์žˆ๋‹ค.(variance์— ์ดˆ์ ์„ ๋งž์ถ”๊ธฐ๋•Œ๋ฌธ)
    2. MDS: ๋ฐ์ดํ„ฐ๊ฐ„์˜ ๊ฑฐ๋ฆฌ๋ฅผ ๋ณด์กดํ•˜๋ฉด์„œ ์ฐจ์›์ถ•์†Œ

    3. t-SNE: local neighborhood ์ •๋ณด๋ฅผ ๋ณด์กดํ•˜๋ฉด์„œ ์ฐจ์›์ถ•์†Œ, ์ฐจ์›์—์„œ ๋ฉ€๋ฆฌ ๋–จ์–ด์ ธ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋Š” ์‹ ๊ฒฝ์„ ๋ณ„๋กœ ์“ฐ์ง€์•Š๊ณ  ๊ฐ€๊นŒ์ด ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋“ค์ด ์ฐจ์›์ถ•์†Œ ํ›„์—๋„ ๊ฐ€๊นŒ์›Œ์ ธ ์žˆ๊ธฐ๋ฅผ ๊ธฐ๋Œ€ํ•œ๋‹ค.

      • ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ์–ด์กŒ์„๋•Œ neighbor์ผ ํ™•๋ฅ ์€ gaussian ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅธ๋‹ค.
    4. Auto-encoder, Word2Vec: ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜์˜ ์ฐจ์› ์ถ•์†Œ

Clustering


  • ๋ฌธ์„œ, ์ด๋ฏธ์ง€ ๊ตฐ์ง‘ํ™”, ์ฃผ์‹ ์ข…๋ชฉ ๊ตฐ์ง‘ํ™”, ์ƒ๊ถŒ ๋ถ„์„, ๊ตฌ๋งค ํŒจํ„ด๋“ฑ

  • Partitioning Clustering: ์‚ฌ์ „์— ์ •์˜๋œ ์ˆซ์ž์˜ ๊ตฐ์ง‘์ค‘ ํ•˜๋‚˜์— ์†Œ์†

    • K-Means Clustering:
      • ๊ฐ ๊ตฐ์ง‘์€ ํ•˜๋‚˜์˜ ์ค‘์‹ฌ์„ ๊ฐ€์ง(centroid)
      • ์‚ฌ์ „์— ๊ตฐ์ง‘์˜ ์ˆ˜ K๊ฐ€ ์ •ํ•ด์ ธ์•ผ ํ•จ
      • SSE๋ฅผ ์ตœ์†Œํ™” ํ•˜๋Š” partition์„ ์ฐพ๋Š”๊ฒƒ > elbow point k๋ฅผ ์ฐพ๋Š”๊ฒƒ, ๊ทธ ์ด์ƒ์€ overfitting
      • ํ•œ๊ณ„์ : ๊ตฐ์ง‘์˜ ํฌ๊ธฐ, ๋ฐ€๋„๊ฐ€ ๋‹ค๋ฅด๊ฑฐ๋‚˜ ๊ตฌํ˜•์ด ์•„๋‹Œ๊ฒฝ์šฐ ์ข‹์ง€ ์•Š์€ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ด

  • Hierarchial Clustering: ๊ณ„์ธต์ ์ธ ๋ฐ์ดํ„ฐ ๊ตฐ์ง‘ํ™”. Dendrogram

    • Agglomerative Clustering:
      • K๋ฅผ ๋ฏธ๋ฆฌ ์ •ํ•ด์ค„ ํ•„์š”๊ฐ€ ์—†์Œ
      • ๊ฑฐ๋ฆฌ ๊ณ„์‚ฐ๋ฐฉ์‹์— ๋”ฐ๋ผ ๋‹ค์–‘ํ•œ ๊ฒฐํ•ฉ (linkage) ๋ฐฉ์‹์ด ์žˆ์Œ
      • min distance, max distnace, average distance, centroid distance
      • ํ•œ๊ณ„์ : ๊ณ„์‚ฐ๋ณต์žก๋„๊ฐ€ ํฌ๋‹ค, ๊ตฐ์ง‘ํ™”๊ฐ€ ์ž˜๋ชป๋˜๋ฉด ๋˜๋Œ๋ฆด ์ˆ˜ ์—†๋‹ค.

  • Density-based Clustering: ๋ฐ์ดํ„ฐ์˜ density๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ž„์˜์˜ ํ˜•ํƒœ์˜ ๊ตฐ์ง‘์„ ์ฐพ๋Š”๊ฒƒ

    • DBSCAN: ๋ฐ์ดํ„ฐ์˜ densitiy๊ฐ€ ๋†’์€ ์˜์—ญ๊ณผ ๊ทธ๋ ‡์ง€ ์•Š์€ ์˜์—ญ์œผ๋กœ ๊ตฌ๋ถ„

shin alli
๊ธ€์“ด์ด
shin alli
Backend ๊ฐœ๋ฐœ์ž (Python, Django, AWS)