
[Big Data Analysis Engineer] Practical Exam Round 6 - Type 2 macro

데이터팍스 (Datafox) 2024. 8. 20. 17:53

 

Type 2 problems only require following the same steps every time, so just memorize the procedure.

I wrote the steps below from memory, without looking anything up. If you can recall this sequence, you will have no trouble solving Type 2.

 

  1. Split the data into x_train, x_test, y_train
  2. Check the shape of x_train, x_test, y_train ☞ check the row and column counts ☞ confirm that x_train and x_test have the same number of columns
  3. Check info() for x_train, x_test, y_train ☞ check the data types ☞ if there are object or category columns, one-hot encode them
  4. Check head() for x_train, x_test, y_train ☞ look at what the data actually looks like
  5. Check describe() for x_train, x_test, y_train ☞ check the summary statistics of x_train and x_test ☞ look for outliers where min and max differ greatly
  6. Check isnull().sum() for x_train, x_test, y_train ☞ check for missing values
  7. Remove ID from x_train and x_test ☞ save ID=x_test['ID'].copy(), then drop ID from both x_train and x_test
  8. Apply one-hot encoding to x_train and x_test ☞ x_train=pd.get_dummies(x_train), and the same for x_test ☞ x_train.info() ☞ after encoding, always confirm that the column count and column order match between the two
  9. Split x_train and y_train into training and validation sets ☞ use train_test_split from sklearn.model_selection ***for classification, always pass the stratify=y_train option when splitting***
  10. Train on x_train, y_train ☞ from sklearn.ensemble import RandomForestClassifier
  11. Create predictions ☞ y_pred=model.predict(x_val)
  12. Compute the required evaluation metric from y_val and y_pred ☞ from sklearn.metrics import f1_score
  13. Produce the final predictions from x_test ☞ y_result=model.predict(x_test)
  14. Build the DataFrame to submit ☞ result=pd.DataFrame({'ID':ID,'Target':y_result})
  15. Save it as a csv for submission ☞ result.to_csv('datafox.csv',index=False)
  16. Load the file back and check it ☞ df2=pd.read_csv("datafox.csv") ☞ print(df2.head(10))
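
The steps above can be strung together roughly as follows. This is only a sketch on synthetic data: the columns (age, income, grade), the three-class target, the helper make_features, test_size=0.2 and the random_state values are all assumptions for illustration, not the actual exam data. The reindex call in step 8 is one possible way to make the x_test columns match x_train after encoding.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the exam data (hypothetical columns and values).
rng = np.random.default_rng(0)

def make_features(n, start_id):
    """Hypothetical helper that builds a small feature table."""
    return pd.DataFrame({
        "ID": np.arange(start_id, start_id + n),
        "age": rng.integers(20, 60, size=n),
        "income": rng.normal(50.0, 10.0, size=n),
        "grade": rng.choice(["A", "B", "C"], size=n),  # object column
    })

# Step 1: three separate pieces, as in the exam setup.
x_train = make_features(300, 0)
x_test = make_features(100, 300)
y_train = pd.Series(rng.integers(0, 3, size=300), name="Target")  # 3 classes

# Steps 2-6: inspect shapes, dtypes and missing values.
print(x_train.shape, x_test.shape, y_train.shape)
x_train.info()
print(x_train.isnull().sum())

# Step 7: keep the test IDs for the submission, then drop ID from both.
ID = x_test["ID"].copy()
x_train = x_train.drop(columns="ID")
x_test = x_test.drop(columns="ID")

# Step 8: one-hot encode, then force x_test into x_train's column layout.
x_train = pd.get_dummies(x_train)
x_test = pd.get_dummies(x_test)
x_test = x_test.reindex(columns=x_train.columns, fill_value=0)

# Step 9: stratified split into training and validation sets.
x_tr, x_val, y_tr, y_val = train_test_split(
    x_train, y_train, test_size=0.2, stratify=y_train, random_state=0
)

# Step 10: train the model.
model = RandomForestClassifier(random_state=0)
model.fit(x_tr, y_tr)

# Steps 11-12: validate with macro-averaged F1.
y_pred = model.predict(x_val)
print(f1_score(y_val, y_pred, average="macro"))

# Steps 13-16: predict on x_test, write the csv, read it back.
y_result = model.predict(x_test)
result = pd.DataFrame({"ID": ID, "Target": y_result})
result.to_csv("datafox.csv", index=False)
df2 = pd.read_csv("datafox.csv")
print(df2.head(10))
```

Because the labels here are random, the printed score is meaningless. The point is only the order of the calls.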

 

 

The key point in Round 6:

The target y is not binary, so when you compute f1_score for the classification you must pass average='macro'.

from sklearn.metrics import f1_score

# Multiclass target: average the per-class F1 scores with equal weight
y_pred = model.predict(x_val)
f1 = f1_score(y_val, y_pred, average='macro')
print(f1)
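
To see why the average argument matters, here is a small sketch with made-up three-class labels (the arrays are hypothetical, not exam data). It shows that the macro score is the plain mean of the per-class F1 scores, and that scikit-learn's default average='binary' raises a ValueError on a multiclass target.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical three-class labels, chosen only for illustration.
y_val = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2])

# average=None returns one F1 score per class.
per_class = f1_score(y_val, y_pred, average=None)
# average='macro' is the unweighted mean of those per-class scores.
macro = f1_score(y_val, y_pred, average="macro")
print(per_class)
print(macro, per_class.mean())

# The default average='binary' is rejected for a multiclass target.
try:
    f1_score(y_val, y_pred)
except ValueError as err:
    print("ValueError:", err)
```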