🏆 Certifications, Languages 42

[Big Data Analysis Engineer] Practical Exam - Type 3: Population Mean Test (One Population), T-test, Wilcoxon

import scipy.stats as stats
from scipy.stats import shapiro
import pandas as pd
import numpy as np

For Type 3, the first thing needed is shapiro, for the normality test, so import it.

# 1. Set up the hypotheses
# H0: the mean of the mpg column equals 20.
# H1: the mean of the mpg column does not equal 20.
# 2. Check the significance level: 5%
# 3. Normality test
# H0 (null hypothesis): the data follow a normal distribution.
# H1 (alternative hypothesis): the data do not follow a normal distribution.
statistic, pvalue = stats.shapiro(df['mpg'])
print(round(statistic,4), round(pvalue,4))
result = stats.shapiro(df[..
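The excerpt cuts off before the mean test itself, so here is a minimal sketch of how the remaining steps might go, based on the two tests named in the post title. The synthetic `df` is a hypothetical stand-in for the post's data, and choosing between the tests by the Shapiro p-value is an assumption, not necessarily the author's exact procedure. Note that the Wilcoxon signed-rank test compares the center of the differences from 20 rather than the mean as such.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the post's df: 32 synthetic 'mpg' values.
rng = np.random.default_rng(0)
df = pd.DataFrame({"mpg": rng.normal(loc=20.0, scale=6.0, size=32)})

alpha = 0.05  # significance level from step 2
mu0 = 20      # hypothesized mean from H0

# Step 3: Shapiro-Wilk normality test (H0: the sample is normally distributed).
shapiro_stat, shapiro_p = stats.shapiro(df["mpg"])

if shapiro_p >= alpha:
    # Normality not rejected: one-sample t-test against mu0.
    test_name = "t-test"
    stat, pvalue = stats.ttest_1samp(df["mpg"], popmean=mu0)
else:
    # Normality rejected: Wilcoxon signed-rank test on the differences from mu0.
    test_name = "wilcoxon"
    stat, pvalue = stats.wilcoxon(df["mpg"] - mu0)

decision = "reject H0" if pvalue < alpha else "fail to reject H0"
print(test_name, round(stat, 4), round(pvalue, 4), decision)
```

Both tests are two-sided by default, which matches the "equal to 20 / not equal to 20" hypotheses in step 1.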

[Big Data Analysis Engineer] Practical Exam - Type 2: Model Performance Evaluation Functions, Interpretation

Modeling and performance evaluation

1. Classification: RandomForestClassifier
Accuracy
auc: the area under the ROC curve; the closer to 1, the better the performance
f1: the larger, the better the performance

from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, recall_score, precision_score
# (actual values, predicted values)
# For multiclass classification: f1 = f1_score(y_val, y_pred, average = 'macro')
auc = roc_auc_score(y_val, y_pred)
acc = accuracy_score(y_val, y_pred)
f1 = f1_score(y_val, y_pred)

2. Regression: RandomForestRegr..
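To see what the three classification metrics above return, here is a small sketch on hand-made values. The arrays `y_val` and `y_prob` are hypothetical, not the post's data. The excerpt passes hard labels (`y_pred`) to `roc_auc_score`, which runs, but the function also accepts predicted probabilities as its second argument, and those usually give a more informative AUC. Both variants are computed for comparison.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical validation labels and predicted probabilities of class 1.
y_val = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.6, 0.8, 0.4, 0.9, 0.2])

# Hard labels from a 0.5 threshold, as model.predict would give for a binary model.
y_pred = (y_prob >= 0.5).astype(int)

acc = accuracy_score(y_val, y_pred)             # share of correct labels
f1 = f1_score(y_val, y_pred)                    # binary case; use average='macro' for multiclass
auc_from_labels = roc_auc_score(y_val, y_pred)  # as in the excerpt: labels only
auc_from_probs = roc_auc_score(y_val, y_prob)   # ranking quality of the probabilities

print(round(acc, 4), round(f1, 4), round(auc_from_labels, 4), round(auc_from_probs, 4))
```

With a fitted binary classifier, the probability column would typically come from `model.predict_proba(x_val)[:, 1]`.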

[Big Data Analysis Engineer] Practical Exam Round 2 - drop

Round 2 didn't even have any missing values, yet I got an error. I wondered why, so I checked.

Round 2 doesn't provide just a train dataset; y_train is provided separately, and I hadn't removed the ID column from it with drop. So y_train had two columns, which is why the error kept occurring.

y_train=y_train.drop(columns=['ID'])

After doing this, the rest went through without any problems. Finally done with the past-exam Type 1 and Type 2 questions!!! Type 3 is still left T_T
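A minimal sketch of the situation described above, using a tiny made-up label table. The column name `target` and the values are hypothetical, since the excerpt names only the `ID` column.

```python
import pandas as pd

# Hypothetical miniature of the separately provided label file:
# an ID column plus one target column ('target' is an assumed name).
y_train = pd.DataFrame({"ID": [1, 2, 3, 4], "target": [0, 1, 1, 0]})
shape_before = y_train.shape
print(shape_before)  # two columns: ID and the target

# Drop the identifier so only the label column remains.
y_train = y_train.drop(columns=["ID"])
shape_after = y_train.shape
print(shape_after)

# Optional: select the remaining column as a 1-D Series for model.fit.
y_target = y_train["target"]
print(y_target.ndim)
```

Printing `y_train.shape` right after loading is a quick way to catch an extra identifier column before fitting a model.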

[Big Data Analysis Engineer] Practical Exam Round 3 - Type 2 prob

I did everything right and then hit an error at the very end. The error says the data is not 1-dimensional, so I looked into it.

To get the probability of being classified into a specific class, you use predict_proba:

y_result_prob = model.predict_proba(x_test)

Up to here it is correct. But prob needs to be understood a little better. To take another example:

result_prob = pd.DataFrame({ 'result': y_result, 'prob_0': y_result_prob[:,0], 'prob_1': y_result_prob[:,1], 'prob_2': y_result_prob[:,2] })
y_result_pro..
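The "not 1-dimensional" error fits the fact that `predict_proba` returns a 2-D array with one row per sample and one column per class, so a single class's probabilities have to be selected by column. Here is a small sketch of that, using the iris dataset as a hypothetical three-class stand-in for the exam data. The model settings are assumptions as well.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical three-class data in place of the exam dataset.
X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(x_train, y_train)

y_result = model.predict(x_test)             # 1-D array of class labels
y_result_prob = model.predict_proba(x_test)  # 2-D array: (n_samples, n_classes)
print(y_result_prob.shape)

# Each column is the probability of one class, in the order of model.classes_.
result_prob = pd.DataFrame({
    "result": y_result,
    "prob_0": y_result_prob[:, 0],
    "prob_1": y_result_prob[:, 1],
    "prob_2": y_result_prob[:, 2],
})
print(result_prob.head())

# A single column is 1-D, which is the shape a one-column submission needs.
prob_class_1 = y_result_prob[:, 1]
print(prob_class_1.ndim)
```

Checking `model.classes_` before slicing confirms which column index belongs to the class the problem asks about.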

[Big Data Analysis Engineer] Practical Exam Round 4 - Type 2 missing-value imputation, drop

x_train=train.drop(columns=['Segmentation'])
x_test=test
y_train=train['Segmentation']
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)

print(x_train.info())
print(x_test.info())
print(y_train.info())

There are a huge number of missing values. Finally, a dataset with missing values!

print(x_train.describe())
print(y_train.describe())
print(x_test.describe())

I checked for outliers, but there were none.

print(x_train.isnull()...
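The excerpt ends before the imputation step named in the title, so what follows is only a sketch of one common approach, not necessarily the author's. It counts missing values per column, then fills numeric columns with the training median and categorical columns with the training mode. The feature columns `Age` and `Profession` and all values are hypothetical. Only `Segmentation` comes from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature train/test frames; only 'Segmentation' is named in the post.
train = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 41.0],
    "Profession": ["Artist", None, "Doctor", "Artist"],
    "Segmentation": ["A", "B", "A", "C"],
})
test = pd.DataFrame({"Age": [np.nan, 30.0], "Profession": ["Doctor", None]})

x_train = train.drop(columns=["Segmentation"])
y_train = train["Segmentation"]
x_test = test.copy()

# Count missing values per column.
print(x_train.isnull().sum())
print(x_test.isnull().sum())

# Compute fill values from the training data only, then apply them to both sets.
age_median = x_train["Age"].median()
profession_mode = x_train["Profession"].mode()[0]

for frame in (x_train, x_test):
    frame["Age"] = frame["Age"].fillna(age_median)
    frame["Profession"] = frame["Profession"].fillna(profession_mode)

print(x_train.isnull().sum().sum(), x_test.isnull().sum().sum())
```

Taking the fill values from `x_train` alone keeps information from the test set out of preprocessing.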