🏆 Certifications, Languages (42)

[Big Data Analysis Engineer] Practical Exam, Round 7 - Type 3: Logistic Regression Analysis

Problem: **Using the train data, fit a logistic regression with target as the dependent variable and find the odds ratio of the age column.**

Ah.. I thought I could just follow the usual Type 3 code as-is, but the problem would not work out.. the coef values come out different, why..?

Using the statsmodels library:

    import numpy as np  # needed for np.e further down
    import statsmodels.api as sm

    # First 210 rows are train, the rest are test
    train = df.iloc[:210].reset_index(drop=True)
    test = df.iloc[210:].reset_index(drop=True)

    x = train.drop(columns=['target'])
    y = train['target']
    x = sm.add_constant(x)  # statsmodels does not add an intercept by itself

    model2 = sm.Logit(y, x).fit()
    summary = model2.summary()
    print(summary)
    result3=np.e..

[Big Data Analysis Engineer] Practical Exam - Type 3: Independent-Samples Test (2 Populations) Example

    import pandas as pd
    import numpy as np
    import scipy.stats as stats
    from scipy.stats import shapiro

    # Normality test
    sA, pA = stats.shapiro(df['A'])
    sB, pB = stats.shapiro(df['B'])
    print(sA, pA)
    print(sB, pB)

Normality test for paired samples: run shapiro on the difference between the two groups.
Normality test for independent samples: run shapiro on each group separately > normality holds only if both groups pass.

    # Equal-variance test
    statistic, pvalue = stats.bartlett(df['A'], df['B'])
    print(statistic, pvalue)

An independent-samples test also needs an equal-variance check. The equal-variance test function is bartlett.

    statisti..
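The steps above (normality per group, then equal variance) can be chained into one decision flow. This is a sketch under stated assumptions: the data is synthetic, and the final test calls (`stats.ttest_ind` with `equal_var`, and `stats.mannwhitneyu` as a non-parametric fallback) are standard SciPy functions chosen for illustration, since the excerpt is cut off before the test itself.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Synthetic stand-in for the two independent groups (assumption).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "A": rng.normal(70, 10, size=40),
    "B": rng.normal(75, 10, size=40),
})

alpha = 0.05

# 1) Normality: each group is tested separately; both must pass.
_, pA = stats.shapiro(df["A"])
_, pB = stats.shapiro(df["B"])
normal = bool(pA > alpha and pB > alpha)

# 2) Equal variance: Bartlett's test across the two groups.
_, p_var = stats.bartlett(df["A"], df["B"])
equal_var = bool(p_var > alpha)

# 3) Pick the test. equal_var=False switches to Welch's t-test.
if normal:
    result = stats.ttest_ind(df["A"], df["B"], equal_var=equal_var)
else:
    # Assumed fallback when normality is rejected.
    result = stats.mannwhitneyu(df["A"], df["B"], alternative="two-sided")

print(normal, equal_var, round(result.pvalue, 4))
```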

[Big Data Analysis Engineer] Practical Exam - Type 3: Comparison of Population Mean Test Functions

Columns: 1 population / 2 populations (paired t-test) / 2 populations (independent t-test) / 3 populations (F-test, ANOVA)

Normality function
    1 population: stats.shapiro(df['mpg'])
    2 populations (paired t-test): stats.shapiro(df['after'] - df['before'])
    2 populations (independent t-test): stats.shapiro(df['A']), stats.shapiro(df['B'])
    3 populations (F-test, ANOVA): print(stats.shapiro(df['A'])), print(stats.shapiro(df['B'])), print(stats.shapiro(df['C']))

Normality satisfied (O)
    1 population: stats.ttest_1samp(df['mpg'], popmean=20, alternative='two-sided')
    2 populations (paired t-test): stats.ttest_rel(df['after'], df['before'], alternative='two-sided')

Normality not satisfied (X)
    statis..
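As an illustration of the single-population column, here is a hedged sketch of the one-sample case. The mpg values are synthetic, and the non-normal branch (`stats.wilcoxon` on the difference from the hypothesized mean) is an assumption, since the original comparison is truncated before its non-normal row.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Synthetic stand-in for the mpg column (assumption).
rng = np.random.default_rng(2)
df = pd.DataFrame({"mpg": rng.normal(21, 3, size=32)})

popmean = 20  # hypothesized population mean, as in the comparison above

# Normality check on the single sample.
_, p_norm = stats.shapiro(df["mpg"])

if p_norm > 0.05:
    # Normality not rejected: one-sample t-test.
    res = stats.ttest_1samp(df["mpg"], popmean=popmean, alternative="two-sided")
else:
    # Assumed fallback: Wilcoxon signed-rank test on (sample - popmean).
    res = stats.wilcoxon(df["mpg"] - popmean, alternative="two-sided")

print(round(p_norm, 4), round(res.pvalue, 4))
```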

[Big Data Analysis Engineer] Practical Exam - Type 3: Population Mean Test (3 Populations), F-test, ANOVA

    import pandas as pd
    import numpy as np
    import scipy.stats as stats
    from scipy.stats import shapiro

Import shapiro first!

    # 1. Set up the hypotheses
    # H0 : The mean scores of the three groups are equal. ( mean(A) = mean(B) = mean(C) )
    # H1 : At least one of the three group means differs. (not H0)

    # 2. Significance level: use 5%

    # 3. Normality test
    print(stats.shapiro(df['A']))
    print(stats.shapiro(df['B']))
    print(stats.shapiro(df['C']))
    # statistic, pvalue = stats.shapiro(df['A'])
    # print(rou..
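The excerpt stops at the normality step. A possible continuation is sketched below on synthetic scores: the equal-variance check with `stats.bartlett`, the one-way ANOVA with `stats.f_oneway`, and `stats.kruskal` as a fallback are standard SciPy calls, but using them here is an assumption about what the full post does.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Synthetic scores for three groups (assumption).
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "A": rng.normal(70, 8, size=30),
    "B": rng.normal(72, 8, size=30),
    "C": rng.normal(78, 8, size=30),
})

alpha = 0.05
groups = [df["A"], df["B"], df["C"]]

# 3. Normality: every group must pass Shapiro-Wilk.
normal = all(stats.shapiro(g).pvalue > alpha for g in groups)

# 4. Equal variance across the three groups (reported for reference).
_, p_var = stats.bartlett(*groups)

# 5. Test of equal means.
if normal:
    res = stats.f_oneway(*groups)   # one-way ANOVA (F-test)
else:
    res = stats.kruskal(*groups)    # assumed non-parametric fallback

reject_h0 = bool(res.pvalue < alpha)
print(normal, round(p_var, 4), round(res.pvalue, 4), reject_h0)
```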

[Big Data Analysis Engineer] Practical Exam - Type 3: Population Mean Test (2 Populations), T-test, wilcoxon

1. Paired-samples T-test: compares the before vs after means of the same subjects

    import pandas as pd
    import numpy as np
    import scipy.stats as stats
    from scipy.stats import shapiro

Import shapiro first.

    # 1. Set up the hypotheses
    # H0 : Mean blood pressure is the same before and after taking the drug (no effect)
    # H1 : Mean blood pressure differs before and after taking the drug (there is an effect)

    # 2. Significance level: use 5%

    # 3. Normality test (check normality on the differences)
    statistic, pvalue = stats.shapiro(df['after'] - df['before'])
    print(round(statistic, 4), round(pvalue, 4))

..
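To show where the paired workflow leads after the normality check on the differences, here is a small sketch on synthetic before/after blood pressure values. The data and the 0.05 threshold placement are assumptions; `stats.ttest_rel` and `stats.wilcoxon` are the paired t-test and its signed-rank counterpart named in the post title.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Synthetic before/after measurements for the same 30 subjects (assumption).
rng = np.random.default_rng(4)
before = rng.normal(140, 12, size=30)
after = before - rng.normal(5, 6, size=30)  # simulated average drop of about 5
df = pd.DataFrame({"before": before, "after": after})

alpha = 0.05

# 3. Normality is checked on the paired differences, not on each column.
diff = df["after"] - df["before"]
statistic, pvalue = stats.shapiro(diff)

# 4. Choose the paired test according to the normality result.
if pvalue > alpha:
    res = stats.ttest_rel(df["after"], df["before"], alternative="two-sided")
else:
    res = stats.wilcoxon(df["after"], df["before"], alternative="two-sided")

print(round(statistic, 4), round(pvalue, 4), round(res.pvalue, 4))
```

Passing `alternative='less'` instead would test the one-sided claim that blood pressure is lower after the drug.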