๐Ÿ† ์ž๊ฒฉ์ฆ, ์–ดํ•™

[Big Data Analysis Engineer] Practical Exam - Type 3: Population Mean Tests (Two Populations), T-test and Wilcoxon

๋ฐ์ดํ„ฐํŒ์Šค 2024. 8. 20. 17:58

 

1. Paired-sample T-test: comparing the before vs. after means of the same subjects

import pandas as pd
import numpy as np
import scipy.stats as stats
from scipy.stats import shapiro

Import shapiro first. The code below calls the same function through the module as stats.shapiro, so either name works.

# df is assumed to be a DataFrame with 'before' and 'after' columns (one row per subject)

# 1. Set up the hypotheses
# H0: the mean blood pressure before and after taking the drug is the same (no effect)
# H1: the mean blood pressure before and after taking the drug is different (there is an effect)

# 2. Significance level: 5%

# 3. Normality test (check normality of the differences)
statistic, pvalue = stats.shapiro(df['after'] - df['before'])
print(round(statistic, 4), round(pvalue, 4))

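Note: for paired samples, normality is tested on the differences, so pass df['after'] - df['before'] to stats.shapiro(). If the p-value is above 0.05, H0 (normality) is not rejected and the paired t-test is used; otherwise use the Wilcoxon test.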
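A minimal, self-contained sketch of steps 2 and 3, assuming a small hypothetical paired dataset (the numbers are randomly generated for illustration, not real blood-pressure data). It runs Shapiro-Wilk on the differences and compares the p-value with the 5% level to pick the next test.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Hypothetical paired data: the same 20 subjects measured before and after (illustration only)
rng = np.random.default_rng(0)
before = rng.normal(loc=140, scale=10, size=20)
after = before - rng.normal(loc=5, scale=4, size=20)
df = pd.DataFrame({'before': before, 'after': after})

alpha = 0.05  # significance level from step 2

# Step 3: normality is checked on the differences, not on each column separately
diff = df['after'] - df['before']
statistic, pvalue = stats.shapiro(diff)
print(round(statistic, 4), round(pvalue, 4))

# p > alpha: H0 (normality) is not rejected -> paired t-test; otherwise Wilcoxon
test_name = 'ttest_rel' if pvalue > alpha else 'wilcoxon'
print(test_name)
```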

# 4.1 (normality O) Paired t-test
statistic, pvalue = stats.ttest_rel(df['after'], df['before'], alternative='two-sided')
print(round(statistic, 4), round(pvalue, 4))

Population mean test - 2 populations - paired samples - normality O - ttest

Use the stats.ttest_rel() function.
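One way to see what stats.ttest_rel() computes: a paired t-test is a one-sample t-test of the differences against 0. The sketch below checks this on hypothetical random data and also shows how the two-sided p-value relates to the one-sided alternatives 'less' and 'greater'. The data and variable names are illustrative assumptions.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical paired measurements for 15 subjects (illustration only)
rng = np.random.default_rng(1)
before = rng.normal(140, 10, size=15)
after = before - rng.normal(3, 5, size=15)
d = after - before

# Paired t-test on the two columns
paired = stats.ttest_rel(after, before, alternative='two-sided')

# Same test written as a one-sample t-test of the differences against 0
one_sample = stats.ttest_1samp(d, popmean=0, alternative='two-sided')

# Same statistic by hand: t = mean(d) / (sd(d) / sqrt(n))
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

print(round(paired.statistic, 4), round(paired.pvalue, 4))
print(round(one_sample.statistic, 4), round(t_manual, 4))

# One-sided versions: 'less' means H1 is "after < before" because the difference is after - before
p_less = stats.ttest_rel(after, before, alternative='less').pvalue
p_greater = stats.ttest_rel(after, before, alternative='greater').pvalue
print(round(p_less, 4), round(p_greater, 4))
```

The two-sided p-value is twice the smaller one-sided p-value, which is why the direction of the subtraction (after - before) matters only for the one-sided options.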

# 4.2 (normality X) Wilcoxon signed-rank test
statistic, pvalue = stats.wilcoxon(df['after'] - df['before'], alternative='two-sided')
print(round(statistic, 4), round(pvalue, 4))
# alternative (the alternative hypothesis H1) options: 'two-sided', 'greater', 'less'

Population mean test - 2 populations - paired samples - normality X - wilcoxon

Use the stats.wilcoxon() function.

Pass the difference df['after'] - df['before'] as the argument.
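A short sketch, on hypothetical data, of two points about the Wilcoxon step: stats.wilcoxon(x, y) works on the differences x - y, so passing df['after'] - df['before'] gives the same result as passing the two columns; and with the difference defined as after - before, alternative='less' corresponds to H1 "the values went down after treatment". The simulated data, where every subject's value drops, is an assumption made only to make the direction obvious.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical data: every subject's value drops by 5 to 15 after treatment (illustration only)
rng = np.random.default_rng(2)
before = rng.normal(140, 10, size=12)
after = before - rng.uniform(5, 15, size=12)

# Passing the differences vs. passing both columns: same test
res_diff = stats.wilcoxon(after - before, alternative='two-sided')
res_pair = stats.wilcoxon(after, before, alternative='two-sided')
print(res_diff.statistic, round(res_diff.pvalue, 4))
print(res_pair.statistic, round(res_pair.pvalue, 4))

# One-sided alternatives on d = after - before
p_less = stats.wilcoxon(after - before, alternative='less').pvalue        # H1: after < before
p_greater = stats.wilcoxon(after - before, alternative='greater').pvalue  # H1: after > before
print(round(p_less, 4), round(p_greater, 4))
```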


2. Independent-samples T-test: mean of group A vs. mean of group B (two different groups)

# df is assumed to hold one column per group, 'A' and 'B'

# 1. Set up the hypotheses
# H0: the mean blood pressure of group A and group B is the same. (A = B)
# H1: the mean blood pressure of group A and group B is different. (A ≠ B)

# 2. Significance level: 5%

# 3. Normality test
# H0 (null hypothesis): the data follow a normal distribution.
# H1 (alternative hypothesis): the data do not follow a normal distribution.
statisticA, pvalueA = stats.shapiro(df['A'])
statisticB, pvalueB = stats.shapiro(df['B'])
print(round(statisticA, 4), round(pvalueA, 4))
print(round(statisticB, 4), round(pvalueB, 4))

For paired samples, the single series df['after'] - df['before'] was passed to the shapiro function to test normality,

whereas for independent samples, A and B are each passed to shapiro separately.

If even one of the groups does not follow a normal distribution, a nonparametric test (Wilcoxon rank-sum) must be used >> but nonparametric tests are unlikely to appear on the exam.

# 4. Equal-variance (homoscedasticity) test
# H0 (null hypothesis): the variances are equal.
# H1 (alternative hypothesis): the variances are not equal.
statistic, pvalue = stats.bartlett(df['A'], df['B'])
print(round(statistic, 4), round(pvalue, 4))

Independent samples need an equal-variance test in addition to the normality test,

and for that we use the stats.bartlett(df['column1'], df['column2']) function.

# 5.1 (normality O, equal variance O) t-test
statistic, pvalue = stats.ttest_ind(df['A'], df['B'], equal_var=True, alternative='two-sided')
print(round(statistic, 4), round(pvalue, 4))

Population mean test - 2 populations - independent samples - normality O - equal variance O - ttest

Use the stats.ttest_ind() function with equal_var=True.
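What equal_var=True means: the two sample variances are pooled into a single estimate. A sketch that recomputes the statistic by hand on hypothetical data and compares it with stats.ttest_ind(); the helper name pooled_t_test is an illustrative assumption.

```python
import numpy as np
import scipy.stats as stats

def pooled_t_test(a, b):
    """Two-sided Student's t-test with pooled variance (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    dof = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), dof)
    return t, p

# Hypothetical blood-pressure samples for two groups (illustration only)
rng = np.random.default_rng(5)
group_a = rng.normal(120, 10, size=25)
group_b = rng.normal(126, 10, size=30)

t_manual, p_manual = pooled_t_test(group_a, group_b)
res = stats.ttest_ind(group_a, group_b, equal_var=True, alternative='two-sided')
print(round(t_manual, 4), round(p_manual, 4))
print(round(res.statistic, 4), round(res.pvalue, 4))
```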

# 5.1 (normality O, equal variance X) Welch's t-test
statistic, pvalue = stats.ttest_ind(df['A'], df['B'], equal_var=False, alternative='two-sided')
print(round(statistic, 4), round(pvalue, 4))

Population mean test - 2 populations - independent samples - normality O - equal variance X - ttest

Use the stats.ttest_ind() function with equal_var=False (this is Welch's t-test).
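With equal_var=False the variances are not pooled: each group keeps its own variance, and the degrees of freedom come from the Welch-Satterthwaite formula. A sketch on hypothetical data with unequal spreads, checked against stats.ttest_ind(); the helper name welch_t_test is an assumption.

```python
import numpy as np
import scipy.stats as stats

def welch_t_test(a, b):
    """Two-sided Welch's t-test (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    v1 = a.var(ddof=1) / n1  # squared standard error of mean A
    v2 = b.var(ddof=1) / n2  # squared standard error of mean B
    t = (a.mean() - b.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (usually not an integer)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), dof)
    return t, p, dof

# Hypothetical groups with very different spreads (illustration only)
rng = np.random.default_rng(7)
group_a = rng.normal(120, 5, size=20)
group_b = rng.normal(126, 15, size=35)

t_manual, p_manual, dof = welch_t_test(group_a, group_b)
res = stats.ttest_ind(group_a, group_b, equal_var=False, alternative='two-sided')
print(round(t_manual, 4), round(p_manual, 4), round(dof, 2))
print(round(res.statistic, 4), round(res.pvalue, 4))
```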

# 5.2 (normality X) Wilcoxon rank-sum test
statistic, pvalue = stats.ranksums(df['A'], df['B'], alternative='two-sided')
print(round(statistic, 4), round(pvalue, 4))

Population mean test - 2 populations - independent samples - normality X - ranksums

Use the stats.ranksums() function.
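stats.ranksums() ranks both samples together and compares the rank sum of the first sample with its expected value under H0, using a normal approximation. A sketch that recomputes this on hypothetical skewed data; the helper name rank_sum_z is an assumption. The scipy documentation notes that ranksums() does not handle ties, and points to stats.mannwhitneyu() (the equivalent Mann-Whitney U test) for tie handling.

```python
import numpy as np
import scipy.stats as stats

def rank_sum_z(a, b):
    """Rank-sum z statistic and two-sided p-value, normal approximation, no tie correction (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    ranks = stats.rankdata(np.concatenate([a, b]))  # rank all observations together
    r1 = ranks[:n1].sum()                          # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2              # mean of the rank sum under H0
    sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)     # standard deviation of the rank sum under H0
    z = (r1 - expected) / sd
    p = 2 * stats.norm.sf(abs(z))
    return z, p

# Hypothetical right-skewed samples (illustration only)
rng = np.random.default_rng(6)
group_a = rng.exponential(scale=10, size=20)
group_b = rng.exponential(scale=18, size=25)

z_manual, p_manual = rank_sum_z(group_a, group_b)
res = stats.ranksums(group_a, group_b, alternative='two-sided')
print(round(z_manual, 4), round(p_manual, 4))
print(round(res.statistic, 4), round(res.pvalue, 4))
```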


This is confusing, so I should organize it into a table and memorize it.
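One way to turn the summary lines above into a table: a small function that maps the answers to "paired?", "normal?" and "equal variances?" to the scipy call used in this post, plus a printed lookup table. The function name choose_test is hypothetical; it only encodes the decision rules written above.

```python
def choose_test(paired, normal, equal_var=None):
    """Return the scipy.stats call used in this post for a two-population mean test (hypothetical helper)."""
    if paired:
        # Paired samples: normality is checked on after - before
        return "stats.ttest_rel" if normal else "stats.wilcoxon"
    if not normal:
        # Independent samples where at least one group is non-normal
        return "stats.ranksums"
    if equal_var is None:
        raise ValueError("independent normal samples need the bartlett result (equal_var)")
    return "stats.ttest_ind(equal_var=True)" if equal_var else "stats.ttest_ind(equal_var=False)"

rows = [
    ("paired", "O", "-", choose_test(True, True)),
    ("paired", "X", "-", choose_test(True, False)),
    ("independent", "O", "O", choose_test(False, True, True)),
    ("independent", "O", "X", choose_test(False, True, False)),
    ("independent", "X", "-", choose_test(False, False)),
]
print(f"{'samples':<12} {'normal':<7} {'eq.var':<7} test")
for samples, normal, eq_var, test in rows:
    print(f"{samples:<12} {normal:<7} {eq_var:<7} {test}")
```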