
๐Ÿป ์ด๋ชจํ‹ฐ์ฝ˜ ํŠธ๋ Œ๋“œ ๋ฐ ํ†ต๊ณ„ ๋ถ„์„ (6) - ์นด์ด์ œ๊ณฑ ๊ฒ€์ •

๋ฐ์ดํ„ฐํŒ์Šค 2024. 10. 17. 12:15

 

์นด์ด์ œ๊ณฑ ๊ฒ€์ • : ์ข…๋ฅ˜์™€ ์ˆœ์œ„์™€์˜ ๋…๋ฆฝ์„ฑ ์—ฌ๋ถ€

 

  • Purpose: analyze whether a particular type (puppy, cat, person, etc.) affects rank, or whether the two are independent.
  • Method: treat type and rank as categorical variables and use a chi-square test to check whether the two variables are independent.
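For reference, the test of independence compares each observed count $O_{ij}$ in the contingency table with the count expected if the two variables were independent. With $r$ rows (types), $c$ columns (rank groups or categories), row totals $R_i$, column totals $C_j$ and grand total $N$:

$$
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{dof} = (r-1)(c-1)
$$

Under the null hypothesis of independence, $\chi^2$ approximately follows a chi-square distribution with $\text{dof}$ degrees of freedom, and the p-value is the upper-tail probability $P(\chi^2_{\text{dof}} \ge \chi^2_{\text{obs}})$.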

In the previous post, we checked visually whether type seems to affect rank.

Here, we use a test of independence to find out whether that relationship actually holds up statistically.
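To make the method concrete, here is a minimal sketch that computes the expected counts, the statistic and the p-value by hand for a small hypothetical table (the counts are made up, not the post's data) and checks them against `scipy.stats.chi2_contingency`:

```python
import numpy as np
from scipy import stats

# Hypothetical 3x2 table: rows = types, columns = rank groups (made-up counts)
observed = np.array([
    [10, 20],
    [15, 15],
    [25, 15],
])

row_totals = observed.sum(axis=1, keepdims=True)  # R_i, shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # C_j, shape (1, 2)
n = observed.sum()                                # N

# Expected counts under independence: E_ij = R_i * C_j / N
expected = row_totals * col_totals / n

# Pearson chi-square statistic and degrees of freedom
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper tail of the chi-square distribution
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

# SciPy computes the same quantities (no continuity correction because dof > 1)
chi2_sp, p_sp, dof_sp, expected_sp = stats.chi2_contingency(observed)

print(f"manual: chi2={chi2_manual:.4f}, p={p_manual:.4f}, dof={dof_manual}")
print(f"scipy : chi2={chi2_sp:.4f}, p={p_sp:.4f}, dof={dof_sp}")
```

Both lines print the same values. `chi2_contingency` only differs from the hand computation for 2×2 tables (dof = 1), where it applies Yates' continuity correction by default.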

 

```python
import pandas as pd
import scipy.stats as stats

# Assumes df (loaded in the earlier posts) has the columns
# '순위' (rank, where 1 = top) and '종류' (type).

### 1. Chi-square test between type (puppy, cat, etc.) and rank ###

# Bin rank into two groups: top 50% = "high", the rest = "low".
# qcut assigns labels from the smallest values upward, and a smaller rank
# number means a better rank, so "high" must come first.
df['순위_범주'] = pd.qcut(df['순위'], q=2, labels=['high', 'low'])

# Build the contingency table of type x rank group
contingency_table_1 = pd.crosstab(df['종류'], df['순위_범주'])

# Run the chi-square test: returns the statistic, the p-value,
# the degrees of freedom and the expected frequencies under independence
chi2_1, p_1, dof_1, expected_1 = stats.chi2_contingency(contingency_table_1)

# Print the results
print("Chi-square test results: type vs. rank")
print(f"Chi-square Statistic: {chi2_1}")
print(f"P-value: {p_1}")
print(f"Degrees of Freedom: {dof_1}")
print("Expected Frequencies:")
print(expected_1)
```
 

Chi-square test results: type vs. rank

Chi-square Statistic: 53.46431346431346

P-value: 0.04924187116612567 < 0.05

Degrees of Freedom: 38
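As a quick sanity check of the reported numbers: `chi2_contingency` derives its p-value as the upper tail of a chi-square distribution with the reported degrees of freedom (SciPy only applies Yates' correction when dof = 1, so none is applied here). A minimal sketch that recomputes it from the printed statistic with `scipy.stats.chi2.sf`:

```python
from scipy import stats

# Statistic and degrees of freedom copied from the type-vs-rank output above
chi2_stat = 53.46431346431346
dof = 38

# p-value = P(X >= chi2_stat) for X ~ chi-square(dof)
p_value = stats.chi2.sf(chi2_stat, dof)
print(f"P-value: {p_value:.6f}")
```

The same call with the type-vs-category statistic and its 76 degrees of freedom reproduces that p-value as well.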

 

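A p-value of 0.05 or less means the relationship between the two variables is statistically significant at the 5% level, so we reject the null hypothesis that they are independent.

Here p ≈ 0.049 < 0.05, so type and rank group are not independent: whether an emoticon lands in the top or bottom half of the ranking depends on its type (puppy, cat, etc.). Strictly speaking, the test shows an association rather than proving that type drives rank, and with p this close to 0.05 the evidence is modest.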
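The chi-square p-value is an approximation that is usually trusted only when expected counts are not too small. A common rule of thumb (Cochran's) is that no expected count is below 1 and at most 20% are below 5. With many types split into two rank groups, some cells may be sparse, so it is worth inspecting `expected_1` rather than only printing it. A minimal sketch of that check, using a small hypothetical table in place of the real crosstab:

```python
import numpy as np
from scipy import stats

# Hypothetical sparse table (rows = types, columns = rank groups); made-up counts
observed = np.array([
    [12, 9],
    [3, 1],
    [2, 4],
    [20, 18],
])

chi2, p, dof, expected = stats.chi2_contingency(observed)

# Cochran's rule of thumb: no expected count < 1 and at most 20% of cells < 5
share_below_5 = (expected < 5).mean()
min_expected = expected.min()
assumption_ok = min_expected >= 1 and share_below_5 <= 0.2

print(f"cells with expected < 5: {share_below_5:.0%}")
print(f"smallest expected count: {min_expected:.2f}")
print(f"rule of thumb satisfied: {assumption_ok}")
```

Here half of the cells fall below 5, so the rule fails. If that happens with the real data, one common option is to merge rare types into a single "other" group before running the test.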

 

 

 

์นด์ด์ œ๊ณฑ ๊ฒ€์ • : ์ข…๋ฅ˜์™€ ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋…๋ฆฝ์„ฑ ์—ฌ๋ถ€

```python
import pandas as pd
import scipy.stats as stats

# Assumes df (loaded in the earlier posts) has the columns '종류' (type) and '카테고리' (category).

### 2. Chi-square test between type and category (daily life, romance, etc.) ###

# Build the contingency table of type x category
contingency_table_2 = pd.crosstab(df['종류'], df['카테고리'])

# Run the chi-square test
chi2_2, p_2, dof_2, expected_2 = stats.chi2_contingency(contingency_table_2)

# Print the results
print("\nChi-square test results: type vs. category")
print(f"Chi-square Statistic: {chi2_2}")
print(f"P-value: {p_2}")
print(f"Degrees of Freedom: {dof_2}")
print("Expected Frequencies:")
print(expected_2)
```

Chi-square test results: type vs. category

Chi-square Statistic: 100.09114203601781

P-value: 0.033492996838640536

Degrees of Freedom: 76
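The degrees of freedom also reveal the table sizes. In the type-vs-rank test there are 2 rank groups, so $38 = (r-1)(2-1)$ gives $r = 39$ types. Assuming the same 39 types appear in this crosstab, the number of categories $c$ follows from:

$$
(r-1)(c-1) = (39-1)(c-1) = 76 \;\Rightarrow\; c - 1 = 2 \;\Rightarrow\; c = 3
$$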

 

Here p ≈ 0.033 < 0.05, so at the 5% level we reject the null hypothesis that type and category are independent.

In other words, the two are not independent: certain types (puppy, cat, etc.) appear in certain categories (daily life, romance) more often than independence would predict.
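The test only says that type and category are associated, not which type/category pairs drive the result. One common follow-up is to look at Pearson residuals $(O - E)/\sqrt{E}$: cells with large positive values occur more often than independence would predict. A minimal sketch on a small hypothetical crosstab (the labels and counts are placeholders, not the post's data):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical crosstab (rows = types, columns = categories); placeholder labels and counts
table = pd.DataFrame(
    [[30, 10, 10],
     [10, 30, 10],
     [20, 20, 20]],
    index=["puppy", "cat", "person"],
    columns=["daily", "romance", "other"],
)

chi2, p, dof, expected = stats.chi2_contingency(table)

# Pearson residuals: (observed - expected) / sqrt(expected), one value per cell
residuals = (table - expected) / np.sqrt(expected)
print(residuals.round(2))

# Rough heuristic: |residual| > 2 marks the cells that deviate most from independence
strong = [
    (t, c, round(float(residuals.loc[t, c]), 2))
    for t in residuals.index
    for c in residuals.columns
    if abs(residuals.loc[t, c]) > 2
]
print(strong)
```

In this toy table, puppy/daily and cat/romance come out positive (over-represented) while the swapped pairs come out negative. Computing the same residuals from `contingency_table_2` and `expected_2` would show which real type/category pairs drive the result.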