๐Ÿ† ์ž๊ฒฉ์ฆ, ์–ดํ•™

[๋น…๋ฐ์ดํ„ฐ ๋ถ„์„๊ธฐ์‚ฌ] ๋น…๋ถ„๊ธฐ ์‹ค๊ธฐ 8ํšŒ ํ›„๊ธฐ, ๋ณต์›๋ฌธ์ œ, ๋ฐ์ดํ„ฐ, ์˜ˆ์ƒ ๋‹ต์•ˆ ์ฝ”๋“œ

๋ฐ์ดํ„ฐํŒ์Šค 2024. 8. 21. 18:07

 

I took round 8 of the Big Data Analysis Engineer practical exam on June 22, 2024!

I originally meant to write the questions down and post them before I forgot them,

but I stayed up all Friday night reviewing code one last time, took the exam on Saturday, came home and crashed, so I'm only getting around to posting now.


 

To start with the verdict: I found it easy! There wasn't a single question I couldn't do, and I solved all of them.

After self-scoring, though, I think I got one sub-question wrong in Type 3. Still, I got Type 1 and Type 2 all correct, so with those 70 points a pass looks guaranteed.

Compared with the reconstructed past exams on Data Manim (데이터 마님), it was reaaally easy lol..

My study period was exactly 7 days.

For 7 days of study, the exam came out really easy. Why? For one thing, I was praying that no time-related questions would show up in Type 1, and none did. Those were the trickiest ones for me...

The preliminary scores come out on July 5. If I pass, I'll be back with a post on how to pass in 7 days!

To everyone taking the practical exam next time: don't give up on cramming. 7 days is enough..

 

๋ฐ์ดํ„ฐ ๋งˆ๋‹˜๊ณผ ์ œ๊ฐ€ ๋ธ”๋กœ๊ทธ์— ๊ธฐ๋กํ•œ ๊ฒƒ๋งŒ ๋ณต์Šตํ•˜์‹œ๋ฉด 8ํšŒ๋Š” ํ‘ธ๋Š”๋ฐ ์ง€์žฅ ์—†์œผ์‹ค ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

์ง„์งœ๋กœ ์ œ ๋ธ”๋กœ๊ทธ์— ์–ด๋ ต๋‹ค๊ณ  ์ฝ”๋“œ ์„ค๋ช…ํ•ด๋†“์€ ๋ถ€๋ถ„์ด 8ํšŒ ๊ธฐ์ถœ๋ณด๋‹ค ํ›จ์”ฌ ์–ด๋ ค์›Œ์š”

์ €๋„ ๋‹ค๋ฅธ ๋ธ”๋กœ๊ทธ์—์„œ ์‹œํ—˜ ํ›„๊ธฐ๋“ค์„ ๋ณด๊ณ  ๋งŽ์€ ๋„์›€ ๋ฐ›์•˜๊ธฐ์— ๋‹ค์Œ ํšŒ์ฐจ์— ์‹œํ—˜ ๋ณด์‹ค ๋ถ„๋“ค์„ ์œ„ํ•ด ๊ธ€์„ ๋‚จ๊ธฐ๊ฒ ์Šต๋‹ˆ๋‹ค

์ œ๊ฐ€ ์‹œํ—˜์žฅ์—์„œ ์จ์„œ ํ’€์—ˆ๋˜ ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค!

์‹œํ—˜์—์„œ ์ด ์ œ์ถœ ๋‹ต์•ˆ์€ 6/11๋กœ ์ œ์ถœ๋˜๋ฉด ๋ฉ๋‹ˆ๋‹ค! 5๊ฐœ๋Š” ๋ฌธ์ œ๋ผ์„œ 6๊ฐœ๊ฐ€ ๋‹ต์•ˆ์ž…๋‹ˆ๋‹ค

 

๊ธฐ์–ต์ด ๊ฐ€๋ฌผ๊ฐ€๋ฌผํ•ด์„œ ์ˆซ์ž๊ฐ’ ๊ฐ™์€ ๊ฑด ์ •ํ™•ํ•˜์ง€ ์•Š์„ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค


<Type 1>

# Problem 1

1) Find the continent with the highest average beer consumption

2) In that continent, find the country with the fifth-highest beer consumption

3) Write that country's beer consumption as an integer

 

# Problem 1 answer

313

(The question comes down to finding Ireland's beer value)

**A few people misread sub-question 1 and got confused.

It is not asking for the continent that contains the country with the largest average beer consumption >> quite a few people picked AF that way :(

You have to group by continent, average the beer consumption within each continent, and then pick the continent with the largest average.

You needed to do a 'Groupby' on continent first!!

 

The file below is the same as the beer file from the actual exam!

I recommend trying it yourself in Colab.

# df is assumed to already hold the beer data
# Average beer consumption per continent, largest first
df.groupby('continent')['beer_servings'].mean().sort_values(ascending=False)
cond = df['continent'] == 'EU'
df2 = df[cond].sort_values('beer_servings', ascending=False).reset_index(drop=True)
df2.iloc[4]

# The index starts at 0 (0, 1, 2, 3, 4), so the row at index 4 is the country with the 5th-highest beer consumption. That's why I took row 4.
df2 = df[cond].sort_values('beer_servings', ascending=False)
# Even if you type only this much, the first five rows are shown, so you can check the fifth one and find Ireland that way.
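
The two readings of sub-question 1 are easy to tell apart on a toy table. The sketch below uses made-up numbers (not the exam file) and assumes columns named `country`, `continent`, and `beer_servings`; the toy only has three countries per continent, so it takes the 2nd-largest instead of the 5th.

```python
import pandas as pd

# Made-up toy data, not the exam file; column names are assumptions.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "continent": ["AF", "AF", "AF", "EU", "EU", "EU"],
    "beer_servings": [400, 10, 20, 300, 250, 200],
})

# Misreading: continent of the single country with the largest value.
wrong = toy.loc[toy["beer_servings"].idxmax(), "continent"]

# Intended reading: average per continent first, then take the largest mean.
means = toy.groupby("continent")["beer_servings"].mean()
right = means.idxmax()

# n-th largest country inside that continent (index 1 = 2nd largest).
ranked = (toy[toy["continent"] == right]
          .sort_values("beer_servings", ascending=False)
          .reset_index(drop=True))
second = ranked.loc[1, "country"]

print(wrong, right, second)
```

Here one outlier country pulls the misreading toward AF, while the per-continent average picks EU.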
 

 

# Problem 2

Tourist ratio = tourism arrivals / (tourism + official-business arrivals)

(I'm not sure this is right. I can't remember the exact definition of the tourist ratio.. but it looked something like this)

1. Define a as the '관광' (tourism) count of the country with the second-highest tourist ratio

2. Define b as the '공무' (official business) count of the country with the second-highest number of tourists

3. Find the value of a+b

 

# Problem 2 answer

239

(Hong Kong tourism + Japan official business)

# Column names are from memory: '관광' (tourism) and '공무' (official business) arrival counts
df['관광객비율'] = df['관광'] / (df['관광'] + df['공무'])
df2 = df.sort_values('관광객비율', ascending=False).iloc[1]  # iloc[1] = second-highest row
a = df2['관광']
print(a)
# 'Hong Kong' was second-highest; its tourism count was 74

df3 = df.sort_values('관광', ascending=False).iloc[1]
b = df3['공무']
print(b)
# 'Japan' was second-highest; its official-business count was 165

print(a + b)
# Answer: 239
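
Picking the "second-highest" row is where an off-by-one slips in easily: after sorting in descending order, `.iloc[1]` is the second row, while `.iloc[:1]` returns only the first. A minimal sketch on an invented table (the English column names and all numbers are hypothetical):

```python
import pandas as pd

# Hypothetical mini table; labels and numbers are invented.
toy = pd.DataFrame({
    "country": ["X", "Y", "Z"],
    "tourism": [50, 80, 30],
    "official": [50, 20, 10],
})
toy["ratio"] = toy["tourism"] / (toy["tourism"] + toy["official"])

# Second-highest ratio -> that row's tourism count.
a = toy.sort_values("ratio", ascending=False).iloc[1]["tourism"]
# Second-highest tourism count -> that row's official-business count.
b = toy.sort_values("tourism", ascending=False).iloc[1]["official"]

print(a + b)
```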
 

# Problem 3

Apply min-max scaling to the Co column and the Nmch column, then find the standard deviation of each column

Standard deviation of the co column = a, standard deviation of the Nmch column = b

Find the value of a-b

The min-max normalization was defined in the problem:

= (Xn - Xmin) / (Xmax - Xmin)

 

# Problem 3 answer

-0.026

(The problem defined the expression as a-b, so the minus sign has to be there. Quite a few people treated it as a 'difference', took the absolute value, and got it wrong)

Apparently you get the same answer using MinMaxScaler from sklearn.preprocessing!

co_min=df['co'].min()
co_max=df['co'].max()
nmch_min=df['nmch'].min()
nmch_max=df['nmch'].max()

df['co_scaler']= (df['co']-co_min) / (co_max-co_min)
df['nmch_scaler']= (df['nmch']-nmch_min) / (nmch_max-nmch_min)
a=df['co_scaler'].std()
b=df['nmch_scaler'].std()
print(a-b)
## Answer: -0.026


## sklearn example
from sklearn.preprocessing import MinMaxScaler
mscaler = MinMaxScaler()
df['co'] = mscaler.fit_transform(df[['co']])
df['nmch'] = mscaler.fit_transform(df[['nmch']])
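
One thing to watch when comparing the two approaches is which standard deviation you end up calling. pandas `.std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `np.std` defaults to the population one (`ddof=0`), so the same scaled column gives two different numbers. A small sketch with invented values:

```python
import numpy as np
import pandas as pd

# Invented values, only to illustrate the scaling formula.
s = pd.Series([2.0, 4.0, 6.0, 10.0])
scaled = (s - s.min()) / (s.max() - s.min())

sample_std = scaled.std()                   # pandas default: ddof=1
population_std = np.std(scaled.to_numpy())  # NumPy default: ddof=0

print(sample_std, population_std)
```

Which of the two the exam expects is not something I can confirm; the answer code above uses the pandas default.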
 

<Type 2>

Dependent variable: number of people at a subway station, evaluation metric: MAE

The dependent variable is a head count, so it's continuous and this is a regression problem. There were categorical columns, so I used one-hot encoding.

My MAE came out to about 104.

I debated whether to include the name column, but without it the x_test count didn't come out to 2064, so I kept it in.

People who dropped name reportedly got an MAE in the 400s, so keeping it seems to be right.

(The problem stated that the final submitted column should have size 2064.. though I may not have the number exactly right)

 

After you print your result, be sure to press the submit button!!

import pandas as pd  # train and test are assumed to be loaded already

x_train=train.drop(columns=['인원수'])
y_train=train['인원수']
x_test=test

# Check that the row and column counts look right
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)

# info > check for category or object dtypes > object present > one-hot encoding needed
print(x_train.info())
print(x_test.info())
print(y_train.info())

# describe to check for outliers > compare min and max, replace outliers if any > no outliers
print(x_train.describe())
print(x_test.describe())

# Check for missing values > none
print(x_train.isnull().sum())
print(x_test.isnull().sum())
print(y_train.isnull().sum())

# Variable processing (one-hot encoding)
x_train=pd.get_dummies(x_train)
x_test=pd.get_dummies(x_test)

# Check that the one-hot encoding worked
print(x_train.info())
print(x_test.info())

# Split into training and validation data
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x_train,
                                                  y_train,
                                                  test_size=0.2)

# Check the counts to confirm the split
print(x_train.shape)
print(x_val.shape)
print(y_train.shape)
print(y_val.shape)

# Train the model
from sklearn.ensemble import RandomForestRegressor
model=RandomForestRegressor()
model.fit(x_train,y_train)

# Evaluate on the validation data
y_pred=model.predict(x_val)
from sklearn.metrics import mean_absolute_error
mae=mean_absolute_error(y_val, y_pred)
print(mae)

# Predict on x_test
y_result=model.predict(x_test)

# Write the submission file
pd.DataFrame({'pred':y_result}).to_csv('result.csv',index=False)

# Read the file back to confirm it was written correctly
df2=pd.read_csv("result.csv")
print(df2.head())
print(len(df2))  # should be 2064
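
One caveat with calling `pd.get_dummies` on train and test separately: if a category appears in only one of them, the two frames end up with different columns and `model.predict` fails. It evidently worked on the exam data, but a defensive sketch is to reindex the test dummies to the training columns. The frames and the `line` column below are invented for illustration:

```python
import pandas as pd

# Invented frames; "line" stands in for any categorical column.
x_train = pd.DataFrame({"line": ["1", "2", "3"], "hour": [8, 9, 10]})
x_test = pd.DataFrame({"line": ["1", "4"], "hour": [8, 9]})

train_d = pd.get_dummies(x_train)
test_d = pd.get_dummies(x_test)

# Keep exactly the training columns: categories seen only in test are
# dropped, categories missing from test are added as all-zero columns.
test_d = test_d.reindex(columns=train_d.columns, fill_value=0)

print(list(train_d.columns) == list(test_d.columns))
```

The row count of the test frame is unchanged, so the submission length stays the same.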
 

<Type 3>

## 3-1-1 problem

Fit a logistic regression and write the number of variables that are not significant

A constant term must be added.

Dependent variable: customer churn index

 

## 3-1-1 answer

12

This one is solved with statsmodels: you just count the variables whose p-value is greater than 0.05.

For each variable in the regression, the null hypothesis is that the variable has no effect, and the alternative hypothesis is that it does.

So if the p-value is below 0.05, you reject the null and go with the alternative,

and if it is above 0.05, you cannot reject the null (= no demonstrated effect for that variable = not significant).

import statsmodels.api as sm

x=df.drop(columns=['이탈지수'])
y=df['이탈지수']

# Fit the logistic regression model
x=sm.add_constant(x)
model = sm.Logit(y, x).fit()
summary=model.summary()
print(summary)
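
Rather than counting rows in the printed summary by eye, the p-values can be filtered directly; a fitted statsmodels result exposes them as `model.pvalues`. The sketch below uses a hand-made Series in its place (variable names and numbers are invented), and it leaves the constant out of the count, which is my assumption about how the question is meant.

```python
import pandas as pd

# Hypothetical p-values standing in for statsmodels' `model.pvalues`.
pvalues = pd.Series({"const": 0.001, "calls": 0.0004, "age": 0.31,
                     "plan": 0.048, "region": 0.77})

# Assumption: the constant is not counted as a "variable".
predictors = pvalues.drop("const")
not_significant = predictors[predictors > 0.05]

print(len(not_significant), list(not_significant.index))
```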
 

## 3-1-2 problem

Refit the logistic regression using only the significant variables as independent variables

Then find the mean of the regression coefficients

 

## 3-1-2 answer

-0.456

(But I think I wrote 0.111.. why..? I believe I picked only 3 variables and refit the logistic regression.. my memory is hazy..)

+ Addendum

It came back to me: when I averaged the regression coefficients, I think I left out the constant term, i.e. the mean of b1, b2, b3.

But the constant is supposed to be included, i.e. the mean of b0, b1, b2, b3, so -0.456 is probably right.

 


## 3-1-3 problem

In the regression fitted in 3-1-2, if the calls variable increases by 5, by what factor does the odds ratio increase?

 

## 3-1-3 answer

7.919

import numpy as np

# Odds ratio when calls increases by 5
odds_ratios = np.exp((model.params['calls'])*5)
print(odds_ratios)
 
# np.exp(5 * coefficient of calls) = 7.919 -> correct
# 5 * np.exp(coefficient of calls) = 7.563 -> wrong
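
The reason the 5 goes inside the exponential: in a logistic model the log-odds are linear in the variable, so each one-unit increase multiplies the odds by exp(b), and five units multiply them by exp(b) five times over, which is exp(5b). A quick check with a made-up coefficient (0.4 is hypothetical, not the exam value):

```python
import math

beta = 0.4   # hypothetical logistic coefficient for `calls`
delta = 5    # increase in `calls`

correct = math.exp(beta * delta)   # odds multiply by exp(beta) per unit
wrong = delta * math.exp(beta)     # the common mistake

print(correct, wrong)
```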
 

# 3-2-1 problem

Fit a multiple linear regression and write the regression coefficient of the most significant variable

Dependent variable: piq

Independent variables: brain, height, weight

 

# 3-2-1 answer

2.129

(The coefficient for brain had the smallest p-value > a small p-value means rejecting the null hypothesis and going with the alternative)

# Modeling
import statsmodels.api as sm

x = df.drop(columns=['PIQ'])
y = df['PIQ']

x = sm.add_constant(x)
model = sm.OLS(y, x).fit()
# y_pred = model.predict(x)
summary = model.summary()
print(summary)
 

# 3-2-2 problem

Find the coefficient of determination (R-squared)

 

# 3-2-2 answer

0.313

(I computed R-squared with the constant term included, but some people say you shouldn't add the constant)


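# 3-2-3 problem

Using the multiple linear regression fitted above, find the PIQ value when height is 70, weight is 150, and brain size is 90

 

# 3-2-3 answer

PIQ = brain*90 + height*70 + weight*150 = 104.873

(brain, height, and weight here stand for the fitted coefficients; if the model was fitted with a constant term, the constant is added as well)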
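
Written out, the prediction is just the coefficients multiplied by the new values and summed, with a 1 for the constant when the model has one. The sketch below uses invented coefficients in place of `model.params`, so the number it prints is not the exam answer:

```python
import pandas as pd

# Hypothetical fitted coefficients standing in for `model.params`.
params = pd.Series({"const": 100.0, "brain": 2.0,
                    "height": -2.5, "weight": 0.01})

# New observation; the constant gets a 1, labels match the coefficients.
new_x = pd.Series({"const": 1.0, "brain": 90,
                   "height": 70, "weight": 150})

# Element-wise product aligned on labels, then summed.
piq = (params * new_x).sum()

print(piq)
```

Because pandas aligns on the labels, the order of the entries in `new_x` does not matter as long as the names match.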

 


I worked out this code myself, so please keep in mind that it may contain mistakes.

I hope this helps!

 

For anyone who wants to copy and use the code:

I've attached the problems + expected answers as a Python file!

It's password-protected, so if you need it,

leave a comment :)