Python basics summary

AI Basic


wandering developer 2023. 9. 11. 01:44

https://github.com/karpathy/lecun1989-repro/


Analyzing repro.py from this repo teaches a lot. Let's go through it piece by piece!

1. Command-line arguments are parsed with the argparse library.

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Train a 1989 LeCun ConvNet on digits")
    parser.add_argument('--learning-rate', '-l', type=float, default=0.03, help="SGD learning rate")
    parser.add_argument('--output-dir'   , '-o', type=str,   default='out/base', help="output directory for training logs")
    args = parser.parse_args()
    print(vars(args))  # show every parsed argument as a dict

The printed output shows which arguments were set:

{'learning_rate': 0.03, 'output_dir': 'out/base'}

args.output_dir
args.learning_rate

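The values can be accessed like this. Note that argparse turns dashes in option names into underscores, so --learning-rate becomes args.learning_rate.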
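To try the parser without launching a script, parse_args can be given an explicit list of strings instead of reading sys.argv. This sketch reuses the two options from repro.py; the value '0.01' is just an arbitrary example.

```python
import argparse

parser = argparse.ArgumentParser(description="Train a 1989 LeCun ConvNet on digits")
parser.add_argument('--learning-rate', '-l', type=float, default=0.03, help="SGD learning rate")
parser.add_argument('--output-dir', '-o', type=str, default='out/base', help="output directory for training logs")

# Pass an explicit argument list instead of reading sys.argv
args = parser.parse_args(['-l', '0.01'])
print(vars(args))  # {'learning_rate': 0.01, 'output_dir': 'out/base'}
```

The short flag -l and the long flag --learning-rate set the same attribute, and any option that is not given keeps its default.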

2. Using lambda expressions

For an explanation of lambda expressions, see https://wikidocs.net/64


The basics are covered at that link.

winit = lambda fan_in, *shape: (torch.rand(*shape) - 0.5) * 2 * 2.4 / fan_in**0.5

self.H1w = nn.Parameter(winit(5*5*1, 12, 1, 5, 5))

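The thing to note here is *shape: it collects any number of positional arguments into a tuple, so a single lambda can create weight tensors with any number of dimensions. The values are drawn uniformly from [-2.4/sqrt(fan_in), 2.4/sqrt(fan_in)), so layers with more inputs get smaller initial weights.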

Checking the size of the result in the example above gives:

self.H1w.size()
>> torch.Size([12, 1, 5, 5])
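A standalone sketch of the same lambda, showing that *shape accepts both a 4-D conv filter shape and a 2-D matrix shape. The (192, 30) call is only an illustration of a fully connected weight; the bound check confirms the uniform range described above.

```python
import torch

# Same initializer as in repro.py: uniform in [-2.4/sqrt(fan_in), 2.4/sqrt(fan_in))
winit = lambda fan_in, *shape: (torch.rand(*shape) - 0.5) * 2 * 2.4 / fan_in**0.5

w4d = winit(5*5*1, 12, 1, 5, 5)   # *shape == (12, 1, 5, 5): conv filters
w2d = winit(4*4*12, 192, 30)      # *shape == (192, 30): fully connected matrix
print(w4d.size(), w2d.size())     # torch.Size([12, 1, 5, 5]) torch.Size([192, 30])

bound = 2.4 / (5*5*1) ** 0.5      # 0.48 for fan_in = 25
print(w4d.abs().max().item() <= bound + 1e-6)   # True
```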

3. Using assert

assert checks a condition and, if it is wrong, alerts the developer by raising an error.

https://wikidocs.net/21050


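Here it is used to check the number of elements (parameters) in each layer.

If the condition is True, execution simply continues; if it is False, an AssertionError is raised.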

assert self.H2w.nelement() + self.H2b.nelement() == 2592

If the line

assert self.H1w.nelement() + self.H1b.nelement() == 1068

is changed as shown below, an error is raised:

assert self.H1w.nelement() + self.H1b.nelement() == 1067
>> Traceback (most recent call last):
>> File "<string>", line 1, in <module>
>> AssertionError
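Where do 1068 and 2592 come from? A sketch of the count, assuming the conv biases have one value per output unit, i.e. shapes (12, 8, 8) and (12, 4, 4); those are the shapes that make the totals in the asserts add up. The tensors are zeros because only their sizes matter here.

```python
import torch

# Assumed parameter shapes: conv weights as in the post, biases with one value per output unit
H1w, H1b = torch.zeros(12, 1, 5, 5), torch.zeros(12, 8, 8)
H2w, H2b = torch.zeros(12, 8, 5, 5), torch.zeros(12, 4, 4)

n1 = H1w.nelement() + H1b.nelement()  # 12*1*5*5 + 12*8*8 = 300 + 768
n2 = H2w.nelement() + H2b.nelement()  # 12*8*5*5 + 12*4*4 = 2400 + 192
assert n1 == 1068 and n2 == 2592      # passes silently

try:
    assert n1 == 1067, f"H1 has {n1} parameters"
except AssertionError as e:
    print("AssertionError:", e)       # AssertionError: H1 has 1068 parameters
```

Adding a message after the comma, as in the try block, makes the error much easier to debug than a bare AssertionError.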

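4. Using the @ matrix multiplication operator

@ means matrix multiplication (added in Python 3.5 by PEP 465). It is simple, but may look unfamiliar.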

https://cosmosproject.tistory.com/504

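A small sketch with PyTorch tensors: the first part mirrors the shapes of x @ self.H3w + self.H3b using random stand-in values, and the second part contrasts @ with element-wise *.

```python
import torch

# (1, 192) row vector times a (192, 30) weight matrix, like x @ self.H3w in repro.py
x = torch.randn(1, 192)
W = torch.randn(192, 30)
b = torch.randn(30)
y = x @ W + b          # bias of shape (30,) broadcasts over the batch dimension
print(y.shape)         # torch.Size([1, 30])

# @ is matrix multiplication, * is element-wise multiplication
A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[2., 3.], [4., 5.]])
print(A @ B)           # values [[10, 13], [22, 29]]
print(A * B)           # values [[2, 6], [12, 20]]
```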

5. model.eval()

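model.eval() switches the model to evaluation mode.

In evaluation mode, dropout is disabled and batch normalization uses its stored running mean and variance instead of updating them from the current batch. eval() does not turn off gradient tracking; that is what torch.no_grad() is for.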

Source: https://wikidocs.net/195115
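To see the effect, here is a hypothetical toy model with a Dropout layer (not the repro.py network). In eval mode dropout becomes a no-op, so two forward passes on the same input give identical outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy model, only to show train() vs eval()
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()
print(model.training)            # True: dropout randomly zeroes outputs
model.eval()
print(model.training)            # False: dropout is now disabled
with torch.no_grad():            # eval() alone does not stop autograd
    out1, out2 = model(x), model(x)
print(torch.equal(out1, out2))   # True: output is deterministic in eval mode
```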

6. Using TensorBoard

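After training finishes, run

tensorboard --logdir ./out/base

to view the logged loss and error curves. In the code, the logs are written to args.output_dir with tensorboardX's SummaryWriter: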

from tensorboardX import SummaryWriter # pip install tensorboardX

writer = SummaryWriter(args.output_dir)

7. Computing loss and error

After switching to evaluation mode and selecting the dataset (train or test), the loss and error are computed.

Training set:

Y.size()
>> torch.Size([7291, 10])

Test set:

Y.size()
>> torch.Size([2007, 10])

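Y.argmax(dim=1) returns, for each row, the index of the largest value, i.e. the class label.

So for the training set:

Y.argmax(dim=1).size()
>> torch.Size([7291])

Comparing the true and predicted labels gives a boolean tensor, and .float() turns False into 0 and True into 1:

torch.tensor(False).float()
>> tensor(0.)
torch.tensor(True).float()
>> tensor(1.)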

err = torch.mean((Y.argmax(dim=1) != Yhat.argmax(dim=1)).float())

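So err is the number of misclassified samples (the 1s) divided by 7291 for the training set or by 2007 for the test set, i.e. the fraction of mistakes.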

model.eval()  # switch to evaluation mode
# choose the dataset to evaluate
if split == 'train':
    X, Y = (Xtr, Ytr)
else:
    X, Y = (Xte, Yte)

Yhat = model(X)                                                    # forward pass on the whole split
loss = torch.mean((Y - Yhat)**2)                                   # mean squared error
err = torch.mean((Y.argmax(dim=1) != Yhat.argmax(dim=1)).float())  # fraction misclassified
print(f"eval: split {split:5s}. loss {loss.item():e}. error {err.item()*100:.2f}%. misses: {int(err.item()*Y.size(0))}")
writer.add_scalar(f'error/{split}', err.item()*100, pass_num)     # log to TensorBoard
writer.add_scalar(f'loss/{split}', loss.item(), pass_num)
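A toy version of the same error computation with four made-up samples, using the repro.py target encoding (+1 for the true class, -1 elsewhere). Sample 2 is deliberately predicted as class 2 instead of 7, so exactly one of four is wrong.

```python
import torch

# Made-up targets: +1 at the true class, -1 elsewhere
Y = -torch.ones(4, 10)
Y[torch.arange(4), torch.tensor([3, 1, 7, 0])] = 1.0
# Made-up predictions: sample 2 peaks at class 2 instead of 7
Yhat = -torch.ones(4, 10) * 0.9
Yhat[torch.arange(4), torch.tensor([3, 1, 2, 0])] = 0.8

loss = torch.mean((Y - Yhat) ** 2)                 # MSE over all 4*10 outputs
wrong = Y.argmax(dim=1) != Yhat.argmax(dim=1)      # bool tensor, shape (4,)
err = torch.mean(wrong.float())                    # fraction of misclassified samples
print(wrong)             # tensor([False, False,  True, False])
print(err.item())        # 0.25
print(int(wrong.sum()))  # 1
```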

Summary of the logic

The input to this network is a 16x16 image. Let's trace how it is processed, layer by layer!

layer 1

1. Pad the 16x16 input by 2 pixels on every side, filling with -1 (the background value).

x = F.pad(x, (2, 2, 2, 2), 'constant', -1.0) # pad by two using constant -1 for background

x.size() before and after padding:

torch.Size([1, 1, 16, 16]) --> torch.Size([1, 1, 20, 20])

2. Apply a convolution with 12 filters of size 5x5 and stride 2.

The output is torch.Size([1, 12, 8, 8]), since floor((20 - 5) / 2) + 1 = 8.

x = F.conv2d(x, self.H1w, stride=2) + self.H1b
x.size()  # input to the convolution
>> torch.Size([1, 1, 20, 20])
self.H1w.size()
>> torch.Size([12, 1, 5, 5])

3. x = torch.tanh(x)

tanh squashes each value into the range (-1, 1).
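Layer 1 end to end, as a runnable sketch. The weights are random stand-ins rather than trained ones, the bias shape (12, 8, 8) is an assumption consistent with the 1068 parameter count, and conv_out is a hypothetical helper for the output-size formula above.

```python
import torch
import torch.nn.functional as F

def conv_out(size, kernel, stride):
    # Output size of a convolution with no extra padding: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

x = torch.zeros(1, 1, 16, 16)                   # dummy 16x16 single-channel image
x = F.pad(x, (2, 2, 2, 2), 'constant', -1.0)    # -> (1, 1, 20, 20), border filled with -1
w = torch.randn(12, 1, 5, 5)                    # 12 random 5x5 filters (stand-in weights)
b = torch.zeros(12, 8, 8)                       # assumed per-unit bias
h = torch.tanh(F.conv2d(x, w, stride=2) + b)    # -> (1, 12, 8, 8), values in [-1, 1]
print(x.shape, h.shape, conv_out(20, 5, 2))
```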

layer 2

4. x = F.pad(x, (2, 2, 2, 2), 'constant', -1.0) # pad by two using constant -1 for background

x.size()  # before padding
>> torch.Size([1, 12, 8, 8])
x.size()  # after padding
>> torch.Size([1, 12, 12, 12])

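As before, padding enlarges each 8x8 map, this time to 12x12.

5. Run three convolutions that each produce a torch.Size([1, 4, 4, 4]) output, then concatenate them.

This reproduces the paper's connection scheme, in which each H2 feature map looks at only 8 of the 12 input maps. The paper does not say which 8, so repro.py assumes this block pattern: maps 0-7, maps 4-11, and maps 0-3 plus 8-11. Limiting the connections also reduces the number of weights.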

slice1 = F.conv2d(x[:, 0:8], self.H2w[0:4], stride=2) # first 4 planes look at first 8 input planes
slice2 = F.conv2d(x[:, 4:12], self.H2w[4:8], stride=2) # next 4 planes look at last 8 input planes
slice3 = F.conv2d(torch.cat((x[:, 0:4], x[:, 8:12]), dim=1), self.H2w[8:12], stride=2) # last 4 planes are cross
x = torch.cat((slice1, slice2, slice3), dim=1) + self.H2b

x.size()
>> torch.Size([1, 12, 4, 4])

6. Pass through the activation function
x = torch.tanh(x)
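A runnable sketch of layer 2 with random stand-in tensors (not trained weights; the (12, 4, 4) bias shape is an assumption matching the 2592 count). It shows that each slice yields 4 maps of 4x4, since floor((12 - 5) / 2) + 1 = 4, and that concatenation restores 12 channels.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 12, 8, 8)                    # stand-in for the layer-1 output
x = F.pad(x, (2, 2, 2, 2), 'constant', -1.0)    # -> (1, 12, 12, 12)
H2w = torch.randn(12, 8, 5, 5)                  # each output map sees only 8 input maps
H2b = torch.zeros(12, 4, 4)                     # assumed per-unit bias

slice1 = F.conv2d(x[:, 0:8], H2w[0:4], stride=2)                                  # inputs 0-7
slice2 = F.conv2d(x[:, 4:12], H2w[4:8], stride=2)                                 # inputs 4-11
slice3 = F.conv2d(torch.cat((x[:, 0:4], x[:, 8:12]), dim=1), H2w[8:12], stride=2)  # inputs 0-3, 8-11
h = torch.tanh(torch.cat((slice1, slice2, slice3), dim=1) + H2b)
print(slice1.shape, h.shape)   # torch.Size([1, 4, 4, 4]) torch.Size([1, 12, 4, 4])
```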

layer 3

7. Flatten the 12x4x4 feature maps into a single vector

x = x.flatten(start_dim=1) # (1, 12*4*4)

x.size() before and after flattening:

torch.Size([1, 12, 4, 4]) --> torch.Size([1, 192])

8. Fully connected layer with a 192x30 weight matrix
x = x @ self.H3w + self.H3b

The weight and bias have the following sizes:

self.H3w.size()
>> torch.Size([192, 30])
self.H3b.size()
>> torch.Size([30])

9. Pass through the activation function

x = torch.tanh(x)

layer 4

10. x = x @ self.outw + self.outb

x.size()
>> torch.Size([1, 10])
self.outw.size()
>> torch.Size([30, 10])

11. Pass through the activation function

x = torch.tanh(x)
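Layers 3 and 4 together, sketched with random stand-in weights and zero biases (not the trained values). flatten(start_dim=1) keeps the batch dimension, and the predicted digit is the index of the largest of the 10 outputs.

```python
import torch

h2 = torch.randn(1, 12, 4, 4)            # stand-in for the layer-2 output
H3w, H3b = torch.randn(192, 30), torch.zeros(30)
outw, outb = torch.randn(30, 10), torch.zeros(10)

x = h2.flatten(start_dim=1)              # (1, 12, 4, 4) -> (1, 192), batch dim kept
x = torch.tanh(x @ H3w + H3b)            # layer 3: (1, 192) @ (192, 30) -> (1, 30)
x = torch.tanh(x @ outw + outb)          # layer 4: (1, 30) @ (30, 10) -> (1, 10)
pred = x.argmax(dim=1).item()            # predicted digit = index of the largest output
print(x.shape, pred)
```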

12. Training setup

Training runs for 23 passes over the training set,

with learning rate = 0.03.

loss = torch.mean((y - yhat)**2)

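The output below shows that the target y is +1 for the correct class and -1 for every other class, and the weights are trained to minimize the mean squared error between y and the network output yhat.

In modern.py this is later replaced with a cross entropy loss.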

y
>> tensor([[-1., -1., -1., -1., -1., -1., -1., -1.,  1., -1.]])
yhat
>> tensor([[ 0.7227, -0.7958, -0.5027, -0.8448, -0.9782, -0.8204, -0.9436,  0.0388,
         -0.6230, -0.9541]], grad_fn=<TanhBackward0>)
(y - yhat)**2
>> tensor([[2.9678e+00, 4.1715e-02, 2.4735e-01, 2.4095e-02, 4.7685e-04, 3.2250e-02,
         3.1792e-03, 1.0790e+00, 2.6341e+00, 2.1087e-03]],
       grad_fn=<PowBackward0>)
loss
>> tensor(0.7032, grad_fn=<MeanBackward0>)
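The loss value above can be checked by hand: it is the mean of the ten squared differences. This sketch re-types the printed y and yhat values; note that the target is digit 8 but the largest output is at index 0, so this sample is also misclassified.

```python
import torch

y = torch.tensor([[-1., -1., -1., -1., -1., -1., -1., -1., 1., -1.]])  # target: digit 8
yhat = torch.tensor([[0.7227, -0.7958, -0.5027, -0.8448, -0.9782,
                      -0.8204, -0.9436, 0.0388, -0.6230, -0.9541]])
loss = torch.mean((y - yhat) ** 2)       # (2.9677 + ... + 0.0021) / 10
print(round(loss.item(), 4))             # 0.7032
print(y.argmax(dim=1).item(), yhat.argmax(dim=1).item())   # 8 0
```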

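13. Key takeaways

In short, analyzing the code above shows the following.

A 16x16 input image is padded by 2 pixels on each side and passed through 12 filters of size 5x5 with stride 2, giving 12 feature maps of 8x8.

Next, 12 more 5x5 filters are applied, each looking at only 8 of the 12 input maps, giving 12 feature maps of 4x4.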

self.H2w[0:4].size()
>> torch.Size([4, 8, 5, 5])

Finally, two fully connected layers produce scores for the 10 classes.

Instead of mapping 192 --> 10 directly, it goes through two stages: 192 --> 30 (a 192x30 weight) and then 30 --> 10 (a 30x10 weight).
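Putting the layer sizes together gives the total parameter count. This is plain arithmetic under the same assumption as before (one bias per output unit in the conv layers), and it reproduces the 1068 and 2592 checked by the asserts.

```python
# Parameters per layer: weights + biases (conv biases assumed per output unit)
layers = {
    "H1":  12*1*5*5 + 12*8*8,   # 300 + 768
    "H2":  12*8*5*5 + 12*4*4,   # 2400 + 192
    "H3":  192*30 + 30,         # fully connected 192 -> 30
    "out": 30*10 + 10,          # fully connected 30 -> 10
}
total = sum(layers.values())
print(layers)   # {'H1': 1068, 'H2': 2592, 'H3': 5790, 'out': 310}
print(total)    # 9760
```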

The actual results after the final (23rd) pass are:

23
eval: split train. loss 3.879529e-03. error 0.60%. misses: 44
eval: split test . loss 2.824366e-02. error 4.19%. misses: 84

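The model reaches a test error rate of about 4% (4.19%, i.e. 84 of the 2007 test digits misclassified).

Next, let's see how modern.py changes this!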