8/15 - Wrap-up.......

Dev journal


wandering developer 2024. 8. 16. 20:59

Target hours	10000
Total hours
Study time	20 : 00
Start time	18 : 43
End time	xx : xx

Goal: wrap-up....

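When I first came across the 10,000-hour rule, I thought: why not study deep learning too? Rather than understanding the logic first, I focused on getting things to run.

In a world that changes this fast, how can you hope to understand someone else's code if you can't even run it?

Also, deep learning work is usually built on a few common frameworks. Once you learn to run one project, running other projects becomes much easier.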

1. Running image classification on an image

Since this post is a wrap-up, let's start from the environment setup.

1.1 Environment setup

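The exact setup depends somewhat on your graphics card. Here I assume CUDA 11.8 and install everything as shown below.

The only part that tends to be confusing is the CUDA version. Check it with "nvcc -V" and pick the PyTorch wheel that matches it.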

conda create -n tutorial python=3.8

conda activate tutorial

pip install opencv-python

pip install matplotlib

pip install torch==2.0.0 torchvision==0.15.1 --index-url https://download.pytorch.org/whl/cu118

pip install nuscenes-devkit
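A small standard-library check can confirm what actually got installed and whether the PyTorch build sees CUDA. This is a sketch. The package list simply mirrors the pip commands above, and the helper names are made up here. Note that `torch.version.cuda` reports the CUDA version the wheel was built with (for example "11.8" for the cu118 index), which is not necessarily what `nvcc -V` prints.

```python
import importlib.metadata


def installed_versions(packages):
    """Return {package: installed version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_status():
    """Report torch's version and CUDA build info, or None if torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    packages = ["torch", "torchvision", "opencv-python", "matplotlib", "nuscenes-devkit"]
    for pkg, ver in installed_versions(packages).items():
        print(f"{pkg:16s} {ver or 'not installed'}")
    print(cuda_status())
```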

2. Image classification

2-1 Choosing a model

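Image classification is easy to do these days. Let's try it in just a few lines! Strictly speaking, the model used here, Faster R-CNN, is an object detector: for every object it finds, it returns a bounding box, a class label, and a confidence score.

First, grab an image to run it on.

The code is below. Feed in the image, draw the outputs on top of it, and you're done.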

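import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights

# Load a Faster R-CNN model pretrained on COCO
# (pretrained=True still works in torchvision 0.15 but is deprecated in favor of weights=)
model = fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT)
model.eval()  # inference mode: the model returns detections instead of losses

# Run inference (image_tensor is built in 2-2 below)
with torch.no_grad():
    predictions = model(image_tensor)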

2-2 Understanding the input and output

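If you are starting from zero, though, there is a lot to think about.

First, the input has to be converted to a tensor.

torchvision provides this through "from torchvision import transforms".

An image read with OpenCV is a numpy array. After it passes through the transform, it becomes a tensor. transforms.ToTensor() also moves the axes from H x W x C to C x H x W and scales the pixel values from 0-255 to 0.0-1.0.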

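import cv2
from torchvision import transforms

image = cv2.imread('image_23.jpg')  # OpenCV loads images as BGR numpy arrays

type(image)
<class 'numpy.ndarray'>
image.shape
(1500, 3000, 3)

transform = transforms.ToTensor()  # HxWxC uint8 -> CxHxW float32 in [0, 1]
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # the pretrained weights expect RGB input
image_tensor = transform(image_rgb).unsqueeze(0)  # add a batch dimension

type(image_tensor)
<class 'torch.Tensor'>
image_tensor.shape
torch.Size([1, 3, 1500, 3000])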

So we feed in a 1x3x1500x3000 tensor and get the results back.
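What `ToTensor()` followed by `unsqueeze(0)` does to the array can be sketched in plain Python. This is only an illustration on nested lists, and the helper names `hwc_to_nchw` and `shape` are made up here. The real transform works on NumPy arrays and returns a float32 tensor, but the bookkeeping is the same: move the channel axis to the front, scale 0-255 to 0.0-1.0, and add a batch axis of size 1.

```python
def hwc_to_nchw(image):
    """Convert an H x W x C nested list of 0-255 values into a 1 x C x H x W nested list in [0, 1]."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    chw = [
        [[image[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    return [chw]  # leading batch dimension of size 1, like unsqueeze(0)


def shape(nested):
    """Return the dimensions of a regular nested list, outermost first."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical 2 x 3 image (height x width) with 3 channels
img = [[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
       [[10, 20, 30], [40, 50, 60], [70, 80, 90]]]
t = hwc_to_nchw(img)
print(shape(img))  # (2, 3, 3)
print(shape(t))    # (1, 3, 2, 3)
```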

Naturally, the results also come back as tensors.

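predictions is a list with one dict per input image, so predictions[0] holds the results for our image. Here 89 is the number of detections. The boxes are [x1, y1, x2, y2] pixel coordinates, the labels are COCO class ids, and the scores are confidences: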

predictions[0]['boxes'].shape
torch.Size([89, 4])
predictions[0]['labels'].shape
torch.Size([89])
predictions[0]['scores'].shape
torch.Size([89])
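The three fields are parallel arrays: row i of `boxes` belongs with `labels[i]` and `scores[i]`. The sketch below regroups them into one record per detection. It uses a hypothetical prediction made of plain lists instead of tensors, and `to_records` is an illustrative name.

```python
# Hypothetical output for one image, shaped like predictions[0] but built from plain lists
prediction = {
    "boxes": [[10.0, 20.0, 110.0, 220.0], [300.0, 40.0, 360.0, 90.0]],  # N x 4, [x1, y1, x2, y2]
    "labels": [1, 3],        # N class ids
    "scores": [0.98, 0.41],  # N confidences
}


def to_records(prediction):
    """Regroup the parallel boxes/labels/scores arrays into one dict per detection."""
    return [
        {"box": box, "label": label, "score": score}
        for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"])
    ]


records = to_records(prediction)
print(len(records))  # 2
print(records[1])    # {'box': [300.0, 40.0, 360.0, 90.0], 'label': 3, 'score': 0.41}
```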

2-3 Displaying the image

import matplotlib.pyplot as plt

plt.figure(figsize=(30, 15))
plt.imshow(image)
plt.axis('off')
plt.show()

If you display it this way, the colors look wrong.

That's because OpenCV loads images in BGR order rather than RGB.

Adding "image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)" gives the correct image.

plt.figure(figsize=(30, 15))    
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)
plt.axis('off')
plt.show()
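Converting BGR to RGB just reverses the channel order of each pixel. The same conversion matters for the model input, because torchvision's pretrained weights expect RGB images. Below is a plain-Python sketch on a nested-list image. The helper `bgr_to_rgb` is hypothetical; with NumPy the same thing is `image[..., ::-1]`.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in an H x W x 3 nested list."""
    return [[pixel[::-1] for pixel in row] for row in image]


# One pure-blue pixel as OpenCV stores it (B=255, G=0, R=0)
bgr = [[[255, 0, 0]]]
print(bgr_to_rgb(bgr))  # [[[0, 0, 255]]]
```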

2-4. Drawing the results on the image

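plt.figure(figsize=(30, 15))
plt.imshow(image)  # image was already converted to RGB above
plt.axis('off')

# Draw every detection on the image
for box, label, score in zip(predictions[0]['boxes'], predictions[0]['labels'], predictions[0]['scores']):
    x1, y1, x2, y2 = box.tolist()  # corner coordinates in pixels
    plt.gca().add_patch(plt.Rectangle((x1, y1), x2-x1, y2-y1, edgecolor='red', facecolor='none', linewidth=2))
    plt.text(x1, y1, f'{score:.3f}', fontsize=12, bbox=dict(facecolor='yellow', alpha=0.5))
plt.show()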

This draws all 89 detections, many of them with low scores.
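`plt.Rectangle` wants an anchor point plus a width and height, while the model returns two corners. Because `imshow` puts the origin at the top-left with y growing downward, `(x1, y1)` is the top-left corner on screen. Here is a minimal conversion helper; the name `xyxy_to_xywh` is only illustrative.

```python
def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] corners into (x, y, width, height) for plt.Rectangle."""
    x1, y1, x2, y2 = box
    return x1, y1, x2 - x1, y2 - y1


print(xyxy_to_xywh([10.0, 20.0, 110.0, 220.0]))  # (10.0, 20.0, 100.0, 200.0)
```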

Filtering the boxes and adding class names takes a bit more information. This model was trained on the COCO classes, so each label id can be mapped to a class name. I also removed boxes whose score is 0.5 or lower. One catch: torchvision's detection models use the original COCO category ids, which run from 1 to 90 with gaps. The name list therefore needs "N/A" placeholders at the unused ids so that coco_classes[label] lands on the right name. The same list is also available as FasterRCNN_ResNet50_FPN_Weights.DEFAULT.meta["categories"].

score_threshold = 0.5
# COCO category names as indexed by torchvision's detection models ("N/A" marks unused ids)
coco_classes = [
    "__background__", "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
    "traffic light", "fire hydrant", "N/A", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse",
    "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "N/A", "backpack", "umbrella", "N/A", "N/A",
    "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
    "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "N/A", "wine glass", "cup", "fork",
    "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza",
    "donut", "cake", "chair", "couch", "potted plant", "bed", "N/A", "dining table", "N/A", "N/A", "toilet",
    "N/A", "TV", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
    "refrigerator", "N/A", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"
]

plt.figure(figsize=(30, 15))
plt.imshow(image)
plt.axis('off')

# Draw only confident detections, labeled with their class names
for box, label, score in zip(predictions[0]['boxes'], predictions[0]['labels'], predictions[0]['scores']):
    if score > score_threshold:
        x1, y1, x2, y2 = box.tolist()
        plt.gca().add_patch(plt.Rectangle((x1, y1), x2-x1, y2-y1, edgecolor='red', facecolor='none', linewidth=2))
        plt.text(x1, y1, f'{score:.3f}:{coco_classes[int(label)]}', fontsize=12, bbox=dict(facecolor='yellow', alpha=0.5))

plt.show()

With the scores filtered and the class names added, the result is much easier to read.
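The filtering rule can be pulled out into a small function that is easy to test without a model. This is a sketch with a hypothetical, shortened class list and a plain-list prediction. With real output you would pass `predictions[0]` and the full `coco_classes` list. `filter_detections` is an illustrative name, and the optional `target_classes` filter mirrors the video example later in this post.

```python
def filter_detections(prediction, class_names, score_threshold=0.5, target_classes=None):
    """Return (class name, score, box) for each detection that passes the filters."""
    kept = []
    for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
        label = int(label)  # works for plain ints as well as single-element tensors
        if score <= score_threshold:
            continue  # same rule as the post: keep only scores above the threshold
        if target_classes is not None and label not in target_classes:
            continue
        kept.append((class_names[label], round(float(score), 3), list(box)))
    return kept


# Hypothetical, shortened name list and prediction, just for illustration
names = ["__background__", "person", "bicycle", "car"]
prediction = {
    "boxes": [[10, 20, 110, 220], [300, 40, 360, 90], [50, 60, 80, 120]],
    "labels": [1, 3, 2],
    "scores": [0.98, 0.41, 0.7],
}

print(filter_detections(prediction, names))
# [('person', 0.98, [10, 20, 110, 220]), ('bicycle', 0.7, [50, 60, 80, 120])]
print(filter_detections(prediction, names, target_classes=[1]))
# [('person', 0.98, [10, 20, 110, 220])]
```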

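2-5. Saving the results with pickle

Deep learning models usually take a long time to compute. It is therefore common to save the results in pickle format and then inspect the saved values.

So the workflow is: save with pickle, then load the file and look at it.

To be honest, with a single file it is hard to see why this is useful.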

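# Save the results
import pickle

results = predictions                    # list with one prediction dict per image
images = [cv2.imread('image_23.jpg')]    # matching list of original BGR images

with open('results.pkl', 'wb') as f:
    pickle.dump(results, f, protocol=pickle.HIGHEST_PROTOCOL)
    pickle.dump(images, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load the results (objects come back in the order they were dumped)
with open('results.pkl', 'rb') as f:
    results = pickle.load(f)
    images = pickle.load(f)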
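Pickle pays off when inference is expensive: you compute once, then reuse the file. Below is a hedged sketch of that caching pattern. `load_or_compute` and `slow_inference` are hypothetical names, and the fake results stand in for real model output. Only unpickle files you trust, because loading a pickle can run arbitrary code.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute_fn):
    """Return the pickled result at path if it exists; otherwise compute it, save it, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute_fn()
    with open(path, "wb") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    return result


calls = []


def slow_inference():
    # Hypothetical stand-in for running the detector over many images
    calls.append(1)
    return [{"boxes": [[0, 0, 5, 5]], "labels": [1], "scores": [0.9]}]


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "results.pkl")
    first = load_or_compute(cache, slow_inference)   # runs slow_inference and writes the file
    second = load_or_compute(cache, slow_inference)  # only reads the file back
    print(len(calls), first == second)  # 1 True
```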

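Additional example:

To really see the value of pickle, let's switch the input from a single jpg to an mp4 video. Assume every frame has already been run through the model and saved to results.pkl in the same way. The script below then only loads that file, draws the detections on each frame, and writes the result to a new video. This lets you tweak the visualization and rerun it without repeating inference.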

import pickle

import cv2
import matplotlib.pyplot as plt
import numpy as np

score_threshold = 0.5
# COCO category names as indexed by torchvision's detection models ("N/A" marks unused ids)
coco_classes = [
    "__background__", "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
    "traffic light", "fire hydrant", "N/A", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse",
    "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "N/A", "backpack", "umbrella", "N/A", "N/A",
    "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
    "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "N/A", "wine glass", "cup", "fork",
    "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza",
    "donut", "cake", "chair", "couch", "potted plant", "bed", "N/A", "dining table", "N/A", "N/A", "toilet",
    "N/A", "TV", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
    "refrigerator", "N/A", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"
]
target_classes = [0,1,2,3,4,6,7,8,9,10,11,12,13,14,15,16,17]  # label ids to draw

# Load the results
with open('results.pkl', 'rb') as f:
    results = pickle.load(f)  # one prediction dict per frame
    images = pickle.load(f)   # matching BGR frames

width = 1600
height = 800

# 10 fps output video; VideoWriter takes the frame size as (width, height)
video = cv2.VideoWriter('image_total_wPred.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 10, (width, height))

enable_plot = False  # True: also show each frame on screen while writing
fig = plt.figure(figsize=(30, 15))
for image, predictions in zip(images, results):

    plt.clf()  # clear the previous frame before drawing the next one
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    plt.imshow(image)
    plt.axis('off')

    # Draw the filtered detections
    for box, label, score in zip(predictions['boxes'], predictions['labels'], predictions['scores']):
        label = int(label)
        if score > score_threshold and label in target_classes:
            x1, y1, x2, y2 = box.tolist()
            plt.gca().add_patch(plt.Rectangle((x1, y1), x2-x1, y2-y1, edgecolor='red', facecolor='none', linewidth=2))
            plt.text(x1, y1, f'{score:.2f}:{coco_classes[label]}', fontsize=12, bbox=dict(facecolor='yellow', alpha=0.5))

    if enable_plot:
        plt.pause(0.1)  # display the frame on screen
    fig.canvas.draw()  # render the figure so its pixel buffer is up to date

    # Grab the rendered figure (buffer_rgba replaces tostring_rgb, which recent matplotlib versions deprecate)
    # and convert it to BGR, which is what VideoWriter expects
    frame = cv2.cvtColor(np.asarray(fig.canvas.buffer_rgba()), cv2.COLOR_RGBA2BGR)

    resized_frame = cv2.resize(frame, (width, height))
    video.write(resized_frame)  # every frame must match the (width, height) given to VideoWriter

# Finish writing the video
video.release()
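The frame-capture step turns the figure's flat pixel buffer into an image array. The buffer is row-major, which is why the array shape is (height, width, channels) even though matplotlib reports canvas sizes as (width, height). The hypothetical helper below does the same split by hand on a tiny buffer.

```python
def rows_from_buffer(buffer, width, height, channels=3):
    """Split a flat, row-major pixel buffer into `height` rows of `width` pixel tuples."""
    if len(buffer) != width * height * channels:
        raise ValueError("buffer size does not match width x height x channels")
    row_len = width * channels
    return [
        [tuple(buffer[y * row_len + x * channels : y * row_len + (x + 1) * channels]) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 2 x 1 (width x height) RGB canvas: a red pixel followed by a green pixel
buf = bytes([255, 0, 0, 0, 255, 0])
width, height = 2, 1  # matplotlib reports canvas sizes in this (width, height) order
rows = rows_from_buffer(buf, width, height)
print(len(rows), len(rows[0]))  # 1 2  -> indexed as (height, width)
print(rows[0][1])               # (0, 255, 0)
```

cv2.resize and cv2.VideoWriter also take sizes as (width, height), while NumPy shapes are (height, width, ...). Mixing the two orders up is a common source of wrong-sized video frames.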

What if you want to use the familiar plot function instead of "add_patch"?

Change the code as follows:

x1, y1, x2, y2 = box.tolist()
# Walk the four corners and return to the first one to close the rectangle
x_values = [x1, x2, x2, x1, x1]
y_values = [y1, y1, y2, y2, y1]

# Draw the rectangle with plt.plot
plt.plot(x_values, y_values, color='red', linewidth=2)
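The same corner-walking idea extends to many boxes at once. In matplotlib a NaN value breaks a line, so you can chain the outlines of all boxes into one pair of lists. A single `plt.plot(xs, ys, color='red', linewidth=2)` call then draws them all. The sketch below uses the hypothetical helper name `boxes_to_polyline`.

```python
import math


def boxes_to_polyline(boxes):
    """Chain the outlines of several [x1, y1, x2, y2] boxes into one x list and one y list.

    A NaN after each box breaks the line, so the boxes are not joined to each other.
    """
    xs, ys = [], []
    for x1, y1, x2, y2 in boxes:
        xs += [x1, x2, x2, x1, x1, math.nan]
        ys += [y1, y1, y2, y2, y1, math.nan]
    return xs, ys


xs, ys = boxes_to_polyline([[0, 0, 2, 1], [5, 5, 6, 7]])
print(len(xs))  # 12
print(xs[:5])   # [0, 2, 2, 0, 0]
```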