OpenCV Odds and Ends


민사민서 2024. 8. 29. 16:22

1. Opening an image

cv.imread : the first argument is the file path; the optional second argument is one of the read flags below

IMREAD_COLOR loads the image in the BGR 8-bit format. This is the default.
IMREAD_UNCHANGED loads the image as is (including the alpha channel if present)
IMREAD_GRAYSCALE loads the image as a single-channel grayscale (intensity) image

cv.imshow : display the image in a window

cv.waitKey(0) : wait for a key press; the argument is a timeout in ms (0 ⇒ wait forever)

cv.imwrite : write the image to a file path

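2. Capturing, displaying, and saving video (works with both RTSP streams and local video files)

Create a cv.VideoCapture() object

Call cap.read() to grab a frame (it returns a success flag and the frame)

Convert it with cv.cvtColor(frame, ~)

Show it with cv.imshow and use something like cv.waitKey(1) == ord('q') as the exit condition

Call cap.release() to release the capture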

# define the codec and create VideoWriter object
fourcc = cv.VideoWriter_fourcc(*'mp4v')
width, height = int(vidcap.get(cv.CAP_PROP_FRAME_WIDTH)), int(vidcap.get(cv.CAP_PROP_FRAME_HEIGHT))
video_out = cv.VideoWriter(self.video_save_path, fourcc, self.video_save_fps, (width, height))
# ... (inside the frame loop) ...
video_out.write(frame)
# ...
video_out.release()

3. Manipulating images

import numpy as np
import cv2 as cv

img = cv.imread('messi.jpg')

px = img[100,100] # access pixel value by (row, col) coordinates
print(px) # [57 63 68]
blue = img[100,100,0] # access only blue pixel value (B:0, G:1, R:2)
print(blue) # 57

# modify pixel value
img[10,10] = [255,255,255]
print(img.item(10,10,2))

# if image is grayscale, tuple returned contains only the number of rows and columns
print(img.shape) # (280, 450, 3)
print(img.size) # 378000
print(img.dtype) # uint8

b = img[:,:,0] # access only blue pixel values
img[:,:,2] = 0 # set red pixel values to 0

cv.imshow('image', img)
cv.waitKey(0)
cv.destroyAllWindows()

Things like the above are possible, and you can also blend images (some percentage of img1 + some percentage of img2 + a constant)

import numpy as np
import cv2 as cv

img = cv.imread('messi5.jpg')
assert img is not None, "file could not be read, check with os.path.exists()"

# Option 1: resize by a scale factor
res = cv.resize(img, None, fx=2, fy=2, interpolation=cv.INTER_CUBIC)

# Option 2: give the target size directly as (width, height)
height, width = img.shape[:2]
res = cv.resize(img, (2 * width, 2 * height), interpolation=cv.INTER_CUBIC)

Enlarging and shrinking an image

Beyond this, image translation, rotation, affine transformation, perspective transformation, smoothing, and so on are all available, but just read the docs for those

4. face detection

I tried the Haar Cascade model that ships with the OpenCV library, but the results were odd

import cv2

face_classifier = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

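Using the Dlib library

⇒ hit rate was far too low

(pretrained frontal face detection model)

The problem is that the model is so lightweight that face detection sometimes just fails,
and it cannot detect the side or back of a head when the person turns away

=> It may be better to run yolov8 object detection and then roughly estimate the head position as a ratio of the box (at least for a rough demo?)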
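A rough version of that ratio idea, assuming the detector gives a person box as (x1, y1, x2, y2) in pixels. The function name and both default ratios are guesses made up for this sketch and would need tuning per camera angle and pose; the detector call itself is left out.

```python
def estimate_head_box(person_box, head_height_ratio=0.15, head_width_ratio=0.5):
    """Guess a head box from the top, horizontally centered part of a person box."""
    x1, y1, x2, y2 = person_box
    width, height = x2 - x1, y2 - y1
    head_h = height * head_height_ratio   # hypothetical: head is a fixed fraction of box height
    head_w = width * head_width_ratio     # hypothetical: head is a fixed fraction of box width
    cx = (x1 + x2) / 2.0
    return (cx - head_w / 2.0, y1, cx + head_w / 2.0, y1 + head_h)


# Person box 100 px wide and 400 px tall, with explicit ratios for easy arithmetic.
head = estimate_head_box((100, 50, 200, 450), head_height_ratio=0.25, head_width_ratio=0.5)
print(head)   # (125.0, 50, 175.0, 150.0)
```

This only holds for upright, mostly unoccluded people, which is usually acceptable for a rough demo.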

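5. Fourier transform of images

The Fourier transform converts a signal or image into the frequency domain so that its frequency characteristics can be analyzed.

Discrete Fourier Transform (DFT)

The discrete Fourier transform (DFT) converts a discrete signal into the frequency domain. => In practice it is computed with a fast algorithm, the fast Fourier transform (FFT)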
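To make the DFT/FFT relationship concrete, here is a direct O(N^2) implementation of the 1D DFT definition checked against np.fft.fft. The function name naive_dft and the test signal are made up for the example.

```python
import numpy as np


def naive_dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-2j * pi * k * n / N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / len(x)) @ x


signal = np.sin(2 * np.pi * 3 * np.arange(32) / 32)   # 3 cycles over 32 samples
spectrum = naive_dft(signal)

print(np.allclose(spectrum, np.fft.fft(signal)))      # same result as the FFT
peak_bin = int(np.argmax(np.abs(spectrum[:16])))
print(peak_bin)                                       # energy sits in bin 3
```

The FFT produces the same numbers; it just gets there in O(N log N) instead of O(N^2).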

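Why image processing needs the Fourier transform

The Fourier transform can also be applied to images to analyze their frequency content

It is especially useful for analyzing or removing high-frequency components such as edges and noise

Edge: a region where pixel values change abruptly; corresponds to high-frequency components
Noise: small unwanted fluctuations in the image; also mostly high-frequency components

Fourier transform with Numpy

Numpy's np.fft.fft2() function computes the 2D Fourier transform of an image (so its frequency components can be analyzed)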

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)  # move the zero-frequency (DC) component to the center
magnitude_spectrum = 20 * np.log(np.abs(fshift))

plt.subplot(121), plt.imshow(img, cmap='gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122), plt.imshow(magnitude_spectrum, cmap='gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()

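Removing low-frequency components (high-pass filter)

After fftshift the low frequencies sit at the center of the spectrum, so zeroing a block around the center removes them and leaves mostly edges (and noise). Zeroing everything outside that block instead would remove the high frequencies and reduce edges and noise.

Apply the filter in the frequency domain, then use the inverse Fourier transform to get back an image

rows, cols = img.shape
crow, ccol = rows // 2, cols // 2
fshift[crow-30:crow+31, ccol-30:ccol+31] = 0  # zero the 61x61 low-frequency block at the center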

f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)
img_back = np.real(img_back)

plt.subplot(131), plt.imshow(img, cmap='gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132), plt.imshow(img_back, cmap='gray')
plt.title('Image after HPF'), plt.xticks([]), plt.yticks([])
plt.subplot(133), plt.imshow(img_back, cmap='jet')
plt.title('Result in JET'), plt.xticks([]), plt.yticks([])
plt.show()

Fourier transform with OpenCV

OpenCV provides cv.dft() and cv.idft() for the forward and inverse Fourier transform (these are generally faster than the Numpy versions)

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

dft = cv.dft(np.float32(img), flags=cv.DFT_COMPLEX_OUTPUT)
dft_shift = np.fft.fftshift(dft)

magnitude_spectrum = 20 * np.log(cv.magnitude(dft_shift[:, :, 0], dft_shift[:, :, 1]))

plt.subplot(121), plt.imshow(img, cmap='gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122), plt.imshow(magnitude_spectrum, cmap='gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()

Applying a low-pass filter

A low-pass filter removes the high-frequency components and blurs the image

rows, cols = img.shape
crow, ccol = rows // 2, cols // 2

# Create the mask: 1 in the central low-frequency block, 0 elsewhere
mask = np.zeros((rows, cols, 2), np.uint8)
mask[crow-30:crow+30, ccol-30:ccol+30] = 1

# Apply the mask and run the inverse DFT
fshift = dft_shift * mask
f_ishift = np.fft.ifftshift(fshift)
img_back = cv.idft(f_ishift)
img_back = cv.magnitude(img_back[:, :, 0], img_back[:, :, 1])

plt.subplot(121), plt.imshow(img, cmap='gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122), plt.imshow(img_back, cmap='gray')
plt.title('Image after LPF'), plt.xticks([]), plt.yticks([])
plt.show()
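Low-pass and high-pass filtering differ only in which part of the shifted spectrum is kept. The NumPy-only sketch below shows both on a synthetic step-edge image; frequency_filter, half_width, and the 64x64 test image are assumptions made for illustration, not code from the OpenCV tutorial.

```python
import numpy as np


def frequency_filter(img, half_width, keep="low"):
    """Keep either the central (low) or the outer (high) part of the shifted spectrum."""
    rows, cols = img.shape
    crow, ccol = rows // 2, cols // 2
    fshift = np.fft.fftshift(np.fft.fft2(img))
    center = np.zeros((rows, cols), dtype=bool)
    center[crow - half_width:crow + half_width + 1,
           ccol - half_width:ccol + half_width + 1] = True
    mask = center if keep == "low" else ~center
    return np.real(np.fft.ifft2(np.fft.ifftshift(fshift * mask)))


# Synthetic image with one sharp vertical edge.
img = np.zeros((64, 64))
img[:, 32:] = 255.0

low = frequency_filter(img, 8, keep="low")    # blurred: the edge is smeared out
high = frequency_filter(img, 8, keep="high")  # mostly the edge; the mean (DC) is gone

max_step_original = np.abs(np.diff(img, axis=1)).max()
max_step_low = np.abs(np.diff(low, axis=1)).max()
print(max_step_low < max_step_original, abs(high.mean()) < 1e-6)
```

Since the two masks are complements, low + high reconstructs the original image, which is a handy sanity check when experimenting with mask sizes.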

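6. Detecting lines and circles with the Hough transform

Hough Transform

The Hough transform is a powerful technique for detecting straight or curved shapes in an image. ⇒ It works well for line-like patterns, e.g. road lane detection or finding object boundaries

How the Hough transform works

Prepare the image: start from a binary image, typically the edges extracted with Canny edge detection
Initialize the accumulator array: a 2D array indexed by the possible values of ρ and θ, where a line is written as ρ = x cos θ + y sin θ
Update the accumulator for each point: for every edge point (x, y), sweep θ over its range, compute the corresponding ρ, and increment the accumulator cell for that (ρ, θ) pair
Find the maxima in the accumulator: the (ρ, θ) cells with the most votes correspond to lines in the image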
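The voting steps above can be sketched in plain NumPy. This is a simplified illustration, not how cv.HoughLines is implemented internally; the function name, the 1-pixel ρ resolution, the 1-degree θ step, and the synthetic edge image are all choices made for the example.

```python
import numpy as np


def hough_line_accumulator(edges, theta_step_deg=1.0):
    """Vote in (rho, theta) space for every nonzero pixel of a binary edge image."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))            # largest possible |rho|
    thetas = np.deg2rad(np.arange(0.0, 180.0, theta_step_deg))
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((len(rhos), len(thetas)), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for t_idx, theta in enumerate(thetas):
        # rho = x cos(theta) + y sin(theta), rounded to the nearest pixel
        rho_vals = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        np.add.at(acc, (rho_vals + diag, t_idx), 1)
    return acc, rhos, thetas


# Synthetic edge image containing one horizontal line at y = 20.
edges = np.zeros((50, 50), dtype=np.uint8)
edges[20, :] = 255

acc, rhos, thetas = hough_line_accumulator(edges)
r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
best_rho = int(rhos[r_idx])
best_theta_deg = float(np.rad2deg(thetas[t_idx]))
print(best_rho, round(best_theta_deg))   # 20 90
```

A horizontal line y = 20 has its normal pointing along the y axis, so the strongest cell is ρ = 20 at θ = 90 degrees.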

Implementing the Hough transform

In OpenCV, cv.HoughLines() makes the Hough transform easy to apply

import cv2 as cv
import numpy as np

# Read the image and extract edges
img = cv.imread(cv.samples.findFile('sudoku.png'))
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
edges = cv.Canny(gray, 50, 150, apertureSize=3)

# Apply the Hough transform (rho step 1 px, theta step 1 degree, vote threshold 200)
lines = cv.HoughLines(edges, 1, np.pi/180, 200)

# Draw the detected lines (cv.HoughLines returns None when nothing is found)
for line in (lines if lines is not None else []):
    rho, theta = line[0]
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a * rho
    y0 = b * rho
    x1 = int(x0 + 1000 * (-b))
    y1 = int(y0 + 1000 * (a))
    x2 = int(x0 - 1000 * (-b))
    y2 = int(y0 - 1000 * (a))
    cv.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)

cv.imwrite('houghlines3.jpg', img)

For circle detection, use cv.HoughCircles()

https://docs.opencv.org/4.x/da/d53/tutorial_py_houghcircles.html

To be honest, I have mostly used OpenCV for calibration and for drawing bboxes/vectors on top of images, but it turns out to have a huge number of features
