Posts in the 'AI/vision' category

[논문 리뷰] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

https://github.com/salesforce/LAVIS/tree/main/lavis/models/blip2_models (LAVIS - A One-stop Library for Language-Vision Intelligence) Introduction: The limitation of previous vision-language pretraining (VLP) work => because large models and datasets are trained in an end-to-end fashion, the computational cost is very high. The paper proposes a generic and compute-efficient VLP method, pre-trained..

AI/vision 2025.01.06

[논문 리뷰] Flamingo: a Visual Language Model for Few-Shot Learning

Introduction: This model could be called the origin of VLMs. It achieved the following: it effectively bridged a pretrained vision-only model and a language-only model; it can process sequences in which visual and textual data are arbitrarily interleaved => which made it possible to scrape large-scale web data; it handles both images and videos smoothly; and through its in-context few-shot learning capability it reached SOTA on many vision & language tasks without any separate fine-tuning. The standard recipe in computer vision had been to pretrain on large supervised data →..

AI/vision 2025.01.03

[논문 리뷰] AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

https://github.com/google-research/vision_transformer Introduction: Thanks to its computational efficiency and scalability, the Transformer has become the standard in NLP. Even as models and datasets keep growing, there is (as yet) no sign of performance saturation. This paper takes the Transformer into computer vision, a field currently dominated by CNNs, ..

AI/vision 2025.01.03
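
The "16x16 words" in the ViT title refers to cutting an image into 16×16 patches and treating each flattened patch as one token. Below is a minimal numpy sketch of that patch-splitting step, assuming an (H, W, C) array whose sides are divisible by the patch size; the name `patchify` is just for illustration. In ViT each patch vector is then linearly projected and given a position embedding, which this sketch leaves out.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns shape (num_patches, patch_size * patch_size * C): one "word"
    (token) per patch, in row-major patch order (left to right, top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, p, gw, p, C) -> (gh, gw, p, p, C) -> (gh*gw, p*p*C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

if __name__ == "__main__":
    img = np.random.rand(224, 224, 3).astype(np.float32)
    print(patchify(img).shape)  # (196, 768)
```

For a 224×224 RGB image this yields 14 × 14 = 196 tokens of 16 × 16 × 3 = 768 values each.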

[Brief Paper Review] You Only Look Once: Unified, Real-Time Object Detection

Faster R-CNN: a two-stage approach (so real-time is naturally hard..). The whole image is passed through the backbone network + region proposal network, and then for each region: cropping (RoI Pool, RoI Align) + class prediction + bbox translation prediction. YOLO Introduction: Limitations of existing models. DPM (Deformable Parts Model): a sliding-window approach + a classifier run on each window (inefficient). R-CNN: generate potential bboxes (region proposals) + then run a classifier; slow and hard to optimize. single con..

AI/vision 2025.01.02
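
The "bbox translation prediction" in Faster R-CNN refines each anchor/proposal with four regression offsets. Below is a minimal sketch of the usual R-CNN-style parameterization (centre offsets scaled by the box size, width/height changes in log space), assuming boxes in (cx, cy, w, h) format; `decode_box` and `encode_box` are illustrative names, and real implementations add details such as clamping the log-scale terms.

```python
import math

def decode_box(anchor, deltas):
    """Apply predicted offsets (tx, ty, tw, th) to a reference box.

    anchor: (cx, cy, w, h) of the anchor/proposal.
    deltas: (tx, ty, tw, th) from the box-regression head.
    Returns the refined box as (cx, cy, w, h).
    """
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    cx = xa + tx * wa        # centre shift is relative to the anchor width
    cy = ya + ty * ha        # ... and to the anchor height
    w = wa * math.exp(tw)    # size change is predicted in log space
    h = ha * math.exp(th)
    return cx, cy, w, h

def encode_box(anchor, box):
    """Inverse of decode_box: the regression target for a ground-truth box."""
    xa, ya, wa, ha = anchor
    x, y, w, h = box
    return (x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha)

if __name__ == "__main__":
    print(decode_box((50, 50, 20, 40), (0.5, -0.25, 0.0, 0.0)))  # (60.0, 40.0, 20.0, 40.0)
```

Predicting log-scale size changes keeps the decoded width and height positive no matter what the network outputs.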

[논문 리뷰] Reducing Hallucinations in Vision-Language Models via Latent Space Steering

https://arxiv.org/abs/2410.15778 https://github.com/shengliu66/VTI (VTI: Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering) The GitHub repo was updated today, the day I'm writing this, so it's a fresh-off-the-press paper.. that I somehow ended up reading. Abstract + Introduction: Hallucinations in LVLMs, which arise from the misalignment between visual inputs and textual outputs..

AI/vision 2024.10.31
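
The excerpt stops before the method, so the following is only a generic illustration of what "latent space steering" means, not necessarily VTI's exact procedure: compute a direction in hidden-state space (here, as an assumption, the normalized difference between the means of two sets of activations) and add a scaled copy of it to the hidden states at inference time. The function names and the two activation sets are hypothetical.

```python
import numpy as np

def steering_direction(h_target: np.ndarray, h_source: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the mean of `h_source` to the mean of `h_target`.

    Both inputs are (n_samples, hidden_dim) activations collected under two
    conditions (hypothetical, e.g. "grounded" vs. "hallucinating" examples).
    """
    d = h_target.mean(axis=0) - h_source.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift every hidden state (each row of `hidden`) by `alpha` along `direction`."""
    return hidden + alpha * direction

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grounded = rng.standard_normal((32, 8)) + 1.0   # toy activations
    hallucinating = rng.standard_normal((32, 8))
    v = steering_direction(grounded, hallucinating)
    h = rng.standard_normal((4, 8))
    print(steer(h, v, alpha=2.0).shape)  # (4, 8)
```

The appeal of this kind of intervention is that it needs no gradient updates: the model weights stay untouched and only the activations are shifted.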

Vision Encoder - SigLIP

I'm fine-tuning LLaVA OneVision, and since this LVLM uses SigLIP as its mm_vision_tower (vision encoder), I'm writing this up as a way to organize and study it. Visual information is extracted from the image as vision feature vectors, projected into the same embedding space as the text input, and then fed into the text decoder (here, the LLM).. Encoder models: CLIP (Contrastive Language-Image Pre-training) * CLIP uses contrastive learning, training on images and text together to strengthen the correspondence between the two. * An image encoder (ViT) and a text encoder (BERT) ..

AI/vision 2024.10.30
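
CLIP's contrastive objective is a softmax over the whole batch: each image has to pick its own caption out of all captions, and vice versa. SigLIP, the encoder this post is about, replaces that softmax with an independent sigmoid on every image-text pair. Below is a minimal numpy sketch of both losses, assuming the image and text embeddings have already been projected to the same dimension; the temperature `t` and bias `b` values are illustrative, not tuned settings.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_softmax_loss(img: np.ndarray, txt: np.ndarray, t: float = 10.0) -> float:
    """Symmetric softmax (InfoNCE) loss; matching pairs sit on the diagonal."""
    logits = t * l2_normalize(img) @ l2_normalize(txt).T   # (B, B) scaled cosine sims
    labels = np.arange(len(logits))

    def ce(l: np.ndarray) -> float:
        # Row-wise cross-entropy against the diagonal, with max-shift for stability.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # image->text (rows) and text->image (columns), averaged
    return 0.5 * (ce(logits) + ce(logits.T))

def siglip_sigmoid_loss(img: np.ndarray, txt: np.ndarray,
                        t: float = 10.0, b: float = -10.0) -> float:
    """Pairwise sigmoid loss: every (image, text) pair is its own binary problem."""
    logits = t * l2_normalize(img) @ l2_normalize(txt).T + b
    z = 2 * np.eye(len(logits)) - 1           # +1 for matching pairs, -1 otherwise
    # -log sigmoid(z * logits) == log(1 + exp(-z * logits)), computed stably
    return np.logaddexp(0.0, -z * logits).sum() / len(logits)
```

The sigmoid loss only needs per-pair terms, with no normalization across the batch, whereas the softmax loss has to see every row and column of the similarity matrix at once.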

camera calibration using OpenCV

The 2D image points are easy to find in the image (they are the locations where two black squares touch each other on the chessboard). We also need the 3D points (X, Y, Z). For simplicity, we can say the chessboard was kept stationary in the XY plane (so Z = 0 always) and the camera was moved instead. ⇒ Then for the X, Y values, we can simply pass the points as (0,0), (1,0), (2,0) … (s..

AI/vision 2024.08.29
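
The object points described in the calibration post can be generated with numpy. In the OpenCV calibration tutorial this grid is paired with the corners found by cv.findChessboardCorners, and both are passed to cv.calibrateCamera. The sketch below only builds the grid, assuming a board with `cols × rows` inner corners (7 × 6 in the usage line is just an example) and an optional square size for real-world units; the function name is hypothetical.

```python
import numpy as np

def chessboard_object_points(cols: int, rows: int, square_size: float = 1.0) -> np.ndarray:
    """3D coordinates of the inner chessboard corners, with the board fixed at Z = 0.

    Points are ordered row by row: (0,0,0), (1,0,0), ..., (cols-1,0,0), (0,1,0), ...
    square_size rescales from "square" units to real units (e.g. mm).
    """
    objp = np.zeros((rows * cols, 3), np.float32)
    # mgrid -> (2, cols, rows); .T -> (rows, cols, 2) so X varies fastest; Z stays 0
    objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)
    return objp * square_size

if __name__ == "__main__":
    print(chessboard_object_points(7, 6)[:3])
```

`chessboard_object_points(7, 6)` returns 42 points: (0,0,0), (1,0,0), …, (6,0,0), (0,1,0), …, which is exactly the "(0,0), (1,0), (2,0) …" pattern with Z fixed at 0.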

OpenCV odds and ends

1. Opening an image: cv.imread: the first argument is the file path. IMREAD_COLOR loads the image in the BGR 8-bit format (this is the default used here); IMREAD_UNCHANGED loads the image as is (including the alpha channel, if present); IMREAD_GRAYSCALE loads the image as a single-channel intensity image. cv.imshow: shows the image in a window. cv.waitKey(0): waits for user input for the given number of ms (0 ⇒ forever). cv.imwrite: writes the image to a file path. 2. Capturing, displaying, and saving video (this..

AI/vision 2024.08.29
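
Since IMREAD_COLOR returns channels in BGR order and IMREAD_GRAYSCALE returns a single intensity channel, here is a numpy-only sketch of how the two relate (it runs without OpenCV). In practice cv.cvtColor with cv.COLOR_BGR2RGB or cv.COLOR_BGR2GRAY does this. The weights below are the BT.601 ones OpenCV documents for the RGB→GRAY conversion, and an image read directly with IMREAD_GRAYSCALE may differ slightly depending on the decoder. The function names are illustrative.

```python
import numpy as np

def bgr_to_rgb(img_bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis: OpenCV's BGR order -> RGB (e.g. for matplotlib)."""
    return img_bgr[..., ::-1]

def bgr_to_gray(img_bgr: np.ndarray) -> np.ndarray:
    """Single-channel intensity image using Y = 0.299 R + 0.587 G + 0.114 B."""
    b, g, r = (img_bgr[..., i].astype(np.float64) for i in range(3))
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Round and clip back to the 8-bit range that imread would return
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```

A common pitfall is showing a BGR array with a library that expects RGB: reds and blues appear swapped until the channels are reversed.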

The OpenCV library for image processing

https://github.com/opencv/opencv (OpenCV: Open Source Computer Vision Library) https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html (OpenCV-Python Tutorials) Core Operations: In this section you will learn basic operations on image like pixel editing, geome..

AI/vision 2024.08.29


A blog where I jot down things I've roughly studied
