Designing a CNN Model for Object Tracking - CH.Tseng

2024-11-17


While training a Siamese network, an idea occurred to me: if we can feed two images into a model and get a measure of how different they are, could we also feed in two images and have the model tell us where the content of the second image appears in the first? This is very much like the children's game "Where's Wally?". Applied to video, the same model could also serve as an object tracker.

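Traditionally, object tracking starts by locating the position and extent of the target in the initial frame. It then detects objects frame by frame, compares the distances between all of them, and uses those calculations to predict and follow the target.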

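My idea is to train a CNN model that does this directly: given an image of any existing object, the model keeps matching it against each frame and outputs the object's position and extent, with no extra pre- or post-processing. In effect, the tracking task itself is handed over to the model.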

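The concept is summarized in the header image of this post: feed a frame image and an object image into the model, and it returns the object's BBOX (position and extent) within the frame.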

Model design

The model has three main parts: a feature extractor for the target object, a feature extractor for the frame, and a BBOX head that regresses the bounding box.

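Target object feature extractor

The target object image is small, so its feature extractor is simple: basic CNN layers (CONV => RELU => POOL), followed by a GAP layer and a dense layer that outputs a fixed number of features, currently set to 32. The input image size is 32×32.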

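    from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                         GlobalAveragePooling2D, Dense)
    from tensorflow.keras.models import Model

    def objbase(inputShape, embeddingDim=32):
        # Two CONV => RELU => POOL stages on the small object image
        inputs = Input(inputShape)
        x = Conv2D(128, (2, 2), padding="same", activation="relu")(inputs)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Conv2D(64, (2, 2), padding="same", activation="relu")(x)
        x = MaxPooling2D(pool_size=2)(x)
        # GAP, then a dense layer that produces the object embedding
        pooledOutput = GlobalAveragePooling2D()(x)
        outputs = Dense(embeddingDim)(pooledOutput)
        model = Model(inputs, outputs)
        return model

Frame feature extractor

The base network is DenseNet121, followed by a GAP layer (global spatial average pooling layer) and a dense layer that outputs a fixed number of features, currently set to 128. The input image size is 416×416.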

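    from tensorflow.keras.applications import DenseNet121
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
    from tensorflow.keras.models import Model

    def imgbase(embeddingDim=128):
        # ImageNet-pretrained DenseNet121 without its classification top
        base_model = DenseNet121(weights="imagenet", include_top=False)
        x = base_model.output
        x = GlobalAveragePooling2D()(x)
        # Dense layer that produces the frame embedding
        out = Dense(embeddingDim, activation="relu")(x)
        model = Model(inputs=base_model.input, outputs=out)
        return model

BBOX head

This part receives and concatenates the feature vectors from 1) the target object and 2) the frame, then regresses the bbox as four values: left-top X, left-top Y, right-bottom X, right-bottom Y.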

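    concat_layer = Concatenate()([featsA, featsB])
    bboxHead = Dense(64, activation="relu")(concat_layer)
    bboxHead = Dense(32, activation="relu")(bboxHead)
    bboxHead = Dense(4, activation="sigmoid", name="bounding_box")(bboxHead)

Finally, combine the three parts. The block from concat_layer through the bounding_box layer is the BBOX head shown above:

    from tensorflow.keras.layers import Input, Concatenate, Dense
    from tensorflow.keras.models import Model

    # Two inputs: the 416×416 frame and the 32×32 target object
    imgA = Input(shape=config.IMG_SHAPE)
    imgB = Input(shape=config.OBJ_SHAPE)
    img_base = imgbase(embeddingDim=128)
    obj_base = objbase(inputShape=config.OBJ_SHAPE, embeddingDim=32)
    featsA = img_base(imgA)
    featsB = obj_base(imgB)

    # BBOX head: concatenate both embeddings and regress four values
    concat_layer = Concatenate()([featsA, featsB])
    bboxHead = Dense(64, activation="relu")(concat_layer)
    bboxHead = Dense(32, activation="relu")(bboxHead)
    bboxHead = Dense(4, activation="sigmoid", name="bounding_box")(bboxHead)

    model = Model(inputs=[imgA, imgB], outputs=bboxHead)
    model.summary()

The model has 7,217,796 parameters, so it is not very large.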
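Because the last layer uses a sigmoid, each of the four outputs lies between 0 and 1. The sketch below shows one way to map such a prediction back to pixel coordinates. It assumes the training targets were normalized by dividing x by the frame width and y by the frame height, which the post does not state explicitly; the helper name to_pixel_bbox is made up for illustration.

```python
def to_pixel_bbox(pred, frame_w, frame_h):
    """Map a normalized (x1, y1, x2, y2) prediction to integer pixel coordinates.

    Assumption: targets were scaled to [0, 1] by the frame width and height.
    """
    # Clamp every value to [0, 1] in case the caller passes raw numbers
    x1, y1, x2, y2 = (min(max(float(v), 0.0), 1.0) for v in pred)
    # A regression head can emit swapped corners, so order them explicitly
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    return (int(round(left * frame_w)), int(round(top * frame_h)),
            int(round(right * frame_w)), int(round(bottom * frame_h)))


# Example: a prediction for a 416×416 frame
box = to_pixel_bbox([0.25, 0.5, 0.75, 1.0], 416, 416)
```

With a trained model, pred would be one row of model.predict([frames, objects]).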

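Dataset format requirements

Because the model takes two images at once, the training data must be prepared as image pairs.

Each pair consists of a 32×32 image and a 416×416 image.

The 416×416 image must be annotated, with at most one annotation per image (the XML format produced by the LabelImg tool).

The 32×32 image shows an object of the same type as the one annotated in the 416×416 image, but it does not have to be the identical object.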
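Since the BBOX head ends in a sigmoid, its targets presumably need to lie in [0, 1]. The following is a minimal sketch of reading the single annotation from a LabelImg (Pascal VOC) XML file and scaling it by the image size. The helper name read_voc_bbox and the sample annotation are made up for illustration, and the scaling step is an assumption rather than something the post spells out.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in LabelImg's Pascal VOC layout
SAMPLE_XML = """
<annotation>
  <filename>00000001.jpg</filename>
  <size><width>416</width><height>416</height><depth>3</depth></size>
  <object>
    <name>airplane</name>
    <bndbox><xmin>104</xmin><ymin>208</ymin><xmax>312</xmax><ymax>416</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_bbox(xml_text):
    """Return (label, (x1, y1, x2, y2)) with coordinates scaled to [0, 1].

    Returns None when the file holds no <object>, since each frame carries
    at most one annotation.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = float(size.findtext("width"))
    height = float(size.findtext("height"))
    obj = root.find("object")
    if obj is None:
        return None
    box = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (float(box.findtext(tag))
                              for tag in ("xmin", "ymin", "xmax", "ymax"))
    return obj.findtext("name"), (xmin / width, ymin / height,
                                  xmax / width, ymax / height)


label, bbox = read_voc_bbox(SAMPLE_XML)
```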

LaSOT (Large-scale Single Object Tracking)

Preparing and annotating image pairs like these is very laborious, so I recommend downloading the LaSOT dataset.

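As its name suggests, the dataset is aimed at training data-hungry deep trackers. Its features are:

1,550 videos and more than 3.87 million frames, every one of them manually annotated and checked.

More than 70 categories, each containing at least 15 kinds of videos and images.

Each video averages about 83 seconds, roughly 2,500 frames.

Annotation types: every frame has a rectangular bounding-box annotation, plus a descriptive text annotation.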

Downloading the LaSOT dataset

URL: http://vision.cs.stonybrook.edu/~lasot/

conference version (70 categories with 1,400 videos, ~227G)

new subset in extended journal version (15 categories with 150 videos, ~59G)

Downloading everything comes to nearly 300 GB, which is a lot of data.

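Converting LaSOT to VOC XML annotations

After downloading and extracting the archives, you will see the LaSOT folder structure, with one folder per video. In each video folder, all frame images are placed in the img folder. The number of frames varies from just over 1,000 to more than 5,000, and stringing them together in filename order reconstructs the video.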

The bbox annotations are stored in groundtruth.txt, one line per image, in the same order as the image filenames.

    366,103,45,16
    364,107,45,15
    362,109,46,16
    362,111,46,18

Using convert_LaSOT_2_voc.py (see note 1), you can convert a chosen class directory in LaSOT to the VOC annotation format.
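The core of such a conversion is turning each groundtruth.txt line into VOC corner coordinates. The sketch below assumes each line is x,y,w,h (top-left corner plus width and height), which is how LaSOT annotations are commonly described. It is an illustration only, not the author's convert_LaSOT_2_voc.py, and the function name lasot_line_to_voc is made up.

```python
def lasot_line_to_voc(line):
    """Convert one 'x,y,w,h' line into (xmin, ymin, xmax, ymax).

    Assumption: x,y is the top-left corner and w,h the box size in pixels.
    """
    x, y, w, h = (int(float(v)) for v in line.strip().split(","))
    return x, y, x + w, y + h


# The four sample lines shown above
sample = ["366,103,45,16", "364,107,45,15", "362,109,46,16", "362,111,46,18"]
boxes = [lasot_line_to_voc(line) for line in sample]
```

Each resulting tuple can then be written into the bndbox element of a VOC XML file for the matching frame.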

Even a single LaSOT class holds a huge number of annotated images. The airplane class, for example, contains 20 videos totaling 85,921 annotated frames, or 3.75 GB.

Training the object tracking CNN model

Choosing the loss

You can use a custom IOU loss. Remember the multi-class object detector model we designed with Keras earlier? Its custom IOU loss was defined as shown below. (Note: remove the class header parts yourself; this example does not use class classification.) The last layers of the two headers are named bounding_box and class_label.

Define the functions that compute and measure IOU:

    import tensorflow as tf
    from tensorflow.keras import backend as K
    from tensorflow.keras.optimizers import Adam

    def iou_metric(y_true, y_pred):
        return calculate_iou(y_true, y_pred)

    def calculate_iou(target, pred):
        # Corners of the intersection rectangle
        xA = K.maximum(target[:, 0], pred[:, 0])
        yA = K.maximum(target[:, 1], pred[:, 1])
        xB = K.minimum(target[:, 2], pred[:, 2])
        yB = K.minimum(target[:, 3], pred[:, 3])
        interArea = K.maximum(0.0, xB - xA) * K.maximum(0.0, yB - yA)
        boxAarea = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
        boxBarea = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        # K.epsilon() guards against division by zero for degenerate boxes
        iou = interArea / (boxAarea + boxBarea - interArea + K.epsilon())
        return iou

    def custom_loss(y_true, y_pred):
        mse = tf.keras.losses.mean_squared_error(y_true, y_pred)
        iou = calculate_iou(y_true, y_pred)
        return mse + (1 - iou)

Map the losses and metrics to the outputs:

    losses = {
        "class_label": "categorical_crossentropy",
        "bounding_box": custom_loss,
    }
    lossWeights = {
        "class_label": 1.0,
        "bounding_box": 1.0
    }
    lossMetrics = {
        "class_label": "accuracy",
        "bounding_box": iou_metric
    }
    trainTargets = {
        "class_label": trainLabels,
        "bounding_box": trainBBoxes
    }
    testTargets = {
        "class_label": testLabels,
        "bounding_box": testBBoxes
    }

Use them in model.compile and model.fit:

    # learning_rate replaces the deprecated lr argument
    opt = Adam(learning_rate=config.INIT_LR)
    model.compile(optimizer=opt, loss=losses, metrics=lossMetrics, loss_weights=lossWeights)
    H = model.fit(
        trainImages, trainTargets,
        validation_data=(testImages, testTargets),
        batch_size=config.BATCH_SIZE,
        epochs=config.NUM_EPOCHS,
        verbose=1)

Using TensorFlow Addons

If you want less hassle, you can use the GIoU loss from TensorFlow Addons instead. It is simple to use: pass in the ground-truth and predicted bboxes and you get the GIoU loss:

    import tensorflow_addons as tfa
    model.compile(opt, loss=tfa.losses.GIoULoss())

Using a fit generator

If the dataset is huge, with tens of thousands or even millions of images, it cannot be loaded all at once and has to be read batch by batch during training. Older versions of TensorFlow used fit_generator for this; newer versions use fit for everything.

The trick to reading the dataset batch by batch, continuously until training ends, is to use yield.
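As a rough illustration of the yield pattern, here is a minimal generator that loops over a list of samples forever and emits one batch at a time in the ([frames, objects], bboxes) layout that a two-input model expects. The names batch_generator and load_sample are hypothetical, and how a sample is loaded from disk is left to the caller; this is a sketch under those assumptions, not the author's loader.

```python
import random


def batch_generator(samples, batch_size, load_sample, shuffle=True):
    """Yield ([frames, objects], bboxes) batches indefinitely.

    samples     -- list of sample descriptors, e.g. file paths (hypothetical layout)
    load_sample -- callable returning (frame, object_crop, bbox) for one descriptor
    """
    order = list(range(len(samples)))
    while True:  # the generator never ends; the training loop decides when to stop
        if shuffle:
            random.shuffle(order)
        for start in range(0, len(order), batch_size):
            frames, objects, bboxes = [], [], []
            for i in order[start:start + batch_size]:
                frame, obj, bbox = load_sample(samples[i])
                frames.append(frame)
                objects.append(obj)
                bboxes.append(bbox)
            # For real training, convert each list to a float32 array first
            yield [frames, objects], bboxes


# Toy usage with integers standing in for images
gen = batch_generator(list(range(5)), 2,
                      lambda s: (s, s * 10, [0, 0, 1, 1]), shuffle=False)
first_batch = next(gen)
```

In TensorFlow 2.x such a generator can be passed to model.fit together with steps_per_epoch, typically the number of samples divided by the batch size, rounded up, so that Keras knows where one epoch ends.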

    def load_ds_train(batch_size):
        while True:
            while batch_start [...]

    [...]', data)
                bbox_objects[label_name] = bboxes
                makeLabelFile(id, file_basename, bbox_objects, img_path)
