Generalized Intersection over Union


A Metric and A Loss for Bounding Box Regression

Object Detection and $IoU$

Intersection over Union (IoU), also known as the Jaccard index, is the most popular evaluation metric for tasks such as segmentation, object detection and tracking. Object detection consists of two sub-tasks: localization, which is determining the location of an object in an image, and classification, which is assigning a class to that object.

The goal of localization in object detection is to draw a 2D bounding box around the objects in the scene. To simplify further, this example focuses on a single bounding box. The ground truth bounding box is shown in the image above; the source for this image and bounding box is the COCO dataset. We know this is the ground truth because a person manually annotated the image. The prediction bounding box is usually the output of a neural network, either during training or at inference time.

Given the intersection area $I$ and the union area $U$ of the two boxes, Intersection over Union ($IoU$) is computed as follows:

$$IoU = \frac{|A \cap B|}{|A \cup B|} = \frac{|I|}{|U|}$$

where $A$ and $B$ are the prediction and ground truth bounding boxes.

$IoU$ has the appealing property of scale invariance: it measures the relative overlap of the two boxes, so pairs of boxes with the same degree of overlap produce the same value no matter their absolute width, height and location.

Common Cost Functions

Object detection neural networks commonly use the $\ell_1$-norm or $\ell_2$-norm for their cost function (aka loss function). Our work shows that there is not a strong correlation between minimizing these commonly used losses and improving the $IoU$ value.
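The $IoU$ formula above can be sketched in a few lines of Python. The $(x_1, y_1, x_2, y_2)$ box format and the helper name `iou` are illustrative choices for this sketch, not from the original page:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle: max of the top-left corners, min of the bottom-right.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)  # zero if the boxes do not overlap

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union

# Two 10x10 boxes offset by one unit overlap on a 9x9 region.
print(iou((0, 0, 10, 10), (1, 1, 11, 11)))  # 81 / 119 ≈ 0.6807
```

Note the scale invariance: scaling all coordinates by any constant leaves the returned ratio unchanged.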
To understand why this is the case, recall that a rectangle can be represented parametrically in a variety of ways. For example, a bounding box can be represented by its top-left corner $(x_1, y_1)$ and its bottom-right corner $(x_2, y_2)$, written as $(x_1, y_1, x_2, y_2)$. Alternatively, the $(x_c, y_c)$ coordinates of the center of the bounding box can be used in conjunction with the bounding box's width $w$ and height $h$, giving $(x_c, y_c, w, h)$.

Consider calculating the $\ell_2$-norm distance, $||.||_2$, and the $\ell_1$-norm distance, $||.||_1$, between prediction and ground truth for the bounding boxes in both cases shown above. Notice how the $\ell_n$-norm values are exactly the same, but the $IoU$ and $GIoU$ values are very different.

It is common practice to train a network by optimizing a loss function such as the $\ell_1$-norm or $\ell_2$-norm, but then to evaluate performance on a different function, such as $IoU$. Moreover, $\ell_n$-norm based losses are not scale invariant: bounding boxes with the same level of overlap but different scales will give different values. State-of-the-art object detection networks deal with this problem by introducing ideas such as anchor boxes and non-linear representations, but even with these engineered tweaks, there is still a gap between the $\ell_n$-norm cost function and the $IoU$ metric.

$IoU$ vs. $GIoU$ as a Metric

In object detection, $IoU$ is used as a metric to evaluate how close the prediction bounding box is to the ground truth. In the first example above, the prediction and ground truth bounding boxes overlap, so the value of $IoU$ is non-zero. Let's look at an example where $IoU$ falls short.

Consider the ground truth bounding box again, and suppose that instead of the prediction we saw above, we make a bad prediction whose bounding box has no overlap with the ground truth. In this case, and in any other case where there is no overlap between the ground truth and prediction bounding boxes, the intersection is 0, and therefore $IoU$ will be 0 as well.

Now suppose we make a better prediction that is closer to the ground truth but still does not overlap it. Unfortunately, $IoU$ is still 0 for both. It would be nice if $IoU$ indicated that our new, better prediction was closer to the ground truth than the first prediction, even in cases of no intersection.
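As a concrete illustration of the mismatch described above, two predictions can sit at exactly the same $\ell_1$ and $\ell_2$ distance from the ground truth in $(x_1, y_1, x_2, y_2)$ space while having different $IoU$ values. The coordinates here are made up for this sketch:

```python
import math

def iou(a, b):
    """IoU for axis-aligned boxes in (x1, y1, x2, y2) form."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

gt     = (0, 0, 10, 10)    # hypothetical ground truth
pred_a = (2, 2, 12, 12)    # shifted diagonally
pred_b = (-2, -2, 12, 12)  # grown symmetrically around the ground truth

l1 = lambda p, q: sum(abs(pi - qi) for pi, qi in zip(p, q))
l2 = lambda p, q: math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(l1(gt, pred_a), l1(gt, pred_b))    # 8 8     -> identical l1 distance
print(l2(gt, pred_a), l2(gt, pred_b))    # 4.0 4.0 -> identical l2 distance
print(iou(gt, pred_a), iou(gt, pred_b))  # ≈0.471 vs ≈0.510 -> different IoU
```

Both predictions look equally wrong to an $\ell_n$-norm loss, yet `pred_b` overlaps the ground truth more.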
Our work proposes a solution to this, $GIoU$, which is formulated as follows:

$$GIoU = \frac{|A \cap B|}{|A \cup B|} - \frac{|C \backslash (A \cup B)|}{|C|} = IoU - \frac{|C \backslash (A \cup B)|}{|C|}$$

where $A$ and $B$ are the prediction and ground truth bounding boxes, and $C$ is the smallest convex shape that encloses both $A$ and $B$ (for axis-aligned bounding boxes, the smallest enclosing box).

Notice that the area of $C$ is smaller for the better prediction, while all other values remain constant; $IoU$ is 0 in both cases. Therefore a smaller value is subtracted, and the value of $GIoU$ increases as the prediction moves towards the ground truth.

$GIoU$ as a Loss

Recall that in a neural network, any given loss function must be differentiable to allow for backpropagation. We see from the above example that in cases where there is no intersection, $IoU$ is constant at 0 and therefore provides no gradient. $GIoU$, however, is always differentiable.

We sampled cases where the prediction bounding box overlaps (aka intersects) the ground truth and cases where there is no intersection. The relationship between $IoU$ and $GIoU$ for these samples is shown in the figure. From the plot, as from the formulation above, you can see that $GIoU$ ranges from -1 to 1. Negative values occur when the fraction of the enclosing area not covered by the union, $|C \backslash (A \cup B)| / |C|$, is greater than the $IoU$ term. As the $IoU$ component increases, the value of $GIoU$ converges to $IoU$.
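A small sketch (again with made-up coordinates) shows $GIoU$ discriminating between two predictions that both have $IoU = 0$:

```python
def giou(a, b):
    """Generalized IoU for axis-aligned boxes (x1, y1, x2, y2)."""
    inter = max(0.0, min(a[2], b[2]) - max(a[0], b[0])) * \
            max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    # Smallest enclosing box C around both boxes.
    c = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    return inter / union - (area(c) - union) / area(c)

gt       = (0, 0, 10, 10)
bad_pred = (30, 0, 40, 10)  # far away, no overlap: IoU = 0
better   = (12, 0, 22, 10)  # closer, still no overlap: IoU = 0

print(giou(gt, bad_pred))  # -0.5     (large empty area in C)
print(giou(gt, better))    # ≈ -0.091 (smaller empty area in C)
```

Both values are negative because there is no overlap, but the better prediction scores closer to 0, so minimizing $1 - GIoU$ pulls the prediction towards the ground truth even from a non-overlapping start.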
$GIoU$ Algorithm Pseudocode

Algorithm: $IoU$ and $GIoU$ as bounding box losses

input: Predicted $B^p$ and ground truth $B^g$ bounding box coordinates: $B^p = (x^p_1, y^p_1, x^p_2, y^p_2)$, $B^g = (x^g_1, y^g_1, x^g_2, y^g_2)$
output: $\mathcal{L}_{IoU}$, $\mathcal{L}_{GIoU}$

1. For the predicted box $B^p$, ensure $\hat{x}^p_2 > \hat{x}^p_1$ and $\hat{y}^p_2 > \hat{y}^p_1$:
   $\hat{x}^p_1 = \min(x^p_1, x^p_2)$, $\hat{x}^p_2 = \max(x^p_1, x^p_2)$,
   $\hat{y}^p_1 = \min(y^p_1, y^p_2)$, $\hat{y}^p_2 = \max(y^p_1, y^p_2)$
2. Calculate the area of $B^g$: $A^g = (x^g_2 - x^g_1) \times (y^g_2 - y^g_1)$
3. Calculate the area of $B^p$: $A^p = (\hat{x}^p_2 - \hat{x}^p_1) \times (\hat{y}^p_2 - \hat{y}^p_1)$
4. Calculate the intersection $\mathcal{I}$ between $B^p$ and $B^g$:
   $x^{\mathcal{I}}_1 = \max(\hat{x}^p_1, x^g_1)$, $x^{\mathcal{I}}_2 = \min(\hat{x}^p_2, x^g_2)$,
   $y^{\mathcal{I}}_1 = \max(\hat{y}^p_1, y^g_1)$, $y^{\mathcal{I}}_2 = \min(\hat{y}^p_2, y^g_2)$,
   $\mathcal{I} = \begin{cases} (x^{\mathcal{I}}_2 - x^{\mathcal{I}}_1) \times (y^{\mathcal{I}}_2 - y^{\mathcal{I}}_1) & \text{if } x^{\mathcal{I}}_2 > x^{\mathcal{I}}_1, \; y^{\mathcal{I}}_2 > y^{\mathcal{I}}_1 \\ 0 & \text{otherwise} \end{cases}$
5. Find the coordinates of the smallest enclosing box $B^c$:
   $x^{c}_1 = \min(\hat{x}^p_1, x^g_1)$, $x^{c}_2 = \max(\hat{x}^p_2, x^g_2)$,
   $y^{c}_1 = \min(\hat{y}^p_1, y^g_1)$, $y^{c}_2 = \max(\hat{y}^p_2, y^g_2)$
6. Calculate the area of $B^c$: $A^c = (x^c_2 - x^c_1) \times (y^c_2 - y^c_1)$
7. $\displaystyle IoU = \frac{\mathcal{I}}{\mathcal{U}}$, where $\mathcal{U} = A^p + A^g - \mathcal{I}$
8. $\displaystyle GIoU = IoU - \frac{A^c - \mathcal{U}}{A^c}$
9. $\mathcal{L}_{IoU} = 1 - IoU$, $\mathcal{L}_{GIoU} = 1 - GIoU$

Team

Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese

If you found this work helpful in your research, please consider citing:

@article{Rezatofighi_2018_CVPR,
  author    = {Rezatofighi, Hamid and Tsoi, Nathan and Gwak, JunYoung and Sadeghian, Amir and Reid, Ian and Savarese, Silvio},
  title     = {Generalized Intersection over Union},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2019},
}
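The pseudocode above translates directly into a scalar Python sketch. The function name `giou_loss` is my own; a real training loss would use tensor operations so that autograd can backpropagate through it:

```python
def giou_loss(bp, bg):
    """L_IoU and L_GIoU for predicted box bp and ground truth bg,
    each given as (x1, y1, x2, y2). Follows pseudocode steps 1-9."""
    # Step 1: reorder predicted corners so that x2 > x1 and y2 > y1.
    xp1, xp2 = min(bp[0], bp[2]), max(bp[0], bp[2])
    yp1, yp2 = min(bp[1], bp[3]), max(bp[1], bp[3])
    xg1, yg1, xg2, yg2 = bg

    # Steps 2-3: areas of the ground truth and predicted boxes.
    ag = (xg2 - xg1) * (yg2 - yg1)
    ap = (xp2 - xp1) * (yp2 - yp1)

    # Step 4: intersection area, zero when the boxes do not overlap.
    xi1, xi2 = max(xp1, xg1), min(xp2, xg2)
    yi1, yi2 = max(yp1, yg1), min(yp2, yg2)
    inter = (xi2 - xi1) * (yi2 - yi1) if xi2 > xi1 and yi2 > yi1 else 0.0

    # Steps 5-6: smallest enclosing box and its area.
    xc1, xc2 = min(xp1, xg1), max(xp2, xg2)
    yc1, yc2 = min(yp1, yg1), max(yp2, yg2)
    ac = (xc2 - xc1) * (yc2 - yc1)

    # Steps 7-9: IoU, GIoU, and the two losses.
    union = ap + ag - inter
    iou = inter / union
    giou = iou - (ac - union) / ac
    return 1.0 - iou, 1.0 - giou

# Overlapping boxes: both losses are well below their maximum.
print(giou_loss((1, 1, 11, 11), (0, 0, 10, 10)))
# Non-overlapping boxes: L_IoU saturates at 1.0, L_GIoU still varies.
print(giou_loss((12, 0, 22, 10), (0, 0, 10, 10)))
```

Because step 1 sorts the predicted corners, the loss is well defined even when the network emits a box with swapped coordinates.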


