

---
title: "Using Huber loss with TensorFlow 2 and Keras"
date: "2019-10-12"
categories:
  - "buffer"
  - "frameworks"
tags:
  - "deep-learning"
  - "huber-loss"
  - "keras"
  - "loss-function"
  - "machine-learning"
  - "neural-networks"
  - "regression"
---

The Huber loss function can be used to balance between the Mean Absolute Error, or MAE, and the Mean Squared Error, MSE. It is therefore a good loss function for when you have varied data or only a few outliers.

But how do you implement this loss function in Keras?

That's what we will find out in this blog.

We first briefly recap the concept of a loss function and introduce Huber loss. Next, we present a Keras example implementation that uses the Boston Housing Prices Dataset to generate a regression model.

After reading this tutorial, you will have learned...

- What loss functions are in neural networks.
- How Huber loss works and how it combines MAE and MSE.
- How `tensorflow.keras.losses.Huber` can be used within your TensorFlow 2 / Keras model.

Let's get to work! 🚀

Note that the full code is also available on GitHub, in my Keras loss functions repository.

**Update 28/Jan/2021:** updated the tutorial to ensure that it is ready for 2021. The code now runs with TensorFlow 2 based versions and has been updated to use `tensorflow.keras.losses.Huber` instead of a custom Huber loss function. Also updated header information and featured image.

[toc]

## Summary and code example: Huber Loss with TensorFlow 2 and Keras

Loss functions are used to compare predictions with ground truth values after the forward pass when training a neural network. There are many loss functions, and choosing one can depend on the dataset that you are training with. For example, in regression problems, you want to use Mean Absolute Error if you have many outliers, while if you don't, Mean Squared Error can be a better choice.

But sometimes, you don't know exactly which of these two is best. In that case, Huber loss can be of help. Based on a delta parameter, it shapes itself as a loss function somewhere in between MAE and MSE. This way, you have more control over your neural network.

In TensorFlow 2 and Keras, Huber loss can be added to the compile step of your model - i.e., to `model.compile`. Here, you'll see an example of Huber loss with TF2 and Keras. If you want to understand the loss function in more detail, make sure to read the rest of this tutorial as well!
```python
model.compile(loss=tensorflow.keras.losses.Huber(delta=1.5), optimizer='adam', metrics=['mean_absolute_error'])
```

## About loss functions and Huber loss

When you train machine learning models, you feed data to the network, generate predictions, compare them with the actual values (the targets) and then compute what is known as a loss. This loss essentially tells you something about the performance of the network: the higher it is, the worse your network performs overall.

There are many ways of computing the loss value. Huber loss is one of them. It essentially combines the Mean Absolute Error and the Mean Squared Error depending on some delta parameter, or 𝛿. This parameter must be configured by the machine learning engineer up front and depends on your data.

Huber loss looks like this:

*(Figure: Huber loss plotted as a function of the prediction error, for several values of 𝛿, with target = 0.)*

Formally, for a prediction error \(a = y - \hat{y}\), the Huber loss is defined as:

$$
L_\delta(a) = \begin{cases} \frac{1}{2}a^2 & \text{for } |a| \le \delta, \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}
$$

As you can see, for target = 0, the loss increases when the error increases. However, the speed with which it increases depends on this 𝛿 value. In fact, Grover (2019) writes about this as follows: Huber loss approaches MAE when 𝛿 ~ 0 and MSE when 𝛿 ~ ∞ (large numbers).

When you compare this statement with the benefits and drawbacks of both the MAE and the MSE, you'll gain some insights about how to adapt this delta parameter:

- If your dataset contains large outliers, it's likely that your model will not be able to predict them correctly at once. In fact, it might take quite some time for it to recognize these, if it can do so at all. This results in large errors between predicted values and actual targets, because they're outliers. Since MSE squares errors, large outliers will distort your loss value significantly. If outliers are present, you likely don't want to use MSE. Huber loss will still be useful, but you'll have to use small values for 𝛿.
- If it does not contain many outliers, it's likely that the model will generate quite accurate predictions from the start - or at least, from some epochs after starting the training process. In this case, you may observe that the errors are very small overall. Then, one can argue, it may be worthwhile to let the largest small errors contribute more significantly to the loss than the smaller ones. In this case, MSE is actually useful; hence, with Huber loss, you'll likely want to use quite large values for 𝛿.
- If you don't know, you can always start somewhere in between - for example, in the plot above, 𝛿 = 1 represented MAE quite accurately, while 𝛿 = 3 tends to go towards MSE already. What if you used 𝛿 = 1.5 instead? You may benefit from both worlds - the sketch below makes this blending concrete.
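To get a feel for how 𝛿 blends the two, here is a minimal, self-contained sketch that evaluates the piecewise formula above for a handful of error values and deltas, next to MSE-style and MAE-style reference values. The `huber` helper is purely illustrative and is not part of the model code later in this post:

```python
import numpy as np

def huber(error, delta):
    """Piecewise Huber loss for a single error value, per the formula above."""
    if abs(error) <= delta:
        return 0.5 * error ** 2
    return delta * (abs(error) - 0.5 * delta)

errors = np.array([0.5, 1.0, 2.0, 5.0, 10.0])

# Small delta: behaves like a scaled MAE for all but the tiniest errors.
# Large delta: identical to 0.5 * error^2 (MSE-like) for every error shown.
for delta in [0.5, 1.5, 10.0]:
    print(f'delta={delta:>4}:', np.round([huber(e, delta) for e in errors], 2))

print('0.5*e^2  :', np.round(0.5 * errors ** 2, 2))  # MSE-style reference
print('|e|      :', np.round(np.abs(errors), 2))     # MAE-style reference
```

With 𝛿 = 0.5, the loss for the large errors grows only linearly; with 𝛿 = 10, every value equals the quadratic reference exactly.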
Let's now see if we can complete a regression problem with Huber loss!

## Huber loss example with TensorFlow 2/Keras

Next, we show you how to use Huber loss with Keras to create a regression model. We'll use the Boston housing price regression dataset, which comes with Keras by default - that'll make the example easier to follow. Obviously, you can always use your own data instead!

Since we need to know how to configure 𝛿, we must inspect the data first. Do the target values contain many outliers? Some statistical analysis would be useful here.

Only then do we create the model and configure 𝛿 to an estimate that seems adequate. Finally, we run the model, check performance, and see whether we can improve 𝛿 any further.

### Regression dataset: Boston housing price regression

Keras comes with datasets on board the framework: they are stored on some Amazon AWS server, and when you load the data, they are automatically downloaded for you and stored in user-defined variables. This allows you to experiment with deep learning and the framework easily. This way, you can get a feel for DL practice and neural networks without getting lost in the complexity of loading, preprocessing and structuring your data.

The Boston housing price regression dataset is one of these datasets. It is taken by Keras from the Carnegie Mellon University StatLib library, which contains many datasets for training ML models. It is described as follows:

> The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley, 1980.
>
> *StatLib Datasets Archive*

It contains these variables, according to the StatLib website:

- **CRIM** per capita crime rate by town
- **ZN** proportion of residential land zoned for lots over 25,000 sq.ft.
- **INDUS** proportion of non-retail business acres per town
- **CHAS** Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- **NOX** nitric oxides concentration (parts per 10 million)
- **RM** average number of rooms per dwelling
- **AGE** proportion of owner-occupied units built prior to 1940
- **DIS** weighted distances to five Boston employment centres
- **RAD** index of accessibility to radial highways
- **TAX** full-value property-tax rate per $10,000
- **PTRATIO** pupil-teacher ratio by town
- **B** 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- **LSTAT** % lower status of the population
- **MEDV** Median value of owner-occupied homes in $1000's

In total, one sample contains 13 features (CRIM to LSTAT) which together approximate the median value of the owner-occupied homes, or MEDV. The structure of this dataset, mapping some variables to a real-valued number, allows us to perform regression.

Let's now take a look at the dataset itself, and particularly at its target values.

### Does the dataset have many outliers?

The number of outliers helps us tell something about the value for 𝛿 that we have to choose. Thinking back to my Introduction to Statistics class at university, I remember that box plots can help visually identify outliers in a statistical sample:

> Examination of the data for unusual observations that are far removed from the mass of data. These points are often referred to as outliers. Two graphical techniques for identifying outliers, scatter plots and box plots, (...)
>
> *Engineering Statistics Handbook*

The sample, in our case, is the Boston housing dataset: it contains some mappings between feature variables and target prices, but obviously doesn't represent all homes in Boston, which would be the statistical population. Nevertheless, we can write some code to generate a box plot based on this dataset:

```python
'''
  Generate a BoxPlot image to determine how many outliers are within the Boston Housing Pricing Dataset.
'''
import tensorflow.keras
from tensorflow.keras.datasets import boston_housing
import numpy as np
import matplotlib.pyplot as plt

# Load the data
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

# We only need the targets, but do need to consider all of them
y = np.concatenate((y_train, y_test))

# Generate box plot
plt.boxplot(y)
plt.title('Boston housing price regression dataset - boxplot')
plt.show()
```

And next run it, to find this box plot:

*(Figure: box plot of the combined target values.)*

Note that we concatenated the training data and the testing data for this box plot. Although the plot hints at the fact that many outliers exist, primarily at the high end of the statistical spectrum (which does make sense, since in real life extremely high house prices are quite common whereas extremely low ones are not), we cannot yet conclude that the MSE may not be a good idea. We'll need to inspect the individual datasets too.

We can do that by simply adapting our code to:

```python
y = y_train
```

or

```python
y = y_test
```

This results in the following box plots:

*(Figure: separate box plots for the training and testing targets.)*

Although the number of outliers is more extreme in the training data, they are present in the testing dataset as well.

Their structure is also quite similar: most of them, if not all, sit in the high-end segment of the housing market.
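To back this visual impression with a number, here is a minimal sketch, assuming the same `boston_housing` targets as before, that counts outliers using the standard 1.5 × IQR rule - the same criterion Matplotlib uses for box plot whiskers by default:

```python
import numpy as np
from tensorflow.keras.datasets import boston_housing

# Load the targets and combine train/test, as before
(_, y_train), (_, y_test) = boston_housing.load_data()
y = np.concatenate((y_train, y_test))

# 1.5 * IQR rule: values beyond the whiskers count as outliers
q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = y[(y < lower) | (y > upper)]
print(f'{len(outliers)} of {len(y)} targets fall outside [{lower:.1f}, {upper:.1f}]')
```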
Do note, however, that the median values for the testing dataset and the training dataset are slightly different. This means that patterns underlying housing prices present in the testing data may not be captured fully during the training process, because the statistical sample is slightly different. However, there is only one way to find out - by actually creating a regression model!

### Creating the model

Let's now create the model. Create a file called `huber_loss.py` in some folder and open it in a development environment. We're then ready to add some code! However, let's first analyze what you'll need to use Huber loss in Keras.

#### What you'll need to use Huber loss in Keras

The primary dependency that you'll need is TensorFlow 2, one of the two major deep learning libraries for Python. In TensorFlow 2, Keras is tightly coupled as `tensorflow.keras` and can therefore be used easily. In fact, today, it's the way to create neural networks with TensorFlow easily.

#### Model imports

Now that we can start coding, let's import the Python dependencies that we need first:

```python
'''
  Keras model demonstrating Huber loss
'''
from tensorflow.keras.datasets import boston_housing
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import Huber
import numpy as np
import matplotlib.pyplot as plt
```

Obviously, we need the `boston_housing` dataset from the available Keras datasets. Additionally, we import `Sequential` as we will build our model using the Keras Sequential API. We're creating a very simple model, a multilayer perceptron, with which we'll attempt to regress a function that correctly estimates the median values of Boston homes. For this reason, we import `Dense` layers, or densely-connected ones.

We also need `Huber`, since that's the loss function we use. Numpy is used for number processing, and we use Matplotlib to visualize the end result.

#### Loading the dataset

We next load the data by calling the Keras `load_data()` function on the housing dataset and prepare the input layer shape, which we can add to the initial hidden layer later:

```python
# Load data
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

# Set the input shape
shape_dimension = len(x_train[0])
input_shape = (shape_dimension,)
print(f'Feature shape: {input_shape}')
```

#### Preparing the model: architecture & configuration

Next, we actually provide the model architecture and configuration:

```python
# Create the model
model = Sequential()
model.add(Dense(16, input_shape=input_shape, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(8, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))

# Configure the model and start training
model.compile(loss=Huber(delta=1.5), optimizer='adam', metrics=['mean_absolute_error'])
history = model.fit(x_train, y_train, epochs=250, batch_size=1, verbose=1, validation_split=0.2)
```

As discussed, we use the Sequential API; here, we use two densely-connected hidden layers and one output layer. The hidden ones activate by means of ReLU and for this reason use He uniform initialization. The final layer activates linearly, because it regresses the actual value.

Compiling the model requires specifying the delta value, which we set to 1.5, given our estimate that we don't want true MAE, but that, given the outliers identified earlier, full MSE resemblance is not smart either. We'll optimize by means of Adam and also define the MAE as an extra error metric. This way, we can have an estimate of what the true error is in terms of thousands of dollars: the MAE keeps its domain understanding, whereas Huber loss does not.

Subsequently, we fit the training data to the model, completing 250 epochs with a batch size of 1 (true SGD-like optimization, albeit with Adam), use 20% of the data as validation data, and ensure that the entire training process is output to standard output.
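As a quick sanity check on the configured loss - a toy sketch with made-up values, not part of the training script - we can verify that `tensorflow.keras.losses.Huber` matches the piecewise definition from earlier:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import Huber

# Toy values, purely for illustration
y_true = tf.constant([[10.0], [20.0], [30.0]])
y_pred = tf.constant([[10.5], [18.0], [45.0]])  # errors: -0.5, 2.0, -15.0

# Keras' built-in Huber loss, averaged over the batch by default
keras_value = Huber(delta=1.5)(y_true, y_pred).numpy()

# Hand-computed counterpart of the piecewise definition from earlier
errors = (y_true - y_pred).numpy().ravel()
manual_value = np.mean(np.where(np.abs(errors) <= 1.5,
                                0.5 * errors ** 2,
                                1.5 * (np.abs(errors) - 0.5 * 1.5)))

print(keras_value, manual_value)  # the two values should match
```

If the two printed values match, the delta configuration behaves as the formula suggests.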
### Performance testing & visualization

Finally, we add some code for performance testing and visualization:

```python
# Test the model after training
test_results = model.evaluate(x_test, y_test, verbose=1)
print(f'Test results - Loss: {test_results[0]} - MAE: {test_results[1]}')

# Plot history: Huber loss and MAE
plt.plot(history.history['loss'], label='Huber loss (training data)')
plt.plot(history.history['val_loss'], label='Huber loss (validation data)')
plt.title('Boston Housing Price Dataset regression model - Huber loss')
plt.ylabel('Loss value')
plt.xlabel('No. epoch')
plt.legend(loc="upper left")
plt.show()

plt.title('Boston Housing Price Dataset regression model - MAE')
plt.plot(history.history['mean_absolute_error'], label='MAE (training data)')
plt.plot(history.history['val_mean_absolute_error'], label='MAE (validation data)')
plt.ylabel('Loss value')
plt.xlabel('No. epoch')
plt.legend(loc="upper left")
plt.show()
```

### Model performance for 𝛿 = 1.5

Let's now take a look at how the model has optimized over the epochs with the Huber loss:

*(Figure: Huber loss over the training epochs.)*

And with the MAE:

*(Figure: MAE over the training epochs.)*

We can see that overall, the model was still improving at the 250th epoch, although progress was stalling - which is perfectly normal in such a training process. The mean absolute error was approximately 3.64 - roughly $3,639, since the targets are expressed in thousands of dollars.

```
Test results - Loss: 4.502029736836751 - MAE: 3.6392388343811035
```

## Recap

In this blog post, we've seen how the Huber loss can be used to balance between MAE and MSE in machine learning regression problems. By means of the delta parameter, or 𝛿, you can configure which one it should resemble most, benefiting from the fact that you can check the number of outliers in your dataset a priori. I hope you've enjoyed this blog and learnt something from it - please let me know in the comments if you have any questions or remarks. Thanks and happy engineering! 😊

## References

Grover, P. (2019, September 25). *5 regression loss functions all machine learners should know.* Retrieved from https://heartbeat.fritz.ai/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0

StatLib - Datasets Archive. (n.d.). Retrieved from http://lib.stat.cmu.edu/datasets/

Keras. (n.d.). *Datasets.* Retrieved from https://keras.io/datasets/

Keras. (n.d.). *Boston housing price regression dataset.* Retrieved from https://keras.io/datasets/#boston-housing-price-regression-dataset

Carnegie Mellon University StatLib. (n.d.). *Boston house-price data.* Retrieved from http://lib.stat.cmu.edu/datasets/boston

Engineering Statistics Handbook. (n.d.). *7.1.6. What are outliers in the data?* Retrieved from https://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm

TensorFlow. (2021). *tf.keras.losses.Huber.* https://www.tensorflow.org/api_docs/python/tf/keras/losses/Huber


