# Risk and Uncertainty in Deep Learning


11 May 2019 · 19 min read

Neural networks have been pushing what is possible in a lot of domains and are becoming a standard tool in industry. As they start being a vital part of business decision making, methods that try to open the neural network "black box" are becoming increasingly popular. LIME, SHAP and embeddings are nice ways to explain what the model learned and why it makes the decisions it makes. On the other hand, instead of trying to explain what the model learned, we can also try to get insights about what the model does not know, which implies estimating two different quantities: risk and uncertainty. Variational Inference, Monte Carlo Dropout and Bootstrapped Ensembles are some examples of research in this area.

At first glance, risk and uncertainty may seem to be the same thing, but in reality they are, in some cases, orthogonal concepts. Risk stands for the intrinsic volatility over the outcome of a decision: when we roll a die, for instance, we always risk getting a bad outcome, even if we precisely know the possible outcomes. Uncertainty, on the other hand, stands for the confusion about what the possible outcomes are: if someone gives us a strange die we have never used before, we'll have to roll it for a while before we can even know what to expect about its outcomes. Risk is a fixed property of our problem, which can't be cleared by collecting more data, while uncertainty is a property of our beliefs, and can be cleared with more data. Actually, we can have uncertainty over our belief of what the risk really is!

If this seems strange at first, don't worry: this topic has been the object of heated discussions among experts in the area recently. The main question is whether a given model estimates risk (aleatoric uncertainty) or uncertainty (epistemic uncertainty). Some references make this discussion very interesting; I put some of them at the end of the post, for your convenience. In our case, we'll focus on a simple example to illustrate how the concepts are different and how to use a neural network to estimate both at the same time. The full code is available at this Kaggle Kernel.

## 1. Data

We'll use the same data generating process of my last post, borrowed from Blundell et al. (2015). I add some heteroskedastic noise and use a gaussian distribution to generate $x$, so that risk and uncertainty are bigger when we get far from the origin. The process will look like this:

\[y = x + 0.3 \cdot \sin(2\pi(x + \epsilon)) + 0.3 \cdot \sin(4\pi(x + \epsilon)) + \epsilon\]

where $\epsilon \sim \mathcal{N}(0,\ 0.01 + 0.1 \cdot x^2)$ and $x \sim \mathcal{N}(0.0,\ 1.0)$.

This problem is good for measuring both risk and uncertainty. Risk gets bigger where the intrinsic noise from the data generating process is larger, which in this case is away from the origin, due to our choice of $\epsilon \sim \mathcal{N}(0,\ 0.01 + 0.1 \cdot x^2)$. Uncertainty gets bigger where there is less data, which is also away from the origin, due to the distribution of $x$ being a normal $x \sim \mathcal{N}(0.0,\ 1.0)$.

So, let us start building a risk and uncertainty estimating model for this data! The first step is to use a vanilla neural network to estimate expected values.
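The data-generating code itself lives in the Kaggle Kernel and isn't reproduced here. A minimal sketch of what it could look like is below; the sample size, random seed and plotting grid are assumptions, and the second parameter of $\mathcal{N}$ is read as a variance:

```python
import numpy as np

np.random.seed(0)      # assumed seed, only for reproducibility of the sketch
n = 1000               # hypothetical sample size; the original kernel may differ

# x ~ N(0, 1): data becomes sparse away from the origin (drives uncertainty)
X = np.random.normal(0.0, 1.0, size=n)

# heteroskedastic noise eps ~ N(0, 0.01 + 0.1 * x^2): noise grows with |x| (drives risk)
eps = np.random.normal(0.0, np.sqrt(0.01 + 0.1 * X ** 2))

# y = x + 0.3 sin(2 pi (x + eps)) + 0.3 sin(4 pi (x + eps)) + eps
y = X + 0.3 * np.sin(2 * np.pi * (X + eps)) + 0.3 * np.sin(4 * np.pi * (X + eps)) + eps

# reshape inputs for Keras/sklearn and build a grid for plotting predictions later
X = X.reshape(-1, 1)
x_grid = np.linspace(-3.5, 3.5, 1000).reshape(-1, 1)
```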
## 2. Expected values with a regular neural network

Let us start with the simplest model: a vanilla neural network. Below, we build the `get_regular_nn` function to tidy up the compilation of the model. Warming up for the next challenge of estimating risk, we use `mean_absolute_error` as the loss function, which will try to estimate the median expected value at each $x$.

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model, Sequential

# function to get a regular (vanilla) neural network model
def get_regular_nn():

    # shared input of the network
    net_input = Input(shape=(1,), name='input')

    # trainable network body
    trainable_net = Sequential([Dense(16, 'elu'),
                                Dense(16, 'elu'),
                                Dense(16, 'elu')],
                               name='layers')(net_input)

    # trainable network output
    trainable_output = Dense(1, activation='linear', name='output')(trainable_net)

    # defining the model and compiling it
    model = Model(inputs=net_input, outputs=trainable_output)
    model.compile(loss='mean_absolute_error', optimizer='adam',
                  metrics=['mean_squared_error'])

    # returning the model
    return model
```

Cool, the model is implemented through `get_regular_nn`. We then build a model by calling it and fitting it to our data:

```python
# generating the model
regular_nn = get_regular_nn()

# fitting the model
regular_nn.fit(X, y, batch_size=16, epochs=500, verbose=0)
```

We show the fit below. It's close to what we would imagine a regular neural network fit would be for this data:

```python
import matplotlib.pyplot as plt

# let us check the toy data
plt.figure(figsize=[12, 6], dpi=200)

# first plot
plt.plot(X, y, 'kx', label='Toy data', alpha=0.5, markersize=5)
plt.plot(x_grid, regular_nn.predict(x_grid), label='neural net fit', color='tomato', alpha=0.8)
plt.title('Neural network fit for median expected value')
plt.xlabel('$x$'); plt.ylabel('$y$')
plt.xlim(-3.5, 3.5); plt.ylim(-5, 3)
plt.legend()
plt.show()
```

The fit is reasonable, but there is still a lot to improve. First, let us add the capacity of estimating risk to the network!

## 3. Risk with quantile regression

We add risk to our model by making the network perform quantile regression. Specifically, we will implement in `get_quantile_reg_nn` a network to estimate the median (50th percentile), the 10th and the 90th percentile. The quantiles will give us the sense of volatility we want, and will be our proxy for risk. It is not hard to do that: we just have to change the objective function from the L2 loss (mean squared error) to the L1 loss (mean absolute error) for the median, and use the quantile loss for the 10th and 90th percentiles. I heavily used Deep Quantile Regression by Sachin Abeywardana as inspiration, and I really recommend the read!

First, we implement the quantile (tilted) loss in Keras and build loss functions for the 10th, 50th and 90th percentiles:

```python
# implementing the tilted (quantile) loss
import tensorflow.keras.backend as K

def tilted_loss(q, y, f):
    e = (y - f)
    return K.mean(K.maximum(q * e, (q - 1) * e), axis=-1)

# losses for the 10th, 50th and 90th percentiles
loss_10th_p = lambda y, f: tilted_loss(0.10, y, f)
loss_50th_p = lambda y, f: tilted_loss(0.50, y, f)
loss_90th_p = lambda y, f: tilted_loss(0.90, y, f)
```

Then, we build the function `get_quantile_reg_nn` to generate our model. The model is a multi-head MLP with one output (and corresponding loss function) for each percentile. We have a shared network `trainable_net`, which connects to three heads `output_**th_p`, which will output the corresponding quantiles.

```python
# function to get a quantile regression model
def get_quantile_reg_nn():

    # shared input of the network
    net_input = Input(shape=(1,), name='input')

    # trainable network body
    trainable_net = Sequential([Dense(16, 'elu'),
                                Dense(16, 'elu'),
                                Dense(16, 'elu')],
                               name='shared')(net_input)

    # trainable network outputs, one head per quantile
    output_10th_p = Sequential([Dense(8, activation='elu'),
                                Dense(1, activation='linear')],
                               name='output_10th_p')(trainable_net)
    output_50th_p = Sequential([Dense(8, activation='elu'),
                                Dense(1, activation='linear')],
                               name='output_50th_p')(trainable_net)
    output_90th_p = Sequential([Dense(8, activation='elu'),
                                Dense(1, activation='linear')],
                               name='output_90th_p')(trainable_net)

    # defining the model and compiling it
    model = Model(inputs=net_input, outputs=[output_10th_p, output_50th_p, output_90th_p])
    model.compile(loss=[loss_10th_p, loss_50th_p, loss_90th_p], optimizer='adam')

    # returning the model
    return model
```

We can see the multi-output architecture in the following diagram:

```python
# checking final architecture
from IPython.display import SVG
from tensorflow.keras.utils import model_to_dot

SVG(model_to_dot(get_quantile_reg_nn()).create(prog='dot', format='svg'))
```

We then proceed to fit the model. Note that we need (at least I don't know any workarounds) to duplicate our target and pass one copy of $y$ to each of the heads of the network, hence `[y]*3` in the `.fit()` method.

```python
# generating the model
quantile_nn = get_quantile_reg_nn()

# fitting the model
quantile_nn.fit(X, [y]*3, batch_size=16, epochs=500, verbose=0)
```
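The plotting code for the quantile fit isn't in the snippets above. A minimal sketch of predicting the three heads and drawing them over the data, assuming the `x_grid` defined with the toy data (the original kernel's plotting code may differ), could look like this:

```python
import matplotlib.pyplot as plt

# predict returns one array per output head: 10th, 50th and 90th percentiles
q10, q50, q90 = quantile_nn.predict(x_grid)

plt.figure(figsize=[12, 6], dpi=200)
plt.plot(X, y, 'kx', label='Toy data', alpha=0.5, markersize=5)
plt.plot(x_grid, q50, color='tomato', label='50th percentile')
plt.fill_between(x_grid.ravel(), q10.ravel(), q90.ravel(),
                 color='tomato', alpha=0.3, label='10th-90th percentile band')
plt.xlabel('$x$'); plt.ylabel('$y$')
plt.xlim(-3.5, 3.5); plt.ylim(-5, 3)
plt.legend()
plt.show()
```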
The result makes a lot of sense. The network learned the shape of our data's distribution, effectively estimating risk. This is very beneficial for decision making: we can actually quantify how much we are putting at stake when we choose to perform some action given the network's prediction. But that leads to the next question: how reliable are these estimates of risk? That's where uncertainty comes into play, as we'll see next.

## 4. Uncertainty and risk with Randomized Prior Functions

Now we add uncertainty estimation, by wrapping our quantile regression model in the Randomized Prior Functions framework. Randomized Prior Functions provide a simple and principled way to estimate uncertainty in neural networks. In short, we're going to build a bootstrapped ensemble of networks, each composed of an untrainable prior network $p$ and a trainable network $f$, which are combined through a scaling factor $\beta$ to form the final output $Q$:

\[Q = f + \beta \cdot p\]

The uncertainty is given by the variance of predictions across ensemble members. Both the bootstrapped ensembling and the use of priors contribute to building the uncertainty estimate. If you want a deeper analysis, please do check my blog post about this model.

Cool. So let us implement this model in `get_quantile_reg_rpf_nn`, in the code below:

```python
from tensorflow.keras.layers import Lambda, add

# function to get a randomized prior functions model
def get_quantile_reg_rpf_nn():

    # shared input of the network
    net_input = Input(shape=(1,), name='input')

    # trainable network body
    trainable_net = Sequential([Dense(16, 'elu'),
                                Dense(16, 'elu'),
                                Dense(16, 'elu')],
                               name='trainable_shared')(net_input)

    # trainable network outputs
    train_out_1 = Sequential([Dense(8, activation='elu'),
                              Dense(1, activation='linear')],
                             name='train_out_1')(trainable_net)
    train_out_2 = Sequential([Dense(8, activation='elu'),
                              Dense(1, activation='linear')],
                             name='train_out_2')(trainable_net)
    train_out_3 = Sequential([Dense(8, activation='elu'),
                              Dense(1, activation='linear')],
                             name='train_out_3')(trainable_net)

    # prior network body (untrainable)
    prior_net = Sequential([Dense(16, 'elu', kernel_initializer='glorot_normal',
                                  trainable=False),
                            Dense(16, 'elu', kernel_initializer='glorot_normal',
                                  trainable=False),
                            Dense(16, 'elu', kernel_initializer='glorot_normal',
                                  trainable=False)],
                           name='prior_shared')(net_input)

    # prior network outputs (untrainable)
    prior_out_1 = Dense(1, 'elu', kernel_initializer='glorot_normal',
                        trainable=False, name='prior_out_1')(prior_net)
    prior_out_2 = Dense(1, 'elu', kernel_initializer='glorot_normal',
                        trainable=False, name='prior_out_2')(prior_net)
    prior_out_3 = Dense(1, 'elu', kernel_initializer='glorot_normal',
                        trainable=False, name='prior_out_3')(prior_net)

    # using a Lambda layer so we can control the weight (beta) of the prior network
    prior_out_1 = Lambda(lambda x: x * 3.0, name='prior_scale_1')(prior_out_1)
    prior_out_2 = Lambda(lambda x: x * 3.0, name='prior_scale_2')(prior_out_2)
    prior_out_3 = Lambda(lambda x: x * 3.0, name='prior_scale_3')(prior_out_3)

    # adding the trainable and prior outputs together
    add_out_1 = add([train_out_1, prior_out_1], name='add_out_1')
    add_out_2 = add([train_out_2, prior_out_2], name='add_out_2')
    add_out_3 = add([train_out_3, prior_out_3], name='add_out_3')

    # defining the model and compiling it
    model = Model(inputs=net_input, outputs=[add_out_1, add_out_2, add_out_3])
    model.compile(loss=[loss_10th_p, loss_50th_p, loss_90th_p], optimizer='adam')

    # returning the model
    return model
```

Seems like there's a lot going on here, but you don't need to worry. We have a shared `net_input` for both `prior_net` and `trainable_net`, which are just 3-layer MLPs. In the `trainable_net`, we let the weights be trained by backpropagation, while in the `prior_net` we lock them by setting the `trainable` parameter to `False`. Both nets have three output heads, `train_out_*` and `prior_out_*`, one for each quantile, still keeping the prior locked during training. Furthermore, we apply a `Lambda` layer to `prior_out_*`, which multiplies its output by 3. This is our implementation of $\beta$ from the formula above! Finally, the outputs of the prior and trainable networks are added together via an `add` layer to complete the model.

Take a look at the architecture below to see if it makes sense to you. In short, our model is composed of two parallel multi-output networks, where one of them is allowed to train and the other is not.

```python
# checking final architecture
from IPython.display import SVG
from tensorflow.keras.utils import model_to_dot

SVG(model_to_dot(get_quantile_reg_rpf_nn()).create(prog='dot', format='svg'))
```

Cool! The last adaptation we have to do is to extend the KerasRegressor class to work with multi-output models. That's because sklearn won't accept the `[y]*3` syntax I used before.

```python
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

class MyMultiOutputKerasRegressor(KerasRegressor):

    # initializing
    def __init__(self, **kwargs):
        KerasRegressor.__init__(self, **kwargs)

    # simpler fit method: duplicates the target for the three heads
    def fit(self, X, y, **kwargs):
        KerasRegressor.fit(self, X, [y]*3, **kwargs)
```

Now we're ready to fit the model! We just take our `get_quantile_reg_rpf_nn`, wrap it in the `MyMultiOutputKerasRegressor` and build a bootstrapped ensemble using `BaggingRegressor`!

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

# wrapping our base model in a sklearn estimator
base_model = MyMultiOutputKerasRegressor(build_fn=get_quantile_reg_rpf_nn,
                                         epochs=500, batch_size=16, verbose=0)

# create a bagged ensemble of 10 base models
bag = BaggingRegressor(base_estimator=base_model, n_estimators=10, verbose=2)

# fitting the ensemble
bag.fit(X, y)

# output of the neural nets: (n_estimators, 3 quantiles, 1000 grid points)
quantile_output = np.array([np.array(e.predict(x_grid)).reshape(3, 1000)
                            for e in bag.estimators_])
```

The results are really cool. Below, I'm plotting the median, 10th and 90th percentile of each ensemble member. If you look at the curves, you'll see that ensemble members agree a lot around the origin, but start to disagree more further away. This "disagreement" is our uncertainty! It effectively measures the variance of our estimates, giving us a distribution over functions for the median, 10th and 90th percentiles.

For a more familiar view, we can also plot the 80% confidence interval for our distributions of functions. Here we see the uncertainty more clearly, and how it gets bigger as we move away from the data.
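The plotting code for these two figures isn't shown above. A minimal sketch of how they could be built from `quantile_output` follows; taking percentiles across ensemble members is one assumed way to form the 80% band, and the original kernel may compute it differently:

```python
import numpy as np
import matplotlib.pyplot as plt

# quantile_output has shape (n_estimators, 3, 1000):
# one (10th, 50th, 90th percentile) curve triple per ensemble member

# plot each member's curves: where they disagree, uncertainty is high
plt.figure(figsize=[12, 6], dpi=200)
plt.plot(X, y, 'kx', alpha=0.5, markersize=5)
for member in quantile_output:
    plt.plot(x_grid, member[0], color='steelblue', alpha=0.3)  # 10th percentile
    plt.plot(x_grid, member[1], color='tomato', alpha=0.3)     # median
    plt.plot(x_grid, member[2], color='seagreen', alpha=0.3)   # 90th percentile
plt.title('Quantile estimates of each ensemble member')
plt.xlabel('$x$'); plt.ylabel('$y$')
plt.show()

# 80% interval (10th to 90th percentile across members), here for the median head;
# the same can be done for the other two heads
lower = np.percentile(quantile_output[:, 1, :], 10, axis=0)
upper = np.percentile(quantile_output[:, 1, :], 90, axis=0)

plt.figure(figsize=[12, 6], dpi=200)
plt.plot(X, y, 'kx', alpha=0.5, markersize=5)
plt.fill_between(x_grid.ravel(), lower, upper, color='tomato', alpha=0.3)
plt.title('80% confidence interval over the median function')
plt.xlabel('$x$'); plt.ylabel('$y$')
plt.show()
```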
## 5. Conclusion

In this post, we worked out the difference between two essential measures for decision-making: risk and uncertainty. We saw that risk can be seen as the intrinsic variance of our data, and can be modelled by quantile regression. Uncertainty, on the other hand, is the variance of our estimate, and can be modelled by a bayesian deep learning algorithm such as Randomized Prior Functions. Joining both worlds, we could create a model that estimates risk and uncertainty at the same time, which is very useful for decision-making.

I hope you liked the post! See you soon!

## Risk vs. Uncertainty Discussion

If you want to read more about risk and uncertainty, look at the references below:

- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning: argues that dropout (at test time) in NNs has a connection to gaussian processes and motivates its usage as a bayesian method
- Risk versus Uncertainty in Deep Learning: Bayes, Bootstrap and the Dangers of Dropout: argues that dropout with a fixed $p$ estimates risk and not uncertainty
- Randomized Prior Functions for Deep Reinforcement Learning: shows the shortcomings of other techniques and motivates the use of bootstrap and prior functions


