
Stanford University Machine Learning: Complete Collection of Problems and Answers




CS229 Machine Learning (Problems and Answers), Stanford University

Contents

(1) Assignment 1 (Supervised Learning)
(2) Assignment 1 Solutions (Supervised Learning)
(3) Assignment 2 (Kernels, SVMs, and Theory)
(4) Assignment 2 Solutions (Kernels, SVMs, and Theory)
(5) Assignment 3 (Learning Theory and Unsupervised Learning)
(6) Assignment 3 Solutions (Learning Theory and Unsupervised Learning)
(7) Assignment 4 (Unsupervised Learning and Reinforcement Learning)
(8) Assignment 4 Solutions (Unsupervised Learning and Reinforcement Learning)
(9) Problem Set #1: Supervised Learning
(10) Problem Set #1 Answer
(11) Problem Set #2: Naive Bayes, SVMs, and Theory
(12) Problem Set #2 Answer

CS229, Public Course
Problem Set #1: Supervised Learning

1. Newton's method for computing least squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to $\theta^*$.

(a) Find the Hessian of the cost function $J(\theta) = \frac{1}{2}\sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2$.

(b) Show that the first iteration of Newton's method gives us $\theta^\star = (X^T X)^{-1} X^T \vec{y}$, the solution to our least squares problem.

2. Locally-weighted logistic regression

In this problem you will implement a locally-weighted version of logistic regression, where we weight different training examples differently according to the query point. The locally-weighted logistic regression problem is to maximize

$$\ell(\theta) = -\frac{\lambda}{2}\theta^T\theta + \sum_{i=1}^m w^{(i)}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right].$$

The $-\frac{\lambda}{2}\theta^T\theta$ here is what is known as a regularization parameter, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$.

Using this definition, the gradient of $\ell(\theta)$ is given by

$$\nabla_\theta \ell(\theta) = X^T z - \lambda\theta$$

where $z \in \mathbb{R}^m$ is defined by $z_i = w^{(i)}(y^{(i)} - h_\theta(x^{(i)}))$, and the Hessian is given by

$$H = X^T D X - \lambda I$$

where $D \in \mathbb{R}^{m\times m}$ is a diagonal matrix with $D_{ii} = -w^{(i)} h_\theta(x^{(i)})(1 - h_\theta(x^{(i)}))$.

For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well.

Given a query point $x$, we choose to compute the weights

$$w^{(i)} = \exp\left(-\frac{\|x - x^{(i)}\|^2}{2\tau^2}\right).$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$.

The q2/ directory contains data and code for this problem. You should implement the y = lwlr(Xtrain, ytrain, x, tau) function in the lwlr.m file. This function takes as input the training set (the Xtrain and ytrain matrices, in the form described in the class notes), a new query point x and the weight bandwidth tau. Given this input, the function should 1) compute weights $w^{(i)}$ for each training example, using the formula above, 2) maximize $\ell(\theta)$ using Newton's method, and finally 3) output $y = 1\{h_\theta(x) > 0.5\}$ as the prediction.

We provide two additional functions that might help. The [Xtrain, ytrain] = loaddata; function will load the matrices from files in the data/ folder. The function plotlwlr(Xtrain, ytrain, tau, resolution) will plot the resulting classifier (assuming you have properly implemented lwlr.m). This function evaluates the locally weighted logistic regression classifier over a large grid of points and plots the resulting prediction as blue (predicting $y = 0$) or red (predicting $y = 1$). Depending on how fast your lwlr function is, creating the plot might take some time, so we recommend debugging your code with resolution = 50; and later increasing it to at least 200 to get a better idea of the decision boundary.

(b) Evaluate the system with a variety of different bandwidth parameters $\tau$. In particular, try $\tau = 0.01, 0.05, 0.1, 0.5, 1.0, 5.0$. How does the classification boundary change when varying this parameter? Can you predict what the decision boundary of ordinary (unweighted) logistic regression would look like?
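For reference, the Newton-Raphson iteration in part (a) simply assembles the gradient and Hessian quoted above into a single update, repeated until $\nabla_\theta\ell(\theta)$ is close to zero:

$$\theta := \theta - H^{-1}\nabla_\theta\ell(\theta) = \theta - (X^T D X - \lambda I)^{-1}(X^T z - \lambda\theta).$$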
3. Multivariate least squares

So far in class, we have only considered cases where our target variable $y$ is a scalar value. Suppose that instead of trying to predict a single output, we have a training set with multiple outputs for each example:

$$\{(x^{(i)}, y^{(i)}),\ i = 1,\dots,m\},\quad x^{(i)} \in \mathbb{R}^n,\ y^{(i)} \in \mathbb{R}^p.$$

Thus for each training example, $y^{(i)}$ is vector-valued, with $p$ entries. We wish to use a linear model to predict the outputs, as in least squares, by specifying the parameter matrix $\Theta$ in $y = \Theta^T x$, where $\Theta \in \mathbb{R}^{n\times p}$.

(a) The cost function for this case is

$$J(\Theta) = \frac{1}{2}\sum_{i=1}^m \sum_{j=1}^p \left((\Theta^T x^{(i)})_j - y_j^{(i)}\right)^2.$$

Write $J(\Theta)$ in matrix-vector notation (i.e., without using any summations). [Hint: Start with the $m\times n$ design matrix

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$

and the $m\times p$ target matrix

$$Y = \begin{bmatrix} (y^{(1)})^T \\ (y^{(2)})^T \\ \vdots \\ (y^{(m)})^T \end{bmatrix}$$

and then work out how to express $J(\Theta)$ in terms of these matrices.]

(b) Find the closed form solution for $\Theta$ which minimizes $J(\Theta)$. This is the equivalent of the normal equations for the multivariate case.

(c) Suppose instead of considering the multivariate vectors $y^{(i)}$ all at once, we instead compute each variable $y_j^{(i)}$ separately for each $j = 1,\dots,p$. In this case, we have $p$ individual linear models, of the form $y_j^{(i)} = \theta_j^T x^{(i)}$, $j = 1,\dots,p$. (So here, each $\theta_j \in \mathbb{R}^n$.) How do the parameters from these $p$ independent least squares problems compare to the multivariate solution?

4. Naive Bayes

In this problem, we look at maximum likelihood parameter estimation using the naive Bayes assumption. Here, the input features $x_j$, $j = 1,\dots,n$, to our model are discrete, binary-valued variables, so $x_j \in \{0,1\}$. We call $x = [x_1\ x_2\ \cdots\ x_n]^T$ the input vector. For each training example, our output target is a single binary value $y \in \{0,1\}$. Our model is then parameterized by $\phi_{j|y=0} = p(x_j = 1|y=0)$, $\phi_{j|y=1} = p(x_j = 1|y=1)$, and $\phi_y = p(y=1)$. We model the joint distribution of $(x, y)$ according to

$$p(y) = (\phi_y)^y (1-\phi_y)^{1-y}$$
$$p(x|y=0) = \prod_{j=1}^n p(x_j|y=0) = \prod_{j=1}^n (\phi_{j|y=0})^{x_j}(1-\phi_{j|y=0})^{1-x_j}$$
$$p(x|y=1) = \prod_{j=1}^n p(x_j|y=1) = \prod_{j=1}^n (\phi_{j|y=1})^{x_j}(1-\phi_{j|y=1})^{1-x_j}$$

(a) Find the joint likelihood function $\ell(\varphi) = \log\prod_{i=1}^m p(x^{(i)}, y^{(i)}; \varphi)$ in terms of the model parameters given above. Here, $\varphi$ represents the entire set of parameters $\{\phi_y, \phi_{j|y=0}, \phi_{j|y=1},\ j = 1,\dots,n\}$.

(b) Show that the parameters which maximize the likelihood function are the same as those given in the lecture notes; i.e., that

$$\phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1 \wedge y^{(i)}=0\}}{\sum_{i=1}^m 1\{y^{(i)}=0\}},\quad \phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1 \wedge y^{(i)}=1\}}{\sum_{i=1}^m 1\{y^{(i)}=1\}},\quad \phi_y = \frac{\sum_{i=1}^m 1\{y^{(i)}=1\}}{m}.$$

(c) Consider making a prediction on some new data point $x$ using the most likely class estimate generated by the naive Bayes algorithm. Show that the hypothesis returned by naive Bayes is a linear classifier; i.e., if $p(y=0|x)$ and $p(y=1|x)$ are the class probabilities returned by naive Bayes, show that there exists some $\theta \in \mathbb{R}^{n+1}$ such that $p(y=1|x) \ge p(y=0|x)$ if and only if $\theta^T[1\ x] \ge 0$. (Assume $\theta_0$ is an intercept term.)

5. Exponential family and the geometric distribution

(a) Consider the geometric distribution parameterized by $\phi$:

$$p(y;\phi) = (1-\phi)^{y-1}\phi,\quad y = 1, 2, 3, \dots.$$

Show that the geometric distribution is in the exponential family, and give $b(y)$, $\eta$, $T(y)$, and $a(\eta)$.

(b) Consider performing regression using a GLM model with a geometric response variable. What is the canonical response function for the family? You may use the fact that the mean of a geometric distribution is given by $1/\phi$.

(c) For a training set $\{(x^{(i)}, y^{(i)});\ i = 1,\dots,m\}$, let the log-likelihood of an example be $\log p(y^{(i)}|x^{(i)};\theta)$. By taking the derivative of the log-likelihood with respect to $\theta_j$, derive the stochastic gradient ascent rule for learning using a GLM model with geometric responses $y$ and the canonical response function.

CS229, Public Course
Problem Set #1 Solutions: Supervised Learning

1. Newton's method for computing least squares

(The full problem statement is given above.)

(a) Find the Hessian of the cost function $J(\theta) = \frac{1}{2}\sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2$.

Answer: As shown in the class notes,

$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)}) x_j^{(i)}.$$

So

$$\frac{\partial^2 J(\theta)}{\partial \theta_j \partial \theta_k} = \sum_{i=1}^m \frac{\partial}{\partial \theta_k}(\theta^T x^{(i)} - y^{(i)}) x_j^{(i)} = \sum_{i=1}^m x_j^{(i)} x_k^{(i)} = (X^T X)_{jk}.$$

Therefore, the Hessian of $J(\theta)$ is $H = X^T X$. This can also be derived by simply applying rules from the lecture notes on Linear Algebra.

(b) Show that the first iteration of Newton's method gives us $\theta^\star = (X^T X)^{-1} X^T \vec{y}$, the solution to our least squares problem.

Answer: Given any $\theta^{(0)}$, Newton's method finds $\theta^{(1)}$ according to

$$\theta^{(1)} = \theta^{(0)} - H^{-1}\nabla_\theta J(\theta^{(0)}) = \theta^{(0)} - (X^T X)^{-1}(X^T X\theta^{(0)} - X^T \vec{y}) = \theta^{(0)} - \theta^{(0)} + (X^T X)^{-1} X^T \vec{y} = (X^T X)^{-1} X^T \vec{y}.$$

Therefore, no matter what $\theta^{(0)}$ we pick, Newton's method always finds $\theta^\star$ after one iteration.
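As a quick numerical check of this one-step convergence result, here is a minimal Octave/MATLAB sketch; the random data and variable names are hypothetical, not part of the assignment:

% One Newton step on J(theta) reaches the normal-equations solution
% from an arbitrary starting point (hypothetical random data).
m = 50; n = 3;
X = randn(m, n);
y = randn(m, 1);
theta0 = randn(n, 1);                % arbitrary starting point
g = X' * (X * theta0 - y);           % gradient of J at theta0
H = X' * X;                          % Hessian of J
theta1 = theta0 - H \ g;             % a single Newton step
theta_star = (X' * X) \ (X' * y);    % least squares solution
disp(norm(theta1 - theta_star));     % ~0, up to round-off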
2. Locally-weighted logistic regression

(The full problem statement is given above.)

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$.

Answer: Our implementation of lwlr.m:

function y = lwlr(X_train, y_train, x, tau)

m = size(X_train, 1);
n = size(X_train, 2);

theta = zeros(n, 1);

% compute weights: w(i) = exp(-||x - x(i)||^2 / (2 tau^2))
w = exp(-sum((X_train - repmat(x', m, 1)).^2, 2) / (2*tau^2));

% perform Newton's method on the regularized log-likelihood
g = ones(n, 1);
while (norm(g) > 1e-6)
  h = 1 ./ (1 + exp(-X_train * theta));                            % h_theta for all examples
  g = X_train' * (w .* (y_train - h)) - 1e-4 * theta;              % gradient
  H = -X_train' * diag(w .* h .* (1-h)) * X_train - 1e-4 * eye(n); % Hessian
  theta = theta - H \ g;
end

% return predicted y
y = double(x' * theta > 0);

(b) Evaluate the system with the bandwidth parameters $\tau = 0.01, 0.05, 0.1, 0.5, 1.0, 5.0$. How does the classification boundary change when varying this parameter?

Answer: These are the resulting decision boundaries, for the different values of $\tau$.

[Figure: six decision-boundary plots, one for each of tau = 0.01, 0.05, 0.1, 0.5, 1.0, and 5.0.]

For smaller $\tau$, the classifier appears to overfit the data set, obtaining zero training error, but outputting a sporadic-looking decision boundary. As $\tau$ grows, the resulting decision boundary becomes smoother, eventually converging (in the limit as $\tau \to \infty$) to the unweighted logistic regression solution.
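A possible driver script for the experiment in part (b), assuming the q2/ directory with the loaddata and plotlwlr functions named in the problem statement is on the MATLAB path; this is a sketch, and details such as figure handling inside plotlwlr may differ:

% Sweep over the suggested bandwidths and plot each classifier.
[Xtrain, ytrain] = loaddata;              % load matrices from data/
for tau = [0.01 0.05 0.1 0.5 1.0 5.0]     % bandwidths from part (b)
  figure;                                 % one window per bandwidth
  plotlwlr(Xtrain, ytrain, tau, 50);      % resolution 50 while debugging
  title(sprintf('tau = %g', tau));
end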
3. Multivariate least squares

(The full problem statement is given above.)

(a) Write $J(\Theta)$ in matrix-vector notation.

Answer: The objective function can be expressed as

$$J(\Theta) = \frac{1}{2}\operatorname{tr}\left((X\Theta - Y)^T(X\Theta - Y)\right).$$

To see this, note that

$$J(\Theta) = \frac{1}{2}\operatorname{tr}\left((X\Theta - Y)^T(X\Theta - Y)\right) = \frac{1}{2}\sum_i \left((X\Theta - Y)^T(X\Theta - Y)\right)_{ii} = \frac{1}{2}\sum_i\sum_j (X\Theta - Y)_{ij}^2 = \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^p \left((\Theta^T x^{(i)})_j - y_j^{(i)}\right)^2.$$

(b) Find the closed form solution for $\Theta$ which minimizes $J(\Theta)$.

Answer: First we take the gradient of $J(\Theta)$ with respect to $\Theta$:

$$\begin{aligned}
\nabla_\Theta J(\Theta) &= \nabla_\Theta\left[\tfrac{1}{2}\operatorname{tr}\left((X\Theta - Y)^T(X\Theta - Y)\right)\right] \\
&= \nabla_\Theta\left[\tfrac{1}{2}\operatorname{tr}\left(\Theta^T X^T X\Theta - \Theta^T X^T Y - Y^T X\Theta + Y^T Y\right)\right] \\
&= \tfrac{1}{2}\nabla_\Theta\left[\operatorname{tr}(\Theta^T X^T X\Theta) - 2\operatorname{tr}(Y^T X\Theta) + \operatorname{tr}(Y^T Y)\right] \\
&= \tfrac{1}{2}\left[X^T X\Theta + X^T X\Theta - 2X^T Y\right] \\
&= X^T X\Theta - X^T Y.
\end{aligned}$$

Setting this expression to zero, we obtain

$$\Theta = (X^T X)^{-1} X^T Y.$$

This looks very similar to the closed form solution in the univariate case, except now $Y$ is an $m\times p$ matrix, so $\Theta$ is also a matrix, of size $n\times p$.

(c) How do the parameters from the $p$ independent least squares problems compare to the multivariate solution?

Answer: This time, we construct a set of vectors

$$\vec{y}_j = \begin{bmatrix} y_j^{(1)} \\ y_j^{(2)} \\ \vdots \\ y_j^{(m)} \end{bmatrix},\quad j = 1,\dots,p.$$

Then our $j$-th linear model can be solved by the least squares solution

$$\theta_j = (X^T X)^{-1} X^T \vec{y}_j.$$

If we line up our $\theta_j$, we see that we have the following equation:

$$[\theta_1\ \theta_2\ \cdots\ \theta_p] = [(X^T X)^{-1} X^T \vec{y}_1\ \cdots\ (X^T X)^{-1} X^T \vec{y}_p] = (X^T X)^{-1} X^T [\vec{y}_1\ \vec{y}_2\ \cdots\ \vec{y}_p] = (X^T X)^{-1} X^T Y = \Theta.$$

Thus, our $p$ individual least squares problems give the exact same solution as the multivariate least squares.
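A quick numerical confirmation of this equivalence, as a minimal Octave/MATLAB sketch on hypothetical random data:

% The multivariate normal-equations solution equals p independent
% per-output least squares fits (hypothetical random data).
m = 40; n = 4; p = 3;
X = randn(m, n);
Y = randn(m, p);
Theta = (X' * X) \ (X' * Y);          % multivariate solution, n x p
Theta_cols = zeros(n, p);
for j = 1:p
  Theta_cols(:, j) = (X' * X) \ (X' * Y(:, j));  % j-th model alone
end
disp(norm(Theta - Theta_cols));       % ~0, up to round-off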
4. Naive Bayes

(The full problem statement is given above.)

(a) Find the joint likelihood function $\ell(\varphi) = \log\prod_{i=1}^m p(x^{(i)}, y^{(i)}; \varphi)$.

Answer:

$$\begin{aligned}
\ell(\varphi) &= \log\prod_{i=1}^m p(x^{(i)}, y^{(i)}; \varphi) = \log\prod_{i=1}^m p(x^{(i)}|y^{(i)}; \varphi)\, p(y^{(i)}; \varphi) \\
&= \log\prod_{i=1}^m \left[\prod_{j=1}^n p(x_j^{(i)}|y^{(i)}; \varphi)\right] p(y^{(i)}; \varphi) \\
&= \sum_{i=1}^m \left[\log p(y^{(i)}; \varphi) + \sum_{j=1}^n \log p(x_j^{(i)}|y^{(i)}; \varphi)\right] \\
&= \sum_{i=1}^m \left[y^{(i)}\log\phi_y + (1-y^{(i)})\log(1-\phi_y) + \sum_{j=1}^n\left(x_j^{(i)}\log\phi_{j|y^{(i)}} + (1-x_j^{(i)})\log(1-\phi_{j|y^{(i)}})\right)\right].
\end{aligned}$$

(b) Show that the parameters which maximize the likelihood function are the same as those given in the lecture notes.

Answer: The only terms in $\ell(\varphi)$ which have non-zero gradient with respect to $\phi_{j|y=0}$ are those which include $\phi_{j|y^{(i)}}$. Therefore,

$$\begin{aligned}
\nabla_{\phi_{j|y=0}}\ell(\varphi) &= \nabla_{\phi_{j|y=0}}\sum_{i=1}^m\left(x_j^{(i)}\log\phi_{j|y^{(i)}} + (1-x_j^{(i)})\log(1-\phi_{j|y^{(i)}})\right) \\
&= \nabla_{\phi_{j|y=0}}\sum_{i=1}^m\left(x_j^{(i)}\log(\phi_{j|y=0})\,1\{y^{(i)}=0\} + (1-x_j^{(i)})\log(1-\phi_{j|y=0})\,1\{y^{(i)}=0\}\right) \\
&= \sum_{i=1}^m\left(x_j^{(i)}\frac{1}{\phi_{j|y=0}}1\{y^{(i)}=0\} - (1-x_j^{(i)})\frac{1}{1-\phi_{j|y=0}}1\{y^{(i)}=0\}\right).
\end{aligned}$$

Setting $\nabla_{\phi_{j|y=0}}\ell(\varphi) = 0$ gives

$$\begin{aligned}
0 &= \sum_{i=1}^m\left(x_j^{(i)}(1-\phi_{j|y=0})\,1\{y^{(i)}=0\} - (1-x_j^{(i)})\,\phi_{j|y=0}\,1\{y^{(i)}=0\}\right) \\
&= \sum_{i=1}^m\left(x_j^{(i)} - \phi_{j|y=0}\right)1\{y^{(i)}=0\} \\
&= \sum_{i=1}^m x_j^{(i)}\cdot 1\{y^{(i)}=0\} - \phi_{j|y=0}\sum_{i=1}^m 1\{y^{(i)}=0\} \\
&= \sum_{i=1}^m 1\{x_j^{(i)}=1 \wedge y^{(i)}=0\} - \phi_{j|y=0}\sum_{i=1}^m 1\{y^{(i)}=0\}.
\end{aligned}$$

We then arrive at our desired result

$$\phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1 \wedge y^{(i)}=0\}}{\sum_{i=1}^m 1\{y^{(i)}=0\}}.$$

The solution for $\phi_{j|y=1}$ proceeds in the identical manner. To solve for $\phi_y$,

$$\nabla_{\phi_y}\ell(\varphi) = \nabla_{\phi_y}\sum_{i=1}^m\left(y^{(i)}\log\phi_y + (1-y^{(i)})\log(1-\phi_y)\right) = \sum_{i=1}^m\left(y^{(i)}\frac{1}{\phi_y} - (1-y^{(i)})\frac{1}{1-\phi_y}\right).$$

Then setting $\nabla_{\phi_y}\ell(\varphi) = 0$ gives us

$$0 = \sum_{i=1}^m\left(y^{(i)}(1-\phi_y) - (1-y^{(i)})\phi_y\right) = \sum_{i=1}^m y^{(i)} - \sum_{i=1}^m \phi_y.$$

Therefore,

$$\phi_y = \frac{\sum_{i=1}^m 1\{y^{(i)}=1\}}{m}.$$

(c) Show that the hypothesis returned by naive Bayes is a linear classifier.

Answer:

$$\begin{aligned}
p(y=1|x) \ge p(y=0|x) &\iff \frac{p(y=1|x)}{p(y=0|x)} \ge 1 \\
&\iff \frac{\left(\prod_{j=1}^n p(x_j|y=1)\right)p(y=1)}{\left(\prod_{j=1}^n p(x_j|y=0)\right)p(y=0)} \ge 1 \\
&\iff \frac{\left(\prod_{j=1}^n (\phi_{j|y=1})^{x_j}(1-\phi_{j|y=1})^{1-x_j}\right)\phi_y}{\left(\prod_{j=1}^n (\phi_{j|y=0})^{x_j}(1-\phi_{j|y=0})^{1-x_j}\right)(1-\phi_y)} \ge 1 \\
&\iff \sum_{j=1}^n\left(x_j\log\frac{\phi_{j|y=1}}{\phi_{j|y=0}} + (1-x_j)\log\frac{1-\phi_{j|y=1}}{1-\phi_{j|y=0}}\right) + \log\frac{\phi_y}{1-\phi_y} \ge 0 \\
&\iff \sum_{j=1}^n x_j\log\frac{(\phi_{j|y=1})(1-\phi_{j|y=0})}{(\phi_{j|y=0})(1-\phi_{j|y=1})} + \sum_{j=1}^n\log\frac{1-\phi_{j|y=1}}{1-\phi_{j|y=0}} + \log\frac{\phi_y}{1-\phi_y} \ge 0 \\
&\iff \theta^T[1\ x] \ge 0,
\end{aligned}$$

where

$$\theta_0 = \sum_{j=1}^n\log\frac{1-\phi_{j|y=1}}{1-\phi_{j|y=0}} + \log\frac{\phi_y}{1-\phi_y},\qquad
\theta_j = \log\frac{(\phi_{j|y=1})(1-\phi_{j|y=0})}{(\phi_{j|y=0})(1-\phi_{j|y=1})},\quad j = 1,\dots,n.$$

5. Exponential family and the geometric distribution

(a) Show that the geometric distribution $p(y;\phi) = (1-\phi)^{y-1}\phi$, $y = 1,2,3,\dots$, is in the exponential family, and give $b(y)$, $\eta$, $T(y)$, and $a(\eta)$.

Answer:

$$p(y;\phi) = (1-\phi)^{y-1}\phi = \exp\left[\log(1-\phi)^{y-1} + \log\phi\right] = \exp\left[(y-1)\log(1-\phi) + \log\phi\right] = \exp\left[y\log(1-\phi) - \log\frac{1-\phi}{\phi}\right].$$

Then

$$b(y) = 1,\qquad \eta = \log(1-\phi),\qquad T(y) = y,\qquad a(\eta) = \log\frac{1-\phi}{\phi} = \log\frac{e^\eta}{1-e^\eta},$$

where the last equality follows because $\eta = \log(1-\phi) \Rightarrow e^\eta = 1-\phi \Rightarrow \phi = 1-e^\eta$.

(b) What is the canonical response function for the family?

Answer:

$$g(\eta) = \mathbb{E}[y;\phi] = \frac{1}{\phi} = \frac{1}{1-e^\eta}.$$

(c) For a training set $\{(x^{(i)}, y^{(i)});\ i = 1,\dots,m\}$, let the log-likelihood of an example be $\log p(y^{(i)}|x^{(i)};\theta)$. By taking the derivative of the log-likelihood with respect to $\theta_j$, derive the stochastic gradient ascent rule for learning using a GLM model with geometric responses $y$ and the canonical response function.
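Answer (a reconstruction sketch following parts (a) and (b), not the original answer key's text): since $b(y) = 1$ and $T(y) = y$, with $\eta = \theta^T x^{(i)}$ the log-likelihood of an example is

$$\log p(y^{(i)}|x^{(i)};\theta) = \eta\, y^{(i)} - a(\eta) = \theta^T x^{(i)}\, y^{(i)} - \log\frac{e^{\theta^T x^{(i)}}}{1-e^{\theta^T x^{(i)}}}.$$

Writing $a(\eta) = \eta - \log(1-e^\eta)$, we have $a'(\eta) = 1 + \frac{e^\eta}{1-e^\eta} = \frac{1}{1-e^\eta}$, which is exactly the mean from part (b). Differentiating with respect to $\theta_j$ therefore gives

$$\frac{\partial}{\partial\theta_j}\log p(y^{(i)}|x^{(i)};\theta) = \left(y^{(i)} - \frac{1}{1-e^{\theta^T x^{(i)}}}\right)x_j^{(i)},$$

so the stochastic gradient ascent rule with learning rate $\alpha$ is

$$\theta_j := \theta_j + \alpha\left(y^{(i)} - \frac{1}{1-e^{\theta^T x^{(i)}}}\right)x_j^{(i)}.$$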
