
YOLO9000: Better, Faster, Stronger

Joseph Redmon∗†, Ali Farhadi∗†
University of Washington∗, Allen Institute for AI†
http://pjreddie.com/yolo9000/

Abstract

We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. Using a novel, multi-scale training method the same YOLOv2 model can run at varying sizes, offering an easy tradeoff between speed and accuracy. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster R-CNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real-time.

1. Introduction

General purpose object detection should be fast, accurate, and able to recognize a wide variety of objects. Since the introduction of neural networks, detection frameworks have become increasingly fast and accurate. However, most detection methods are still constrained to a small set of objects.

Current object detection datasets are limited compared to datasets for other tasks like classification and tagging. The most common detection datasets contain thousands to hundreds of thousands of images with dozens to hundreds of tags [3][10][2]. Classification datasets have millions of images with tens or hundreds of thousands of categories [20][2].

We would like detection to scale to the level of object classification. However, labelling images for detection is far more expensive than labelling for classification or tagging (tags are often user-supplied for free). Thus we are unlikely to see detection datasets on the same scale as classification datasets in the near future.

[Figure 1: YOLO9000. YOLO9000 can detect a wide variety of object classes in real-time.]

We propose a new method to harness the large amount of classification data we already have and use it to expand the scope of current detection systems. Our method uses a hierarchical view of object classification that allows us to combine distinct datasets together.

We also propose a joint training algorithm that allows us to train object detectors on both detection and classification data. Our method leverages labeled detection images to learn to precisely localize objects while it uses classification images to increase its vocabulary and robustness.

Using this method we train YOLO9000, a real-time object detector that can detect over 9000 different object categories. First we improve upon the base YOLO detection system to produce YOLOv2, a state-of-the-art, real-time detector. Then we use our dataset combination method and joint training algorithm to train a model on more than 9000 classes from ImageNet as well as detection data from COCO.

All of our code and pre-trained models are available online at http://pjreddie.com/yolo9000/.

2. Better

YOLO suffers from a variety of shortcomings relative to state-of-the-art detection systems. Error analysis of YOLO compared to Fast R-CNN shows that YOLO makes a significant number of localization errors. Furthermore, YOLO has relatively low recall compared to region proposal-based methods. Thus we focus mainly on improving recall and localization while maintaining classification accuracy.

Computer vision generally trends towards larger, deeper networks [6][18][17]. Better performance often hinges on training larger networks or ensembling multiple models together. However, with YOLOv2 we want a more accurate detector that is still fast. Instead of scaling up our network, we simplify the network and then make the representation easier to learn. We pool a variety of ideas from past work with our own novel concepts to improve YOLO's performance. A summary of results can be found in Table 2.
Batch Normalization. Batch normalization leads to significant improvements in convergence while eliminating the need for other forms of regularization [7]. By adding batch normalization on all of the convolutional layers in YOLO we get more than 2% improvement in mAP. Batch normalization also helps regularize the model. With batch normalization we can remove dropout from the model without overfitting.

High Resolution Classifier. All state-of-the-art detection methods use classifiers pre-trained on ImageNet [16]. Starting with AlexNet most classifiers operate on input images smaller than 256×256 [8]. The original YOLO trains the classifier network at 224×224 and increases the resolution to 448 for detection. This means the network has to simultaneously switch to learning object detection and adjust to the new input resolution.

For YOLOv2 we first fine tune the classification network at the full 448×448 resolution for 10 epochs on ImageNet. This gives the network time to adjust its filters to work better on higher resolution input. We then fine tune the resulting network on detection. This high resolution classification network gives us an increase of almost 4% mAP.

Convolutional With Anchor Boxes. YOLO predicts the coordinates of bounding boxes directly using fully connected layers on top of the convolutional feature extractor. Instead of predicting coordinates directly, Faster R-CNN predicts bounding boxes using hand-picked priors [15]. Using only convolutional layers the region proposal network (RPN) in Faster R-CNN predicts offsets and confidences for anchor boxes. Since the prediction layer is convolutional, the RPN predicts these offsets at every location in a feature map. Predicting offsets instead of coordinates simplifies the problem and makes it easier for the network to learn.

We remove the fully connected layers from YOLO and use anchor boxes to predict bounding boxes. First we eliminate one pooling layer to make the output of the network's convolutional layers higher resolution. We also shrink the network to operate on 416×416 input images instead of 448×448. We do this because we want an odd number of locations in our feature map so there is a single center cell. Objects, especially large objects, tend to occupy the center of the image so it's good to have a single location right at the center to predict these objects instead of four locations that are all nearby. YOLO's convolutional layers downsample the image by a factor of 32 so by using an input image of 416 we get an output feature map of 13×13.

When we move to anchor boxes we also decouple the class prediction mechanism from the spatial location and instead predict class and objectness for every anchor box. Following YOLO, the objectness prediction still predicts the IOU of the ground truth and the proposed box and the class predictions predict the conditional probability of that class given that there is an object.

Using anchor boxes we get a small decrease in accuracy. YOLO only predicts 98 boxes per image but with anchor boxes our model predicts more than a thousand. Without anchor boxes our intermediate model gets 69.5 mAP with a recall of 81%. With anchor boxes our model gets 69.2 mAP with a recall of 88%. Even though the mAP decreases, the increase in recall means that our model has more room to improve.

Dimension Clusters. We encounter two issues with anchor boxes when using them with YOLO. The first is that the box dimensions are hand picked. The network can learn to adjust the boxes appropriately but if we pick better priors for the network to start with we can make it easier for the network to learn to predict good detections.

Instead of choosing priors by hand, we run k-means clustering on the training set bounding boxes to automatically find good priors. If we use standard k-means with Euclidean distance larger boxes generate more error than smaller boxes. However, what we really want are priors that lead to good IOU scores, which is independent of the size of the box. Thus for our distance metric we use:

    d(box, centroid) = 1 − IOU(box, centroid)

We run k-means for various values of k and plot the average IOU with the closest centroid, see Figure 2. We choose k = 5 as a good tradeoff between model complexity and high recall. The cluster centroids are significantly different than hand-picked anchor boxes. There are fewer short, wide boxes and more tall, thin boxes.

[Figure 2: Clustering box dimensions on VOC and COCO. We run k-means clustering on the dimensions of bounding boxes to get good priors for our model. The left image shows the average IOU we get with various choices for k. We find that k = 5 gives a good tradeoff for recall vs. complexity of the model. The right image shows the relative centroids for VOC and COCO. Both sets of priors favor thinner, taller boxes while COCO has greater variation in size than VOC.]

We compare the average IOU to closest prior of our clustering strategy and the hand-picked anchor boxes in Table 1. At only 5 priors the centroids perform similarly to 9 anchor boxes with an average IOU of 61.0 compared to 60.9. If we use 9 centroids we see a much higher average IOU. This indicates that using k-means to generate our bounding box priors starts the model off with a better representation and makes the task easier to learn.

Table 1: Average IOU of boxes to closest priors on VOC 2007. The average IOU of objects on VOC 2007 to their closest, unmodified prior using different generation methods. Clustering gives much better results than using hand-picked priors.

    Box Generation      #   Avg IOU
    Cluster SSE         5   58.7
    Cluster IOU         5   61.0
    Anchor Boxes [15]   9   60.9
    Cluster IOU         9   67.2
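The distance metric above drops into a standard k-means loop. Below is a minimal sketch in Python/NumPy, assuming boxes are given as (width, height) pairs; the function names and the mean-based centroid update are our choices, since the paper specifies only the distance.

    import numpy as np

    def iou_wh(boxes, centroids):
        """IOU between (w, h) pairs, treating boxes as co-centered so that
        only their dimensions matter, as in the paper's distance metric."""
        inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
                 np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
        union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
                (centroids[:, 0] * centroids[:, 1])[None, :] - inter
        return inter / union  # shape (num_boxes, num_centroids)

    def kmeans_iou(boxes, k=5, iters=100, seed=0):
        """Cluster box dimensions with d(box, centroid) = 1 - IOU(box, centroid)."""
        rng = np.random.default_rng(seed)
        centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
        for _ in range(iters):
            assign = np.argmax(iou_wh(boxes, centroids), axis=1)  # nearest = highest IOU
            new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                            else centroids[i] for i in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        avg_iou = iou_wh(boxes, centroids).max(axis=1).mean()
        return centroids, avg_iou

Sweeping k and plotting the returned average IOU reproduces the kind of tradeoff curve shown in Figure 2.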
Direct location prediction. When using anchor boxes with YOLO we encounter a second issue: model instability, especially during early iterations. Most of the instability comes from predicting the (x, y) locations for the box. In region proposal networks the network predicts values t_x and t_y and the (x, y) center coordinates are calculated as:

    x = (t_x ∗ w_a) − x_a
    y = (t_y ∗ h_a) − y_a

For example, a prediction of t_x = 1 would shift the box to the right by the width of the anchor box, a prediction of t_x = −1 would shift it to the left by the same amount.

This formulation is unconstrained so any anchor box can end up at any point in the image, regardless of what location predicted the box. With random initialization the model takes a long time to stabilize to predicting sensible offsets.

Instead of predicting offsets we follow the approach of YOLO and predict location coordinates relative to the location of the grid cell. This bounds the ground truth to fall between 0 and 1. We use a logistic activation to constrain the network's predictions to fall in this range.

The network predicts 5 bounding boxes at each cell in the output feature map. The network predicts 5 coordinates for each bounding box, t_x, t_y, t_w, t_h, and t_o. If the cell is offset from the top left corner of the image by (c_x, c_y) and the bounding box prior has width and height p_w, p_h, then the predictions correspond to:

    b_x = σ(t_x) + c_x
    b_y = σ(t_y) + c_y
    b_w = p_w e^{t_w}
    b_h = p_h e^{t_h}
    Pr(object) ∗ IOU(b, object) = σ(t_o)

[Figure 3: Bounding boxes with dimension priors and location prediction. We predict the width and height of the box as offsets from cluster centroids. We predict the center coordinates of the box relative to the location of filter application using a sigmoid function.]

Since we constrain the location prediction the parametrization is easier to learn, making the network more stable. Using dimension clusters along with directly predicting the bounding box center location improves YOLO by almost 5% over the version with anchor boxes.
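To make the constrained parametrization concrete, here is a small sketch of the decode step for a single box; the function and its argument layout are ours, not code from the paper.

    import math

    def decode_box(tx, ty, tw, th, to, cx, cy, pw, ph):
        """Decode one box: (tx, ty, tw, th, to) are raw network outputs,
        (cx, cy) the cell's offset from the top-left of the feature map,
        (pw, ph) the prior's width and height, all in cell units."""
        sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
        bx = sigmoid(tx) + cx      # center is constrained to the predicting cell
        by = sigmoid(ty) + cy
        bw = pw * math.exp(tw)     # width/height scale the prior
        bh = ph * math.exp(th)
        conf = sigmoid(to)         # predicts Pr(object) * IOU(b, object)
        return bx, by, bw, bh, conf

Dividing the decoded values by the feature-map size (13 for a 416×416 input) gives coordinates relative to the whole image.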
Fine-Grained Features. This modified YOLO predicts detections on a 13×13 feature map. While this is sufficient for large objects, it may benefit from finer grained features for localizing smaller objects. Faster R-CNN and SSD both run their proposal networks at various feature maps in the network to get a range of resolutions. We take a different approach, simply adding a passthrough layer that brings features from an earlier layer at 26×26 resolution.

The passthrough layer concatenates the higher resolution features with the low resolution features by stacking adjacent features into different channels instead of spatial locations, similar to the identity mappings in ResNet. This turns the 26×26×512 feature map into a 13×13×2048 feature map, which can be concatenated with the original features. Our detector runs on top of this expanded feature map so that it has access to fine grained features. This gives a modest 1% performance increase.
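This stacking operation is what is often called space-to-depth. The paper does not spell out the exact channel ordering, so the sketch below shows one common arrangement, assuming a channels-last (H, W, C) array.

    import numpy as np

    def passthrough(x, stride=2):
        """Reorganize an (H, W, C) map to (H/s, W/s, C*s*s) by moving each
        s x s spatial block into the channel dimension (space-to-depth)."""
        h, w, c = x.shape
        assert h % stride == 0 and w % stride == 0
        x = x.reshape(h // stride, stride, w // stride, stride, c)
        x = x.transpose(0, 2, 1, 3, 4)               # group the s x s blocks
        return x.reshape(h // stride, w // stride, c * stride * stride)

    features = np.zeros((26, 26, 512), dtype=np.float32)
    print(passthrough(features).shape)               # (13, 13, 2048)

The 13×13×2048 result can then be concatenated channel-wise with the original 13×13 output features.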
Multi-Scale Training. The original YOLO uses an input resolution of 448×448. With the addition of anchor boxes we changed the resolution to 416×416. However, since our model only uses convolutional and pooling layers it can be resized on the fly. We want YOLOv2 to be robust to running on images of different sizes so we train this into the model.

Instead of fixing the input image size we change the network every few iterations. Every 10 batches our network randomly chooses a new image dimension size. Since our model downsamples by a factor of 32, we pull from the following multiples of 32: {320, 352, ..., 608}. Thus the smallest option is 320×320 and the largest is 608×608. We resize the network to that dimension and continue training.

This regime forces the network to learn to predict well across a variety of input dimensions. This means the same network can predict detections at different resolutions. The network runs faster at smaller sizes so YOLOv2 offers an easy tradeoff between speed and accuracy.

[Figure 4: Accuracy and speed on VOC 2007.]

At low resolutions YOLOv2 operates as a cheap, fairly accurate detector. At 288×288 it runs at more than 90 FPS with mAP almost as good as Fast R-CNN. This makes it ideal for smaller GPUs, high framerate video, or multiple video streams. At high resolution YOLOv2 is a state-of-the-art detector with 78.6 mAP on VOC 2007 while still operating above real-time speeds. See Table 3 for a comparison of YOLOv2 with other frameworks on VOC 2007.
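A sketch of this resize schedule is below; the model and batch objects, along with their resize and train_step methods, are placeholders for whatever training loop is in use, not the Darknet implementation.

    import random

    SCALES = list(range(320, 609, 32))         # {320, 352, ..., 608}

    def train_multiscale(model, batches, resize_every=10):
        size = 416                              # initial input dimension
        for i, batch in enumerate(batches):
            if i % resize_every == 0:
                size = random.choice(SCALES)    # new dimension every 10 batches
            images = batch.resize(size, size)   # placeholder loader API
            model.train_step(images)            # fully convolutional, so any
                                                # multiple of 32 is valid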
Further Experiments. We train YOLOv2 for detection on VOC 2012. Table 4 shows the comparative performance of YOLOv2 versus other state-of-the-art detection systems. YOLOv2 achieves 73.4 mAP while running far faster than competing methods. We also train on COCO and compare to other methods in Table 5. On the VOC metric (IOU = .5) YOLOv2 gets 44.0 mAP, comparable to SSD and Faster R-CNN.

Table 2: The path from YOLO to YOLOv2. Most of the listed design decisions lead to significant increases in mAP. Two exceptions are switching to a fully convolutional network with anchor boxes and using the new network. Switching to the anchor box style approach increased recall without changing mAP while using the new network cut computation by 33%.

                          YOLO                                          YOLOv2
    batch norm?                 ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓
    hi-res classifier?               ✓    ✓    ✓    ✓    ✓    ✓    ✓
    convolutional?                        ✓    ✓    ✓    ✓    ✓    ✓
    anchor boxes?                         ✓    ✓
    new network?                               ✓    ✓    ✓    ✓    ✓
    dimension priors?                               ✓    ✓    ✓    ✓
    location prediction?                            ✓    ✓    ✓    ✓
    passthrough?                                         ✓    ✓    ✓
    multi-scale?                                              ✓    ✓
    hi-res detector?                                               ✓
    VOC2007 mAP          63.4 65.8 69.5 69.2 69.6 74.4 75.4 76.8 78.6

Table 3: Detection frameworks on PASCAL VOC 2007. YOLOv2 is faster and more accurate than prior detection methods. It can also run at different resolutions for an easy tradeoff between speed and accuracy. Each YOLOv2 entry is actually the same trained model with the same weights, just evaluated at a different size. All timing information is on a Geforce GTX Titan X (original, not Pascal model).

    Detection Framework        Train       mAP    FPS
    Fast R-CNN [5]             2007+2012   70.0   0.5
    Faster R-CNN VGG-16 [15]   2007+2012   73.2   7
    Faster R-CNN ResNet [6]    2007+2012   76.4   5
    YOLO [14]                  2007+2012   63.4   45
    SSD300 [11]                2007+2012   74.3   46
    SSD500 [11]                2007+2012   76.8   19
    YOLOv2 288×288             2007+2012   69.0   91
    YOLOv2 352×352             2007+2012   73.7   81
    YOLOv2 416×416             2007+2012   76.8   67
    YOLOv2 480×480             2007+2012   77.8   59
    YOLOv2 544×544             2007+2012   78.6   40

3. Faster

We want detection to be accurate but we also want it to be fast. Most applications for detection, like robotics or self-driving cars, rely on low latency predictions. In order to maximize performance we design YOLOv2 to be fast from the ground up.

Most detection frameworks rely on VGG-16 as the base feature extractor [17]. VGG-16 is a powerful, accurate classification network but it is needlessly complex. The convolutional layers of VGG-16 require 30.69 billion floating point operations for a single pass over a single image at 224×224 resolution.

The YOLO framework uses a custom network based on the Googlenet architecture [19]. This network is faster than VGG-16, only using 8.52 billion operations for a forward pass. However, its accuracy is slightly worse than VGG-16. For single-crop, top-5 accuracy at 224×224, YOLO's custom model gets 88.0% on ImageNet compared to 90.0% for VGG-16.

Darknet-19. We propose a new classification model to be used as the base of YOLOv2. Our model builds off of prior work on network design as well as common knowledge in the field. Similar to the VGG models we use mostly 3×3 filters and double the number of channels after every pooling step [17]. Following the work on Network in Network (NIN) we use global average pooling to make predictions as well as 1×1 filters to compress the feature representation between 3×3 convolutions [9]. We use batch normalization to stabilize training, speed up convergence, and regularize the model [7].

Our final model, called Darknet-19, has 19 convolutional layers and 5 maxpooling layers. For a full description see Table 6. Darknet-19 only requires 5.58 billion operations to process an image yet achieves 72.9% top-1 accuracy and 91.2% top-5 accuracy on ImageNet.

Training for classification. We train the network on the standard ImageNet 1000 class classification dataset for 160 epochs using stochastic gradient descent with a starting learning rate of 0.1, polynomial rate decay with a power of 4, weight decay of 0.0005 and momentum of 0.9 using the Darknet neural network framework [13]. During training we use standard data augmentation tricks including random crops, rotations, and hue, saturation, and exposure shifts.

As discussed above, after our initial training on images at 224×224 we fine tune our network at a larger size, 448. For this fine tuning we train with the above parameters but for only 10 epochs and starting at a learning rate of 10^−3. At this higher resolution our network achieves a top-1 accuracy of 76.5% and a top-5 accuracy of 93.3%.

Training for detection. We modify this network for detection by removing the last convolutional layer and instead adding on three 3×3 convolutional layers with 1024 filters each followed by a final 1×1 convolutional layer with the number of outputs we need for detection. For VOC we predict 5 boxes with 5 coordinates each and 20 classes per box, so 125 filters (5 × (5 + 20) = 125). We also add a passthrough layer from the final 3×3×512 layer to the second to last convolutional layer so that our model can use fine grain features.

We train the network for 160 epochs with a starting learning rate of 10^−3, dividing it by 10 at 60 and 90 epochs. We use a weight decay of 0.0005 and momentum of 0.9. We use a similar data augmentation to YOLO and SSD with random crops, color shifting, etc. We use the same training strategy on COCO and VOC.

4. Stronger

We propose a mechanism for jointly training on classification and detection data. Our method uses images labelled for detection to learn detection-specific information like bounding box coordinate prediction and objectness as well as how to classify common objects. It uses images with only class labels to expand the number of categories it can detect.

During training we mix images from both detection and classification datasets. When our network sees an image labelled for detection we can backpropagate based on the full YOLOv2 loss function. When it sees a classification image we only backpropagate loss from the classification-specific parts of the architecture.
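A sketch of this mixed training loop is below; the loss and update methods on the model object are stand-ins for the YOLOv2 loss terms, not the released Darknet code.

    import random

    def joint_train(model, detection_set, classification_set, steps):
        """Mix examples from both datasets; backpropagate the full YOLOv2
        loss for detection images and only the classification terms for
        classification-only images."""
        pool = [(ex, True) for ex in detection_set] + \
               [(ex, False) for ex in classification_set]
        for _ in range(steps):
            example, has_boxes = random.choice(pool)
            if has_boxes:
                loss = model.detection_loss(example)       # coords + objectness + classes
            else:
                loss = model.classification_loss(example)  # class terms only
            model.backward(loss)                           # update weights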
Table 4: PASCAL VOC 2012 test detection results. YOLOv2 performs on par with state-of-the-art detectors like Faster R-CNN with ResNet and SSD512 and is 2-10× faster.

    Method             data    mAP   aero bike bird boat bottle bus  car  cat  chair cow  table dog  horse mbike person plant sheep sofa train tv
    Fast R-CNN [5]     07++12  68.4  82.3 78.4 70.8 52.3 38.7   77.8 71.6 89.3 44.2  73.0 55.0  87.5 80.5  80.8  72.0   35.1  68.3  65.7 80.4  64.2
    Faster R-CNN [15]  07++12  70.4  84.9 79.8 74.3 53.9 49.8   77.5 75.9 88.5 45.6  77.1 55.3  86.9 81.7  80.9  79.6   40.1  72.6  60.9 81.2  61.5
    YOLO [14]          07++12  57.9  77.0 67.2 57.7 38.3 22.7   68.3 55.9 81.4 36.2  60.8 48.5  77.2 72.3  71.3  63.5   28.9  52.2  54.8 73.9  50.8
    SSD300 [11]        07++12  72.4  85.6 80.1 70.5 57.6 46.2   79.4 76.1 89.2 53.0  77.0 60.8  87.0 83.1  82.3  79.4   45.9  75.9  69.5 81.9  67.5
    SSD512 [11]        07++12  74.9  87.4 82.3 75.8 59.0 52.6   81.7 81.5 90.0 55.4  79.0 59.8  88.4 84.3  84.7  83.3   50.2  78.0  66.3 86.3  72.0
    ResNet [6]         07++12  73.8  86.5 81.6 77.2 58.0 51.0   78.6 76.6 93.2 48.6  80.4 59.0  92.1 85.3  84.8  80.7   48.1  77.3  66.5 84.7  65.6
    YOLOv2 544         07++12  73.4  86.3 82.0 74.8 59.2 51.8   79.8 76.5 90.6 52.1  78.2 58.5  89.3 82.5  83.4  81.3   49.1  77.2  62.4 83.8  68.7

Table 5: Results on COCO test-dev 2015. Table adapted from [11].

                                      Avg. Precision, IOU:   Avg. Precision, Area:   Avg. Recall, #Dets:   Avg. Recall, Area:
    Method             data           0.5:0.95  0.5   0.75   S     M     L            1     10    100       S     M     L
    Fast R-CNN [5]     train          19.7      35.9  -      -     -     -            -     -     -         -     -     -
    Fast R-CNN [1]     train          20.5      39.9  19.4   4.1   20.0  35.8         21.3  29.5  30.1      7.3   32.1  52.0
    Faster R-CNN [15]  trainval       21.9      42.7  -      -     -     -            -     -     -         -     -     -
    ION [1]            train          23.6      43.2  23.6   6.4   24.1  38.3         23.2  32.7  33.5      10.1  37.7  53.6
    Faster R-CNN [10]  trainval       24.2      45.3  23.5   7.7   26.4  37.1         23.8  34.0  34.6      12.0  38.5  54.4
    SSD300 [11]        trainval35k    23.2      41.2  23.4   5.3   23.2  39.6         22.5  33.2  35.3      9.6   37.6  56.5
    SSD512 [11]        trainval35k    26.8      46.5  27.8   9.0   28.9  41.9         24.8  37.5  39.8      14.0  43.5  59.0
    YOLOv2 [11]        trainval35k    21.6      44.0  19.2   5.0   22.4  35.5         20.7  31.6  33.3      9.8   36.5  54.4

Table 6: Darknet-19.

    Type           Filters   Size/Stride   Output
    Convolutional  32        3×3           224×224
    Maxpool                  2×2/2         112×112
    Convolutional  64        3×3           112×112
    Maxpool                  2×2/2         56×56
    Convolutional  128       3×3           56×56
    Convolutional  64        1×1           56×56
    Convolutional  128       3×3           56×56
    Maxpool                  2×2/2         28×28
    Convolutional  256       3×3           28×28
    Convolutional  128       1×1           28×28
    Convolutional  256       3×3           28×28
    Maxpool                  2×2/2         14×14
    Convolutional  512       3×3           14×14
    Convolutional  256       1×1           14×14
    Convolutional  512       3×3           14×14
    Convolutional  256       1×1           14×14
    Convolutional  512       3×3           14×14
    Maxpool                  2×2/2