Big data is better data

0:11 America's favorite pie is?

0:15 Audience: Apple.
Kenneth Cukier: Apple. Of course it is. How do we know it? Because of data. You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place. Why? What happened? Okay, think about it. When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone's second favorite. (Laughter) But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice. You have more data. You can see something that you couldn't see when you only had smaller amounts of it.

1:24 Now, the point here is that more data doesn't just let us see more, more of the same thing we were looking at. More data allows us to see new. It allows us to see better. It allows us to see different. In this case, it allows us to see what America's favorite pie is: not apple.

1:49 Now, you probably all have heard the term big data. In fact, you're probably sick of hearing the term big data. It is true that there is a lot of hype around the term, and that is very unfortunate, because big data is an extremely important tool by which society is going to advance. In the past, we used to look at small data and think about what it would mean to try to understand the world, and now we have a lot more of it, more than we ever could before. What we find is that when we have a large body of data, we can fundamentally do things that we couldn't do when we only had smaller amounts. Big data is important, and big data is new, and when you think about it, the only way this planet is going to deal with its global challenges (to feed people, supply them with medical care, supply them with energy, electricity, and to make sure they're not burnt to a crisp because of global warming) is through the effective use of data.

2:50 So what is new about big data? What is the big deal? Well, to answer that question, let's think about what information looked like, physically looked like, in the past. In 1908, on the island of Crete, archaeologists discovered a clay disc. They dated it from 2000 B.C., so it's 4,000 years old. Now, there are inscriptions on this disc, but we actually don't know what they mean. It's a complete mystery, but the point is that this is what information used to look like 4,000 years ago. This is how society stored and transmitted information.

3:30 Now, society hasn't advanced all that much. We still store information on discs, but now we can store a lot more information, more than ever before. Searching it is easier. Copying it is easier. Sharing it is easier. Processing it is easier. And we can reuse this information for uses that we never even imagined when we first collected the data. In this respect, the data has gone from a stock to a flow, from something that is stationary and static to something that is fluid and dynamic. There is, if you will, a liquidity to information. The disc that was discovered on Crete, which is 4,000 years old, is heavy, it doesn't store a lot of information, and that information is unchangeable. By contrast, all of the files that Edward Snowden took from the National Security Agency in the United States fit on a memory stick the size of a fingernail, and they can be shared at the speed of light. More data. More.

4:50 Now, one reason why we have so much data in the world today is that we are collecting things that we've always collected information on, but another reason is that we're taking things that have always been informational but have never been rendered into a data format, and we are putting them into data. Think, for example, about the question of location. Take, for example, Martin Luther. If we wanted to know in the 1500s where Martin Luther was, we would have to follow him at all times, maybe with a feathery quill and an inkwell, and record it, but now think about what it looks like today. You know that somewhere, probably in a telecommunications carrier's database, there is a spreadsheet or at least a database entry that records where you've been at all times. If you have a cell phone, and that cell phone has GPS, it can record your location, but even if it doesn't have GPS, it can still record that information. In this respect, location has been datafied.

5:47 Now think, for example, of the issue of posture, the way that you are all sitting right now, the way that you sit, the way that you sit, the way that you sit. It's all different, and it's a function of your leg length and your back and the contours of your back, and if I were to put sensors, maybe 100 sensors, into all of your chairs right now, I could create an index that's fairly unique to you, sort of like a fingerprint, but it's not your finger.

6:14 So what could we do with this? Researchers in Tokyo are using it as a potential anti-theft device in cars. The idea is that the carjacker sits behind the wheel and tries to drive off, but the car recognizes that a non-approved driver is behind the wheel, and maybe the engine just stops, unless you type a password into the dashboard to say, "Hey, I have authorization to drive." Great.

6:41 What if every single car in Europe had this technology in it? What could we do then? Maybe, if we aggregated the data, we could identify telltale signs that best predict that a car accident is going to take place in the next five seconds. And then what we will have datafied is driver fatigue, and the service would be that when the car senses that the driver slumps into that position, it automatically sets off an internal alarm that vibrates the steering wheel or honks inside to say, "Hey, wake up, pay more attention to the road." These are the sorts of things we can do when we datafy more aspects of our lives.

7:28 So what is the value of big data? Well, think about it. You have more information. You can do things that you couldn't do before. One of the most impressive areas where this concept is taking place is in the area of machine learning. Machine learning is a branch of artificial intelligence, which itself is a branch of computer science. The general idea is that instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out for itself. It will help to understand it by looking at its origins. In the 1950s, a computer scientist at IBM named Arthur Samuel liked to play checkers, so he wrote a computer program so he could play against the computer. He played. He won. He played. He won. He played. He won, because the computer only knew what a legal move was. Arthur Samuel knew something else. Arthur Samuel knew strategy. So he wrote a small sub-program alongside it, operating in the background, and all it did was score the probability that a given board configuration would likely lead to a winning board versus a losing board after every move. He plays the computer. He wins. He plays the computer. He wins. He plays the computer. He wins. And then Arthur Samuel leaves the computer to play itself. It plays itself. It collects more data. It collects more data. It increases the accuracy of its prediction. And then Arthur Samuel goes back to the computer and he plays it, and he loses, and he plays it, and he loses, and he plays it, and he loses, and Arthur Samuel has created a machine that surpasses his ability in a task that he taught it.

9:29 And this idea of machine learning is going everywhere. How do you think we have self-driving cars? Is it because we are any better as a society at enshrining all the rules of the road into software? No. Because memory is cheaper? No. Because algorithms are faster? No. Because processors are better? No. All of those things matter, but that's not why. It's because we changed the nature of the problem. We changed the nature of the problem from one in which we tried to overtly and explicitly explain to the computer how to drive to one in which we say, "Here's a lot of data around the vehicle. You figure it out. You figure out that that is a traffic light, that that traffic light is red and not green, that that means that you need to stop and not go forward."

10:17 Machine learning is at the basis of many of the things that we do online: search engines, Amazon's personalization algorithm, computer translation, voice recognition systems. Researchers recently have looked at the question of biopsies, cancerous biopsies, and they've asked the computer to determine, by looking at the data and survival rates, whether cells are actually cancerous or not, and sure enough, when you throw the data at it through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that a breast cancer biopsy is indeed cancerous. The problem: The medical literature only knew nine of them. Three of the traits were ones that people didn't know to look for, but that the machine spotted.

11:23 Now, there are dark sides to big data as well. It will improve our lives, but there are problems that we need to be conscious of, and the first one is the idea that we may be punished for predictions, that the police may use big data for their purposes, a little bit like "Minority Report." It's called predictive policing, or algorithmic criminology, and the idea is that if we take a lot of data, for example where past crimes have been, we know where to send the patrols. That makes sense, but the problem, of course, is that it's not simply going to stop at location data; it's going to go down to the level of the individual. Why don't we use data about the person's high school transcript? Maybe we should use the fact that they're unemployed or not, their credit score, their web-surfing behavior, whether they're up late at night. Their Fitbit, when it's able to identify biochemistries, will show that they have aggressive thoughts. We may have algorithms that are likely to predict what we are about to do, and we may be held accountable before we've actually acted. Privacy was the central challenge in a small data era. In the big data age, the challenge will be safeguarding free will, moral choice, human volition, human agency.

12:53 There is another problem: Big data is going to steal our jobs. Big data and algorithms are going to challenge white-collar, professional knowledge work in the 21st century in the same way that factory automation and the assembly line challenged blue-collar labor in the 20th century. Think about a lab technician who is looking through a microscope at a cancer biopsy and determining whether it's cancerous or not. The person went to university. The person buys property. He or she votes. He or she is a stakeholder in society. And that person, as well as an entire fleet of professionals like that person, is going to find that their jobs are radically changed or actually completely eliminated. Now, we like to think that technology creates jobs over a period of time after a short, temporary period of dislocation, and that is true for the frame of reference with which we all live, the Industrial Revolution, because that's precisely what happened. But we forget something in that analysis: There are some categories of jobs that simply get eliminated and never come back. The Industrial Revolution wasn't very good if you were a horse. So we're going to need to be careful and take big data and adjust it for our needs, our very human needs. We have to be the master of this technology, not its servant. We are just at the outset of the big data era, and honestly, we are not very good at handling all the data that we can now collect. It's not just a problem for the National Security Agency. Businesses collect lots of data, and they misuse it too, and we need to get better at this, and this will take time. It's a little bit like the challenge that was faced by primitive man and fire. This is a tool, but this is a tool that, unless we're careful, will burn us.

14:55 Big data is going to transform how we live, how we work and how we think. It is going to help us manage our careers and lead lives of satisfaction and hope and happiness and health, but in the past, we've often looked at information technology and our eyes have only seen the T, the technology, the hardware, because that's what was physical. We now need to recast our gaze at the I, the information, which is less apparent, but in some ways a lot more important. Humanity can finally learn from the information that it can collect, as part of our timeless quest to understand the world and our place in it, and that's why big data is a big deal.

15:45 (Applause)
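The pie story at 0:15 rests on an aggregation effect: a purchase the whole family must agree on rewards the option nobody objects to, while individual purchases reveal first choices. Below is a minimal Python sketch of that effect under stated assumptions: the preference rankings are invented for illustration (they are not the supermarket data from the talk), and the family's "agreement" is modeled as a Borda count, which is only one plausible way a group might settle on a compromise. The names `FAMILIES`, `borda_winner`, `large_pie_sales`, and `small_pie_sales` are hypothetical.

```python
from collections import Counter

# Hypothetical preference data (invented for illustration, NOT from the talk).
# Each inner list is one person's ranking, most to least preferred.
# Apple is every person's second choice and nobody's first.
FAMILIES = [
    [
        ["cherry", "apple", "pecan", "pumpkin", "blueberry"],
        ["pecan", "apple", "cherry", "pumpkin", "blueberry"],
        ["pumpkin", "apple", "blueberry", "cherry", "pecan"],
        ["blueberry", "apple", "pumpkin", "pecan", "cherry"],
    ],
    [
        ["pecan", "apple", "pumpkin", "cherry", "blueberry"],
        ["blueberry", "apple", "cherry", "pumpkin", "pecan"],
        ["pecan", "apple", "blueberry", "cherry", "pumpkin"],
    ],
]


def borda_winner(rankings: list[list[str]]) -> str:
    """Compromise choice: first place earns n-1 points, last place earns 0."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, pie in enumerate(ranking):
            scores[pie] += n - 1 - position
    # most_common(1) returns [(pie, total)] for the highest total score
    return scores.most_common(1)[0][0]


def large_pie_sales(families: list[list[list[str]]]) -> Counter:
    """One large pie per family, chosen by compromise."""
    return Counter(borda_winner(members) for members in families)


def small_pie_sales(families: list[list[list[str]]]) -> Counter:
    """One small pie per person, each buying their own first choice."""
    return Counter(ranking[0] for members in families for ranking in members)


print(large_pie_sales(FAMILIES))  # apple wins both family purchases
print(small_pie_sales(FAMILIES))  # apple never appears among first choices
```

With these made-up rankings, apple wins every family purchase yet gets no first-choice sales at all, an exaggerated version of the drop the talk describes: the small pies do not change anyone's taste, they only expose preferences that the shared purchase had averaged away.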
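The Arthur Samuel story at 7:28 describes a side program that scores how likely a board position is to lead to a win, and self-play that sharpens those scores with more data. Checkers is far too large to sketch here, so the Python below swaps in a hypothetical toy game (players alternately take 1 to 3 stones from a pile; whoever takes the last stone wins) and a simple bootstrapped update in which each visited position's score is nudged toward the best score reachable in one move. This is an illustration of the idea, not Samuel's actual method, which scored real checkers positions with a much richer evaluation function; `train`, `best_move`, and all parameter values are assumptions chosen for the example.

```python
import random

# Hypothetical stand-in for checkers: take 1-3 stones, last stone wins.
MAX_TAKE = 3
START_PILE = 12


def legal_moves(pile: int) -> list[int]:
    """Numbers of stones the player to move may take."""
    return list(range(1, min(MAX_TAKE, pile) + 1))


def best_move(values: list[float], pile: int) -> int:
    """Pick the move that leaves the opponent the lowest estimated win score."""
    return min(legal_moves(pile), key=lambda m: values[pile - m])


def train(episodes: int = 5000, epsilon: float = 0.3, alpha: float = 0.5,
          seed: int = 0) -> list[float]:
    """Self-play: the program refines its own table of win scores.

    values[p] is a score in [0, 1] for how likely the player to move is to win
    with p stones left. values[0] stays 0.0, because facing an empty pile means
    the opponent just took the last stone.
    """
    rng = random.Random(seed)
    values = [0.0] + [0.5] * START_PILE  # 0.5 means "no idea yet"
    for _ in range(episodes):
        pile = START_PILE
        while pile > 0:
            # Nudge this position's score toward the best outcome one move away.
            target = max(1.0 - values[pile - m] for m in legal_moves(pile))
            values[pile] += alpha * (target - values[pile])
            # Usually play the move the table prefers; sometimes explore at random.
            if rng.random() < epsilon:
                move = rng.choice(legal_moves(pile))
            else:
                move = best_move(values, pile)
            pile -= move  # the program's other "self" is now to move
    return values


values = train()
for pile in range(1, START_PILE + 1):
    print(pile, round(values[pile], 2), best_move(values, pile))
```

After training, piles that are multiples of 4 should score near 0 for the player to move, and the preferred move from 5, 6, or 7 stones leaves the opponent exactly 4. That is the known winning strategy for this game, and the program was never told it; like Samuel's player, it recovered the strategy from the data its own games generated.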