Business Statistics BEO1106
WEEK 2
NUMERICAL SUMMARY STATISTICS
Reference: Selvanathan et al. (2004), Chapters 2, 3.

NUMERICAL SUMMARY MEASURES
Numerical summary measures describe the major properties of a data set, namely its:
- central tendency (or location),
- variability (or dispersion, spread),
- shape.

1. MEASURES OF CENTRAL TENDENCY
They locate the ‘centrality’ of the data set.

Arithmetic mean (variable X): add up the values, then divide by the number of values.
For the population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (mu), where N is the population size.
For a sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (x-bar), where n is the sample size.

Properties of the mean:
- It is the most comprehensive measure of central location (i.e. it is computed from all available data values);
- Each quantitative data set has one and only one mean;
- The (sample) mean is used extensively in inferential statistics;
- It can be distorted by outliers (or extreme values), i.e. uncharacteristically small or large values.

Median: the middle value of an ordered array.
How to find the median ‘manually’?
Sort the data from smallest to largest.
Choose the middle value if n (N) is odd, or take the average of the two middle values if n (N) is even.

Properties of the median:
- Each quantitative data set has one and only one median;
- It is unaffected by outliers;
- It is computed from at most two data points;
- It has limited application and mathematical potential.

Mode: the most frequently occurring value of a data set.

Properties of the mode:
- It can be used to describe both quantitative and qualitative data;
- It is unaffected by outliers;
- It might not be unique or useful;
- It has limited application and mathematical potential.

2. MEASURES OF VARIABILITY (DISPERSION)
How much is the data spread out around its centre?

Range = largest value – smallest value

Properties of the range:
- Each quantitative data set has one and only one range;
- It is computed from only two data points;
- It is affected by outliers;
- It has limited application and mathematical potential.

Variance: the ‘average’ of the squared deviations from the mean, i.e. the sum of squared deviations divided by N (population) or by n – 1 (sample).
For the population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ (sigma squared)
For a sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Properties of the variance:
- Each quantitative data set has one and only one variance;
- It is a comprehensive measure of dispersion;
- It is affected by outliers;
- It is conceptually complicated;
- It is hard to interpret since it is given in ‘squared’ units of the observations.

Standard deviation: the ‘average’ deviation from the mean,
i.e. the positive square root of the variance.
For the population: $\sigma = \sqrt{\sigma^2}$
For a sample: $s = \sqrt{s^2}$
The standard deviation has similar properties to the variance, but:
- it is easier to interpret since it is given in the original units;
- s is used extensively in inferential statistics.

The range, the variance and the standard deviation are all ‘useless’ for comparing the dispersions of data sets that are measured in different units (e.g. kg and cm) or have markedly different magnitudes.

Coefficient of variation: the standard deviation divided by the mean (often expressed as a percentage).
For the population: $CV = \frac{\sigma}{\mu}$
For a sample: $cv = \frac{s}{\bar{x}}$

Properties of the coefficient of variation:
- It measures relative variability since it does not depend on the original unit of measurement;
- It does not exist when the mean is zero, and can be misleading when some of the values are positive and others are negative.

Inter-quartile range: IQR = Q3 – Q1, i.e. the range of the middle 50% of the data.

Properties of the inter-quartile range:
- It is unaffected by outliers;
- It has limited application and mathematical potential, though it can be used to identify outliers.

Any data point smaller than Q1 – 1.5×IQR or greater than Q3 + 1.5×IQR can be considered an unusually small or large value, i.e. an outlier (extreme value).

3. DESCRIBING THE SHAPE OF A DATA SET
The shape of a distribution is described by its degree of symmetry (skewness) and its peakedness (kurtosis).

Is the distribution of a data set symmetrical or skewed? There are three ways to answer this question:

(a) Plot the data using a histogram or polygon. The distribution is said to be skewed, i.e. not symmetrical, if the tails are not (approximately) of the same length.
- The distribution is skewed to the left (negatively skewed) if the left tail is longer than the right tail. This indicates the presence of a small proportion of relatively small values.
- The distribution is skewed to the right (positively skewed) if the right tail is longer than the left tail. This indicates the presence of a small proportion of relatively large values.

[Figure: histograms of a bell-shaped symmetrical distribution (zero skewness), a positively (or right) skewed distribution and a negatively (or left) skewed distribution.]

(b) Compare the mean to the median. There are three possibilities:
- mean = median: symmetrical;
- mean < median: skewed to the left;
- mean > median: skewed to the right.

(c) Compute the skewness measure using MS Excel. Is it (approximately) zero (symmetrical), negative (skewed to the left), or positive (skewed to the right)?

Does the distribution of the data set have a bell shape, or is it more or less peaked? Compute the kurtosis value using MS Excel. Is it (approximately) zero (bell shape), negative (less peaked), or positive (more peaked)?

[Figure: a peaked distribution (positive kurtosis) and a flat distribution (negative kurtosis).]

Ex 4: We consider the price to earnings (P/E) ratio and the dividend yield for 20 listed shares. The data was downloaded from Selvanathan Case 3.1 and summarised using MS Excel. We get the following results:

Mean: For the 20 listed shares the average P/E ratio is 15.3, and the average dividend yield is 4.4%.

Median: 50% of the shares have P/E ratios less than 13.9 and the other 50% have P/E ratios more than 13.9. 50% of the shares have dividend yields less than 4.4% and the other 50% have dividend yields more than 4.4%.

Skewness: The mean P/E ratio is larger than the median, so its distribution is positively skewed. Note that the skewness is positive (1.8). The mean dividend yield is the same as the median, so its distribution is symmetrical. Note that the skewness is positive but very close to zero.

Ex 4 continued:

Standard deviation: The ‘average’ deviation of P/E ratios from their mean is 5.4, and that of dividend yields is 1.8.

Range: The range of P/E ratios is 21.1 and the range of dividend yields is 7.4.

Kurtosis: The distribution of P/E ratios is peaked (kurtosis = 3.0), while that of the dividend yields is close to bell-shaped (kurtosis = 0.4).

Coefficient of variation: The average deviation of P/E ratios from their mean is 35.29% of the mean P/E ratio (5.4/15.3). The average deviation of dividend yields from their mean is 40.91% of the mean dividend yield (1.8/4.4). The standard deviation and the range show that the P/E ratio has a greater average deviation from its mean than the dividend yield. However, the cv shows the opposite for deviations relative to the mean.

The (Q1 – 1.5×IQR; Q3 + 1.5×IQR) interval for the P/E ratio can be calculated using MS Excel. Any data point outside this interval can be considered an outlier.

Identifying extreme values (outliers)

EMPIRICAL RULE
If a sample of measurements has a mound-shaped distribution, i.e. a more or less symmetrical distribution with a single mode:
- the interval $\bar{x} \pm s$ contains about 68% of all measurements,
- the interval $\bar{x} \pm 2s$ contains about 95% of all measurements,
- the interval $\bar{x} \pm 3s$ contains all or a vast majority of measurements.
Any value outside the third (or even the second) interval can be considered an outlier.

In Ex 4, the distribution of dividend yields was mound-shaped, so let us calculate the third interval for this distribution:
$\bar{x} \pm 3s = 4.4 \pm 3(1.8) = 4.4 \pm 5.4$, i.e. (-1.0, 9.8).
We expect almost all observations to be within this interval. Therefore, if an observation happens to be outside this interval, we can consider it an outlier.
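The empirical-rule intervals can be checked with a few lines of Python. This is a minimal sketch, not the course's Excel procedure. It assumes only the Ex 4 summary figures for dividend yields (mean 4.4, standard deviation 1.8). The function names `empirical_intervals` and `is_outlier` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, Tuple


def empirical_intervals(mean: float, sd: float) -> Dict[int, Tuple[float, float]]:
    """Return the intervals mean ± k*sd for k = 1, 2, 3.

    For a mound-shaped distribution these are expected to contain about 68%,
    about 95%, and all or almost all of the observations, respectively.
    """
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


def is_outlier(x: float, mean: float, sd: float, k: int = 3) -> bool:
    """Hypothetical helper: flag x if it lies outside mean ± k*sd.

    k = 3 uses the third interval; k = 2 gives the stricter second interval.
    """
    low, high = mean - k * sd, mean + k * sd
    return x < low or x > high


# Ex 4 summary figures for dividend yields: mean 4.4, standard deviation 1.8.
intervals = empirical_intervals(4.4, 1.8)
for k, (low, high) in intervals.items():
    print(f"mean ± {k}s: ({low:.1f}, {high:.1f})")
```

Rounding to one decimal place hides tiny floating-point errors, so the third interval prints as (-1.0, 9.8).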
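The measures of central tendency in Section 1 (mean, median, mode) can be illustrated with a small Python sketch. It follows the ‘manual’ median recipe from the notes and compares it with the standard-library `statistics` module. The data values and the `manual_median` name are hypothetical, invented only to show how one extreme value distorts the mean but leaves the median unchanged.

```python
import statistics
from typing import Sequence


def manual_median(values: Sequence[float]) -> float:
    """Median via the 'manual' recipe: sort the data, then take the middle
    value (n odd) or the average of the two middle values (n even)."""
    if len(values) == 0:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data, invented only for this illustration.
data = [3, 7, 7, 2, 9, 10]
with_outlier = data + [100]  # the same data plus one extreme value

for label, xs in (("original", data), ("with outlier", with_outlier)):
    print(
        f"{label}: mean = {statistics.mean(xs):.2f}, "
        f"median = {manual_median(xs)}, mode = {statistics.mode(xs)}"
    )
```

Adding the single value 100 moves the mean from about 6.33 to about 19.71, while the median stays at 7. This matches the stated properties of both measures.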
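The dispersion formulas in Section 2 (range, variance with divisor N or n – 1, standard deviation, coefficient of variation) translate directly into code. Below is a minimal Python sketch. The helper names and the small data set are hypothetical. The two cv figures reuse the Ex 4 summary values to reproduce the 35.29% and 40.91% quoted in the notes.

```python
import math
import statistics
from typing import Sequence


def data_range(xs: Sequence[float]) -> float:
    """Range = largest value - smallest value."""
    return max(xs) - min(xs)


def variance(xs: Sequence[float], sample: bool = True) -> float:
    """Sum of squared deviations from the mean, divided by n - 1 for a
    sample or by N for a population."""
    n = len(xs)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - 1) if sample else ss / n


def std_dev(xs: Sequence[float], sample: bool = True) -> float:
    """Standard deviation: the positive square root of the variance."""
    return math.sqrt(variance(xs, sample))


def coeff_of_variation(sd: float, mean: float) -> float:
    """cv = standard deviation / mean; it does not exist for a zero mean."""
    if mean == 0:
        raise ValueError("cv is undefined when the mean is zero")
    return sd / mean


# Hypothetical data, invented only for this illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("range:", data_range(data))
print("population variance:", variance(data, sample=False))
print("sample variance:", variance(data))
print("sample sd:", std_dev(data))

# Ex 4 summary figures: (standard deviation, mean) for each variable.
print(f"cv of P/E ratios: {coeff_of_variation(5.4, 15.3):.2%}")
print(f"cv of dividend yields: {coeff_of_variation(1.8, 4.4):.2%}")
```

The `.2%` format multiplies by 100 and appends a percent sign, so the ratios 5.4/15.3 and 1.8/4.4 print as percentages of the mean.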
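The IQR outlier rule (anything below Q1 – 1.5×IQR or above Q3 + 1.5×IQR) can be sketched in Python as follows. One assumption matters here: textbooks and software use several conventions for quartiles. This sketch uses `statistics.quantiles(..., method="inclusive")`, a linear-interpolation rule that we believe corresponds to Excel's QUARTILE.INC. Other conventions can give slightly different fences. The helper names and the data are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def iqr_fences(xs: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (Q1 - k*IQR, Q3 + k*IQR).

    Assumption: quartiles come from statistics.quantiles with the
    'inclusive' method; other quartile conventions shift the fences slightly.
    """
    q1, _median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(xs: Sequence[float], k: float = 1.5) -> List[float]:
    """Values lying outside the (Q1 - k*IQR; Q3 + k*IQR) interval."""
    low, high = iqr_fences(xs, k)
    return [x for x in xs if x < low or x > high]


# Hypothetical data, invented only for this illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9, 30]
print("fences:", iqr_fences(data))
print("outliers:", iqr_outliers(data))
```

For these nine values Q1 = 4 and Q3 = 7, so the fences are (-0.5, 11.5) and only 30 is flagged.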
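The skewness and kurtosis values that the notes obtain from MS Excel can be reproduced outside Excel. This Python sketch implements the sample formulas that, to our knowledge, Excel documents for its SKEW and KURT functions. KURT reports excess kurtosis, so a bell shape gives roughly zero. The function names and the two small data sets are hypothetical, chosen only to show a symmetrical and a right-skewed case.

```python
import math
from typing import List, Sequence


def _z_scores(xs: Sequence[float]) -> List[float]:
    """Standardise each value with the sample mean and sample sd (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if s == 0:
        raise ValueError("all values are equal; shape measures are undefined")
    return [(x - mean) / s for x in xs]


def excel_skew(xs: Sequence[float]) -> float:
    """Sample skewness, assumed to match Excel's SKEW formula (n >= 3)."""
    n = len(xs)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in _z_scores(xs))


def excel_kurt(xs: Sequence[float]) -> float:
    """Excess kurtosis, assumed to match Excel's KURT formula (n >= 4).

    Roughly 0 for a bell shape, positive if more peaked, negative if flatter.
    """
    n = len(xs)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum(z ** 4 for z in _z_scores(xs)) - correction


symmetric = [1, 2, 3, 4, 5]      # hypothetical, evenly spread values
right_skewed = [1, 2, 3, 4, 20]  # hypothetical, one relatively large value
print("symmetric:", excel_skew(symmetric), excel_kurt(symmetric))
print("right-skewed:", excel_skew(right_skewed))
```

The symmetrical set gives a skewness of (essentially) zero and a negative excess kurtosis of -1.2, since evenly spread values are flatter than a bell shape. The set with one large value gives positive skewness, matching the ‘small proportion of relatively large values’ reading of right skew.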