Bin Continuous Data
Introduction
Binning is a process of grouping measured data into data classes. These data classes can be further used in various analyses. For example a variable, which takes continuous numerical value, may not be allowed to be selected as input/output variable in certain routines of XLMiner™. So Binning utility addresses this by putting the values of that variable in certain number of bins. The “binned-variable” can be chosen as a categorical variable. The user can decide what values the binned variable should take.
There are following options available for binning-
1. Equal count : In equal count bin interval is decided on the basis of number of records present in the bin variable. In equal count there are following options available :
· Rank : In this option each value in the variable is assigned a rank according to the start and increment value. User can specify the starting and increment value.
· Mean : Mean is calculated as mean of the values lying in bin interval. This mean value is assigned to each value of the variable that lies in that interval.
· Median : Records with the same binning value are counted and median is calculated on the input value. The median value is then assigned to the binned variable.
2. Equal Interval : Equal interval is based on the bin size. In equal interval, the whole range is divided in the user specified bin size. In equal interval the options of Rank and Mid value are available.
See also:
· Using Bin continuous Data in XLMiner™
· Example - Bin Continuous Data
Bin Continuous Data
Using Bin Continuous Data
Let us find out how the Binning utility works:
Select XLMiner --> Data Utilities --> Bin Continuous Data. The following dialog appears on screen.
Select the worksheet. The data range is displayed automatically.
Variable names in the first row : Check this and the list of variables in the dataset will appear. Select the variable to be binned. This means, binning will be done on the values in that column.
Name of the binned variable : If we have chosen x3 as the variable to be binned, XLMiner™ will create a variable, Binned_x3 and store the results of binning in it.
Bins to be made with :
Equal count : The binning is performed such that there are equal number of records in each bin. (There is a possibility that the number of records in a bin may be different here due to some factors like border values, #records being indivisible by the no. of bins etc. We will see this in the examples.)
Equal interval : Depending on the number of bins, the data values are divided among equal intervals. The interval is calculated as (Maximum value -Minimum value)/#bins. The options of Rank and Mid value are available .
Value in the binned variable : Here the user has to specify what value to be assigned to the bin variable for any single bin. Different options are available to assign values to the binned variable, depending upon the method of making bins -- equal count or equal interval.
Rank of the bin : The user can specify the start value. That value will be assigned to the binned variable in the first bin. The binned variable in the second bin will get assigned by (start value + interval) and so on.
Mean of the bin : XLMiner™ calculates the mean of all values in the bin and assigns that value to the binned variable.
Median of the bin : XLMiner™ finds the median of all values in the bin and assigns that value to the binned variable.
Mid Value : The mid value = (Maximum value +Minimum value)/2.
Apply this option to the selected variable : All the options selected are applied to the variable after selecting this.
Click OK and the binned output is displayed in a separate sheet.
See also:
· Introduction To Binning
· Example - Binning
Bin Continuous Data
Examples:
Data Size: Different versions of XLMiner™ have varying limits on size of data. The size of data depicted in the example below may not be supported by your version. Refer to Data Handling Specifications for details.
1. Open the dataset Binning_Example.xls.
2. Select XlMiner --> Data Utilities --> Bin Continuous Data. You will get the following dialog.
3. Select a variable below the "Variables" list, say, x3. The options are immediately activated.
4. Make the necessary changes to match the options selected in the following dialog. Select "Apply this option to the selected variable". You will see the selected entry under Name of binned variable.
5. Select OK.
As specified by us earlier, 5 bins are made depending on the count of records and the values of x3 are kept in them. For all the values of x3 lying in one bin, the output variable, Binned_x3 gets a value. (This value is the same for all x3s in one bin). The value to be assigned to Binned_x3 for x3s in first bin is 10, as we specified. Since we have chosen the interval to be 3, the value to be assigned to the values of x3 in the next bin will be 10+3 = 13. You can see above that Binned_x3 takes values 10, 13, 16, 19, 22.
Though we have specified bins to be made with equal count here, the number of records in each interval may not be essentially same. A lot of factors, such as border values, total #records etc influence the number of records lying in the bins. See the binning interval below.
6. Select the Mean of the bin instead of the rank.
In the output Binned_x3 gets the value equal to the mean of all x3s in that bin.
Similarly, if we select the median, Binned_x3 gets the value equal to the median of all x3s in that bin.
7. Let us explore the Equal interval option. The interval is calculated as (Maximum value -Minimum value)/#bins. Let us select x4 as the binning variable.
The output is shown below.
See also:
· Introduction To Data Sampling
· Using Data Sampling