Getting Started in Frequencies, Crosstab, Factor and Regression Analysis
(ver. 2.0 beta, draft)
Oscar Torres-Reyna
Data Consultant
otorres@princeton.edu
http://dss.princeton.edu/training/
Case study: intro
Search here in the home page for this dataset
Codebook in two formats
Datasets, two formats: ASCII and SPSS
Marginals
Metadata
NOTE: When data is not available in Stata format, you can download the SPSS portable file (*.por), open it in SPSS (available at the DSS lab), and save it as a Stata file.
Case study: frequencies
Distribution of electoral preferences and gender. According to the codebook ‘q5’ has the electoral question and ‘qa’ gender.
NOTE: At this point, it is strongly recommended to open a log to keep a record of your work and to extract output, type:
log using mywork.log
You could also open a do-file by typing doedit and copy your commands there.

No weights
. tab q5 /*No weights*/

Q5. If the Presidential election were   |
held today and the candidates were      |
Barack                                  |       Freq.     Percent        Cum.
----------------------------------------+------------------------------------
Barack Obama and Joe Biden, the Democra |         481       45.68       45.68
John McCain and Sarah Palin, the Republ |         464       44.06       89.74
(VOL) Other/Neither                     |          21        1.99       91.74
(VOL) Undecided/Don't know/no answer    |          87        8.26      100.00
----------------------------------------+------------------------------------
Total                                   |       1,053      100.00

Using weights
. tab q5 [aweight=weight] /*With weights*/

Q5. If the Presidential election were   |
held today and the candidates were      |
Barack                                  |       Freq.     Percent        Cum.
----------------------------------------+------------------------------------
Barack Obama and Joe Biden, the Democra |  504.337749       47.90       47.90
John McCain and Sarah Palin, the Republ |  449.487545       42.69       90.58
(VOL) Other/Neither                     |  20.5570831        1.95       92.53
(VOL) Undecided/Don't know/no answer    | 78.61762284        7.47      100.00
----------------------------------------+------------------------------------
Total                                   |       1,053      100.00

No weights
. tab qa /*No weights*/

A. Gender (DO NOT ASK) |       Freq.     Percent        Cum.
-----------------------+------------------------------------
Male                   |         493       46.82       46.82
Female                 |         560       53.18      100.00
-----------------------+------------------------------------
Total                  |       1,053      100.00

Using weights
. tab qa [aweight=weight] /*With weights*/

A. Gender (DO NOT ASK) |       Freq.     Percent        Cum.
-----------------------+------------------------------------
Male                   |  500.388396       47.52       47.52
Female                 |  552.611604       52.48      100.00
-----------------------+------------------------------------
Total                  |       1,053      100.00
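The logging advice and the four tabulations on this slide can be collected into one short do-file. This is a minimal sketch: the text and replace options are additions not shown on the slide, and the variable names (q5, qa, weight) are the ones used in the commands above.

```stata
* Open a plain-text log; replace overwrites an existing mywork.log
log using mywork.log, text replace

* One-way frequencies, first unweighted, then with analytic weights
tab q5
tab q5 [aweight=weight]
tab qa
tab qa [aweight=weight]

* Close the log so the file is written completely
log close
```

Keeping the unweighted and weighted runs side by side in the log makes it easy to see how much the weights shift each percentage.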
Case study: Electoral preferences by gender
. tab q5 qa [aw=weight], col row /*Electoral preferences by gender*/

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+

Q5. If the            |
Presidential election |
were held today and   |
the candidates were   | A. Gender (DO NOT ASK)
Barack                |       Male     Female |      Total
----------------------+-----------------------+-----------
Barack Obama and Joe  |  209.42078  294.91697 |  504.33775
                      |      41.52      58.48 |     100.00
                      |      41.85      53.37 |      47.90
                      |
John McCain and Sarah |   252.9313  196.55625 | 449.487545
                      |      56.27      43.73 |     100.00
                      |      50.55      35.57 |      42.69
                      |
(VOL) Other/Neither   |  10.055739 10.5013441 |  20.557083
                      |      48.92      51.08 |     100.00
                      |       2.01       1.90 |       1.95
                      |
(VOL) Undecided/Don't |  27.980574  50.637048 |  78.617623
                      |      35.59      64.41 |     100.00
                      |       5.59       9.16 |       7.47
----------------------+-----------------------+-----------
Total                 |   500.3884   552.6116 |      1,053
                      |      47.52      52.48 |     100.00
                      |     100.00     100.00 |     100.00
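Each cell of the gender crosstab stacks three numbers, which can be hard to read. The sketch below requests one kind of percentage at a time and then recomputes two cells by hand from the weighted frequencies reported in the table. The nofreq option and the display lines are illustrative additions, not part of the original slide.

```stata
* Column percentages only: how each gender splits across the candidates
tab q5 qa [aw=weight], col nofreq

* Row percentages only: the gender mix within each candidate's supporters
tab q5 qa [aw=weight], row nofreq

* Column percentage by hand: weighted Obama/Biden males over all males
display %5.2f 100*209.42078/500.3884

* Row percentage by hand: weighted Obama/Biden males over all Obama/Biden
display %5.2f 100*209.42078/504.33775
```

The two display lines should reproduce the 41.85 (column) and 41.52 (row) shown for males in the Obama/Biden row.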
Case study: Electoral preferences by age
. tab q5 f1 [aw=weight], col row /*Electoral preferences by age*/

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+

Q5. If the            |
Presidential election |
were held today and   |
the candidates were   | F1. What is your age?
Barack                |      18-24      25-29      30-34      35-39      40-44      45-54      55-64
----------------------+-----------------------------------------------------------------------------
Barack Obama and Joe  |  29.355119  26.435913  40.727272  46.595118  38.610873  129.51971  86.169373
                      |       5.82       5.24       8.08       9.24       7.66      25.68      17.09
                      |      77.57      50.53      40.92      51.51      37.59      51.68      50.32
                      |
John McCain and Sarah |  6.2229886   22.18839  54.883049  36.825588  51.046351  99.992283  69.037199
                      |       1.38       4.94      12.21       8.19      11.36      22.25      15.36
                      |      16.44      42.42      55.14      40.71      49.69      39.90      40.31
                      |
(VOL) Other/Neither   |          0          0  2.1209543  2.4419715 4.51458561  3.1358789  2.7783459
                      |       0.00       0.00      10.32      11.88      21.96      15.25      13.52
                      |       0.00       0.00       2.13       2.70       4.39       1.25       1.62
                      |
(VOL) Undecided/Don't |  2.2672181  3.6879373   1.809561  4.5920698  8.5570854  17.952531  13.264407
                      |       2.88       4.69       2.30       5.84      10.88      22.84      16.87
                      |       5.99       7.05       1.82       5.08       8.33       7.16       7.75
----------------------+-----------------------------------------------------------------------------
Total                 |  37.845325  52.312241  99.540836  90.454747   102.7289 250.600407  171.24932
                      |       3.59       4.97       9.45       8.59       9.76      23.80      16.26
                      |     100.00     100.00     100.00     100.00     100.00     100.00     100.00

Q5. If the            |
Presidential election |
were held today and   |
the candidates were   | F1. What is your age?
Barack                |  65 or old   (VOL) No |      Total
----------------------+-----------------------+-----------
Barack Obama and Joe  |  102.39238  4.5319886 |  504.33775
                      |      20.30       0.90 |     100.00
                      |      43.21      40.00 |      47.90
                      |
John McCain and Sarah |  104.76215  4.5295414 | 449.487545
                      |      23.31       1.01 |     100.00
                      |      44.21      39.98 |      42.69
                      |
(VOL) Other/Neither   | 5.56534701          0 |  20.557083
                      |      27.07       0.00 |     100.00
                      |       2.35       0.00 |       1.95
                      |
(VOL) Undecided/Don't |  24.219596  2.2672179 |  78.617623
                      |      30.81       2.88 |     100.00
                      |      10.22      20.01 |       7.47
----------------------+-----------------------+-----------
Total                 |  236.93948  11.328748 |      1,053
                      |      22.50       1.08 |     100.00
                      |     100.00     100.00 |     100.00
Case study: Electoral preferences by educational attainment
. tab q5 f4 [aw=weight], col row /*Electoral preferences by education*/

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+

Q5. If the            |
Presidential election |
were held today and   |
the candidates were   | F4. What is the highest grade of schooling that you've completed?
Barack                |  8th grade  Some high  High scho  Some coll  College g  Postgradu   (VOL) No |      Total
----------------------+------------------------------------------------------------------------------+-----------
Barack Obama and Joe  |  2.2991619   3.883265  81.589679  113.53524  169.39657  130.23545  3.3983797 |  504.33775
                      |       0.46       0.77      16.18      22.51      33.59      25.82       0.67 |     100.00
                      |      41.27      33.65      44.32      45.42      44.89      59.52      60.03 |      47.90
                      |
John McCain and Sarah |  3.2718681  6.1159475 76.7484051  116.69213  170.30303  74.093841  2.2623235 | 449.487545
                      |       0.73       1.36      17.07      25.96      37.89      16.48       0.50 |     100.00
                      |      58.73      53.00      41.69      46.68      45.13      33.86      39.97 |      42.69
                      |
(VOL) Other/Neither   |          0          0  3.7389017   3.382658  9.8911577  3.5443656          0 |  20.557083
                      |       0.00       0.00      18.19      16.45      48.12      17.24       0.00 |     100.00
                      |       0.00       0.00       2.03       1.35       2.62       1.62       0.00 |       1.95
                      |
(VOL) Undecided/Don't |          0  1.5397725  22.004128  16.367784 27.7818421  10.924096          0 |  78.617623
                      |       0.00       1.96      27.99      20.82      35.34      13.90       0.00 |     100.00
                      |       0.00      13.34      11.95       6.55       7.36       4.99       0.00 |       7.47
----------------------+------------------------------------------------------------------------------+-----------
Total                 |    5.57103  11.538985  184.08111  249.97781   377.3726  218.79775  5.6607032 |      1,053
                      |       0.53       1.10      17.48      23.74      35.84      20.78       0.54 |     100.00
                      |     100.00     100.00     100.00     100.00     100.00     100.00     100.00 |     100.00
Case study: Electoral preferences by income
. tab q5 f13 [aw=weight], col row /*Electoral preferences by income*/

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+

Q5. If the            |
Presidential election |
were held today and   |
the candidates were   | F13. Finally, just for classification purposes, was your total family income bef
Barack                |  Less than  $20,000 t  $35,000 t  $50,000 t  $75,000 t   $100,000  or $150,0   (VOL) No |      Total
----------------------+-----------------------------------------------------------------------------------------+-----------
Barack Obama and Joe  |  37.525195   51.14097  72.715849  122.78749  59.632459  69.732723 51.1129092  39.690155 |  504.33775
                      |       7.44      10.14      14.42      24.35      11.82      13.83      10.13       7.87 |     100.00
                      |      60.42      49.40      48.59      57.51      39.05      46.16      44.46      37.63 |      47.90
                      |
John McCain and Sarah |  18.630762  39.764056 64.4115908  69.827216  86.023642  68.843117  54.640308  47.346852 | 449.487545
                      |       4.14       8.85      14.33      15.53      19.14      15.32      12.16      10.53 |     100.00
                      |      30.00      38.41      43.04      32.71      56.34      45.57      47.53      44.88 |      42.69
                      |
(VOL) Other/Neither   |  1.5060026  .88321203  3.2060684  2.5018142  2.1243815  3.0806277   2.200355  5.0546217 |  20.557083
                      |       7.33       4.30      15.60      12.17      10.33      14.99      10.70      24.59 |     100.00
                      |       2.42       0.85       2.14       1.17       1.39       2.04       1.91       4.79 |       1.95
                      |
(VOL) Undecided/Don't |  4.4480018  11.739914  9.3136182   18.37691  4.9181423   9.409895 7.01703324 13.3941079 |  78.617623
                      |       5.66      14.93      11.85      23.38       6.26      11.97       8.93      17.04 |     100.00
                      |       7.16      11.34       6.22       8.61       3.22       6.23       6.10      12.70 |       7.47
----------------------+-----------------------------------------------------------------------------------------+-----------
Total                 |  62.109961  103.52815  149.64713  213.49343  152.69863  151.06636  114.97061  105.48574 |      1,053
                      |       5.90       9.83      14.21      20.27      14.50      14.35      10.92      10.02 |     100.00
                      |     100.00     100.00     100.00     100.00     100.00     100.00     100.00     100.00 |     100.00
Case study: Electoral preferences by employment status
. tab q5 f8 [aw=weight], col row /*Electoral preferences by employment status*/

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+

Q5. If the            |
Presidential election |
were held today and   |
the candidates were   | f8
Barack                |   Employed   Employed   Laid off    Retired    Student  Homemaker  Something   (VOL) No |      Total
----------------------+-----------------------------------------------------------------------------------------+-----------
Barack Obama and Joe  |  263.30095 36.9693237  17.692466  125.50328  15.486465  16.644394 24.6988275  4.0420475 |  504.33775
                      |      52.21       7.33       3.51      24.88       3.07       3.30       4.90       0.80 |     100.00
                      |      47.30      52.01      67.23      46.14      82.02      27.25      60.77      64.12 |      47.90
                      |
John McCain and Sarah |  252.31686  25.723928  6.1500438   112.5963  1.1268505  37.187532  12.123702  2.2623235 | 449.487545
                      |      56.13       5.72       1.37      25.05       0.25       8.27       2.70       0.50 |     100.00
                      |      45.33      36.19      23.37      41.39       5.97      60.89      29.83      35.88 |      42.69
                      |
(VOL) Other/Neither   |  11.498793  1.6530186          0  6.1126834          0          0  1.2925883          0 |  20.557083
                      |      55.94       8.04       0.00      29.74       0.00       0.00       6.29       0.00 |     100.00
                      |       2.07       2.33       0.00       2.25       0.00       0.00       3.18       0.00 |       1.95
                      |
(VOL) Undecided/Don't |  29.558151  6.7386098  2.4747578  27.814172  2.2672181  7.2399743    2.52474          0 |  78.617623
                      |      37.60       8.57       3.15      35.38       2.88       9.21       3.21       0.00 |     100.00
                      |       5.31       9.48       9.40      10.22      12.01      11.85       6.21       0.00 |       7.47
----------------------+-----------------------------------------------------------------------------------------+-----------
Total                 |  556.67476   71.08488 26.3172676  272.02643 18.8805338    61.0719  40.639858   6.304371 |      1,053
                      |      52.87       6.75       2.50      25.83       1.79       5.80       3.86       0.60 |     100.00
                      |     100.00     100.00     100.00     100.00     100.00     100.00     100.00     100.00 |     100.00
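The five crosstabs in this case study (gender, age, education, income, employment) all use the same command with a different column variable. As a sketch, they can be produced in one pass with a loop. The loop is an illustrative addition; the variable names are those used in the commands above.

```stata
* Crosstab electoral preference against each demographic in turn,
* with analytic weights and both row and column percentages
foreach v of varlist qa f1 f4 f13 f8 {
    tab q5 `v' [aw=weight], col row
}
```

Running this with a log open stores all five tables in one file for later extraction.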
Case study: Testing for associations (preparing the data)
Before running any test we need to prepare the data by setting to missing any non-valid response (like “don’t know/no answer/not sure”) unless it is relevant to the question. It is important to ‘clean’ the variables so that the tests are as accurate as possible. For demographics we will remove non-response items. Below is a series of commands, one set per variable, to prepare some variables for you to run on your own.
Description (commands listed in the order Age, Education, Income, Employment, Gender)

creating a new variable
    gen age=f1
    gen educ=f4
    gen income=f13
    gen employ=f8
    gen gender=qa
exploring the new variable
    tab age
    tab educ
    tab income
    tab employ
    tab gender
checking for labels from original variable
    labelbook f1
    labelbook f4
    labelbook f13
    labelbook f8
    labelbook qa
assigning labels to new variable
    label value age f1
    label value educ f4
    label value income f13
    label value employ f8
    label value gender qa
exploring the new variable
    tab age
    tab educ
    tab income
    tab employ
    tab gender
setting no response to missing
    replace age=. if age>8
    replace educ=. if educ==8
    replace income=. if income==8
    replace employ=. if employ==8
adding variable labels
    label variable age "Age"
    label variable educ "Educational attainment"
    label variable income "Family income"
    label variable employ "Employment status"
exploring the new variable
    tab age
    tab educ
    tab income
    tab employ
Case study: Testing for associations (preparing the data –cont.)
Here is an easy way to do it by using the command clonevar in Stata.
Description (commands listed in the order Age, Education, Income, Employment, Gender)

creating a new variable
    clonevar age=f1
    clonevar educ=f4
    clonevar income=f13
    clonevar employ=f8
    clonevar gender=qa
exploring the new variable
    tab age
    tab educ
    tab income
    tab employ
    tab gender
setting no response to missing
    replace age=. if age>8
    replace educ=. if educ==8
    replace income=. if income==8
    replace employ=. if employ==8
exploring the new variable
    tab age
    tab educ
    tab income
    tab employ
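As an alternative to the three replace lines that share the no-response code 8, Stata's mvdecode command can set that code to missing in several variables at once. This is a sketch of a swapped-in technique, not the method used on the slide. It assumes the variables age, educ, income and employ were already created with clonevar as above.

```stata
* Set the no-response code 8 to missing in all three variables at once
mvdecode educ income employ, mv(8)

* Age uses a different cutoff in the table above, so keep the explicit rule
replace age = . if age > 8

* Check the result; the missing option lists missing values as a category
tab educ, missing
tab age, missing
```

Tabulating with the missing option is a quick way to confirm how many cases were set to missing before running any tests.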
Case study: testing for associations
To find out whether there is an association between demographics and electoral preferences we can use a chi-square test, but first we need to ‘clean’ the electoral variable (q5). Let's create a new variable ‘elec’ from ‘q5’. We will use recode for this, type:
Original variable
Value 1=1 with label in quotes
Value 2=2 with label in quotes
Values 3, 4 & 8 = 3 with label in quotes
New variable, name in parenthesis
Labels are saved as ‘elec’
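A recode command consistent with the annotations above might look like the following sketch. The label text is taken from the tabulation of the new variable in this case study. The exact wording and layout of the original command are an assumption.

```stata
* Collapse q5 into three groups and store the result in a new variable, elec.
* Values 1 and 2 keep their codes; values 3, 4 and 8 are merged into code 3.
* gen() names the new variable; label() names the value label that is saved.
recode q5 (1=1 "Obama/Biden") (2=2 "McCain/Palin") ///
          (3 4 8=3 "Undecided/DK/NA/Other"), gen(elec) label(elec)

* Inspect the new variable
tab elec
```

Because recode writes to a new variable through gen(), the original q5 is left unchanged and the recoding can be checked against it with tab q5 elec.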
Here is the new variable
We use the ‘nofreq’ option after the comma since we are not interested in the crosstabulations but rather in the tests. We can see that gender, education, income and employment status appear to be associated with electoral preferences. Age does not seem to have any association.
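The tests described above can be requested with the chi2 option of tab. The sketch below runs an unweighted chi-square test of independence between elec and each cleaned demographic variable. The loop and the choice to run the tests without weights are assumptions, since the exact commands are not reproduced here.

```stata
* Chi-square test of independence between electoral preference and each
* demographic; nofreq suppresses the table so only the test is printed
foreach v of varlist gender age educ income employ {
    tab elec `v', chi2 nofreq
}
```

A small p-value (Pr) in the output suggests that the two variables are not independent.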
Total 1,053 100.00
Undecided/DK/NA/Other 108 10.26 100.00
McCain/Palin 464 44.06 89.74
Obama/Biden 481