MDDE 602 Module 3

=Module #3=


Textbook Readings
 * Week 8 Collecting and analyzing quantitative data > Neuman, Chapter 12
 * Week 9 > None
 * Week 10 Collecting and analyzing qualitative data > Neuman, Chapter 15
 * Week 11 > None

Articles to Read
 * Week 8 Collecting and analyzing quantitative data > None
 * Week 9 > None
 * Week 10 Collecting and analyzing qualitative data > None
 * Week 11 > None

Recommended Readings
Rowntree, D. (1981). //Statistics without tears: A primer for non-mathematicians.// New York: Penguin. (Module 3, unit 1.)

Faculty of Education, University of Alberta. Understanding statistics. Instructional CD. (Module 3, unit 1.)


 * **Week 8 Collecting and analyzing quantitative data**

**Chapter 12**

1. What is a codebook and how is it used in research? (p. 344)
 * A document that describes the procedure for coding variables and their location in a format that computers can use.
 * When you code data, it is very important to create a well-organized, detailed codebook and make multiple copies of it. If you do not write down the details of the coding procedure, or if you misplace the codebook, you have lost the key to the data and may have to recode the data again.
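
To make the idea concrete, here is a minimal Python sketch of what a machine-readable codebook might look like. The variable names, column positions, and value codes are hypothetical examples, not taken from Neuman.

```python
# A minimal sketch of a machine-readable codebook: a mapping from each
# variable name to its column location, value codes, and missing-value code.
# All variable names and codes here are hypothetical examples.
codebook = {
    "sex": {
        "column": 1,
        "codes": {1: "Male", 2: "Female"},
        "missing": 9,
    },
    "education": {
        "column": 2,
        "codes": {0: "Less than H.S.", 1: "Some H.S.", 2: "H.S. degree",
                  3: "Some college", 4: "College degree", 5: "Post college"},
        "missing": 9,
    },
}

# Decoding a raw case using the codebook:
raw_case = [2, 4]  # column 1 = sex, column 2 = education
for variable, spec in codebook.items():
    value = raw_case[spec["column"] - 1]
    label = spec["codes"].get(value, "MISSING/INVALID")
    print(f"{variable}: {value} -> {label}")
```

Keeping all the decoding rules in one structure like this is exactly what protects against losing "the key to the data."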

2. How do researchers clean data and check their coding? (p. 346)
 * After very careful coding, the researcher checks the accuracy of coding, or "cleans" the data. He or she may code a 10 to 15 percent random sample of the data a second time. If no coding errors appear, the researcher proceeds; if he or she finds errors, the researcher rechecks all coding.
 * Researchers verify coding after the data are in a computer in two ways.
 * **Possible code cleaning** (or wild code checking) involves checking the categories of all variables for impossible codes. For example, respondent sex is coded 1 = Male, 2 = Female. Finding a 4 for a case in the field for the sex variable indicates a coding error.
 * **Contingency cleaning** (or consistency checking) involves cross-classifying two variables and looking for logically impossible combinations. For example, education is cross-classified by occupation. If a respondent is recorded as never having passed the eighth grade and also is recorded as being a legitimate medical doctor, the researcher checks for a coding error.
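
Both checks can be expressed directly in code. Below is a small, hypothetical sketch using pandas; the variables, codes, and the "doctor without an eighth-grade education" rule are illustrative assumptions, not Neuman's data.

```python
import pandas as pd

# Hypothetical coded survey data; "sex" uses 1 = Male, 2 = Female.
df = pd.DataFrame({
    "sex":        [1, 2, 4, 1],   # the 4 is a wild (impossible) code
    "education":  [2, 8, 3, 0],   # 8 = finished medical school (hypothetical)
    "occupation": ["clerk", "doctor", "teacher", "doctor"],
})

# Possible code cleaning (wild code checking): flag values outside the
# legitimate categories for each variable.
wild = df[~df["sex"].isin([1, 2])]
print("Wild codes for sex:\n", wild)

# Contingency cleaning (consistency checking): cross-classify two variables
# and flag logically impossible combinations, e.g. a medical doctor whose
# education code is below "finished medical school" (by assumption, 8).
impossible = df[(df["occupation"] == "doctor") & (df["education"] < 8)]
print("Doctor with implausible education:\n", impossible)
```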

3. Describe how researchers use optical scan sheets. (p. 346/7)
 * Respondents or coders fill in bubbles on machine-readable sheets, like the attendance sheets used in high school or multiple-choice exam answer sheets, and an optical scanner transfers the information directly into a computer.

4. In what ways can a researcher display frequency distribution information? (p. 347/8)
 * **Frequency distribution** is a table that shows the distribution of cases into the categories of one variable, that is, the number or percent of cases in each category.
 * There are many ways to display frequency distribution information. Some examples are a raw count frequency distribution, a percentage frequency distribution, a bar chart, a grouped data frequency distribution, or a frequency polygon.
 * In **nominal** measurement the numerical values just "name" the attribute uniquely. No ordering of the cases is implied. For example, jersey numbers in basketball are measures at the nominal level. A player with number 30 is not more of anything than a player with number 15, and is certainly not twice whatever number 15 is.
 * In **ordinal** measurement the attributes can be rank-ordered. Here, distances between attributes do not have any meaning. For example, on a survey you might code Educational Attainment as 0=less than H.S.; 1=some H.S.; 2=H.S. degree; 3=some college; 4=college degree; 5=post college. In this measure, higher numbers mean //more// education. But is the distance from 0 to 1 the same as from 3 to 4? Of course not. The interval between values is not interpretable in an ordinal measure.
 * In **interval** measurement the distance between attributes //does// have meaning. For example, when we measure temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval between values is interpretable. Because of this, it makes sense to compute an average of an interval variable, where it doesn't make sense to do so for ordinal scales. But note that in interval measurement ratios don't make any sense - 80 degrees is not twice as hot as 40 degrees (although the attribute value is twice as large).
 * Finally, in **ratio** measurement there is always an absolute zero that is meaningful. This means that you can construct a meaningful fraction (or ratio) with a ratio variable. Weight is a ratio variable. In applied social research most "count" variables are ratio, for example, the number of clients in past six months. Why? Because you can have zero clients and because it is meaningful to say that "...we had twice as many clients in the past six months as we did in the previous six months."
 * It's important to recognize that there is a hierarchy implied in the level of measurement idea. At lower levels of measurement, assumptions tend to be less restrictive and data analyses tend to be less sensitive. At each level up the hierarchy, the current level includes all of the qualities of the one below it and adds something new. In general, it is desirable to have a higher level of measurement (e.g., interval or ratio) rather than a lower one (nominal or ordinal).
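
As a brief illustration of the first two display formats, this Python sketch builds a raw count and a percentage frequency distribution for a hypothetical ordinal variable (educational attainment, coded 0-5 as in the example above).

```python
import pandas as pd

# Hypothetical responses for an ordinal variable (educational attainment).
responses = pd.Series([2, 3, 3, 1, 4, 3, 2, 5, 0, 3])

# Raw count frequency distribution: the number of cases in each category.
counts = responses.value_counts().sort_index()

# Percentage frequency distribution: the percent of cases in each category.
percents = responses.value_counts(normalize=True).sort_index() * 100

print(pd.DataFrame({"count": counts, "percent": percents.round(1)}))
```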

5. Describe the differences between mean, median, and mode. (p. 349/50)
 * Mean: the arithmetic average. It can only be used with interval- or ratio-level data, and it is strongly affected by changes in extreme values (very large or very small).
 * Median: the middle point. It is also the 50th percentile, or the point at which half the cases are above it and half below it. It can be used with ordinal-, interval-, or ratio-level data (but not nominal-level). Unlike the mean, the median is not easily affected by extreme values.
 * Mode: the most common or frequently occurring number. It can be used with nominal, ordinal, interval, or ratio data. There will always be at least one case with a score that is equal to the mode.
 * In general, the median is best for skewed distributions, although the mean is used in most other statistics.


 * If the frequency distribution forms a normal distribution or bell-shaped curve, the three measures of central tendency equal each other.
 * If most cases have lower scores with a few extreme high scores, the mean will be the highest, the median in the middle, and the mode the lowest.
 * If most cases have higher scores with a few extreme low scores, the mean will be the lowest, the median in the middle, and the mode the highest.
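
A quick Python sketch of the skew rule above, using hypothetical scores where most cases are low with a few extreme high values:

```python
from statistics import mean, median, mode

# Hypothetical scores: most cases are low, with a few extreme high values
# (a positively skewed distribution).
scores = [2, 2, 2, 3, 3, 4, 5, 20, 40]

print("mode:  ", mode(scores))            # 2   -> lowest
print("median:", median(scores))          # 3   -> in the middle
print("mean:  ", round(mean(scores), 1))  # 9.0 -> pulled up by the extremes
```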

6. What three features of a relationship can be seen from a scattergram? (p. 355/6)
 * A **scattergram** is a graph on which a researcher plots each case or observation, where each axis represents the value of one variable. It is used for variables measured at the interval or ratio level, rarely for ordinal variables, and never if either variable is nominal.
 * Form: Relationships can take three forms: independence, linear, and curvilinear.
 * Independence, or no relationship, is the easiest to see. It looks like a random scatter with no pattern, or a straight line that is exactly parallel to the horizontal or vertical axis.
 * A linear relationship means that a straight line can be visualized in the middle of a maze of cases running from one corner to another.
 * A curvilinear relationship means that the center of a maze of cases would form a U curve, right side up or upside down, or an S curve.
 * Direction: Linear relationships can have a positive or negative direction.
 * Precision: Precision is the amount of spread in the points on the graph.
 * A high level of precision occurs when the points hug the line that summarizes the relationship.
 * A low level occurs when the points are widely spread around the line.
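
For illustration, here is a short matplotlib sketch that draws a scattergram with a positive linear relationship. The data are randomly generated; increasing the noise spreads the points more widely around the line, i.e., lowers the precision.

```python
import random
import matplotlib.pyplot as plt

# Hypothetical interval-level data with a positive linear relationship.
# The noise term (sd = 8 below) controls the precision seen in the plot.
random.seed(1)
x = list(range(50))
y = [2 * xi + random.gauss(0, 8) for xi in x]

plt.scatter(x, y)
plt.xlabel("Independent variable (x)")
plt.ylabel("Dependent variable (y)")
plt.title("Scattergram: positive linear relationship")
plt.show()
```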

7. What is a covariation and how is it used? (p. 353)
 * Covariation is the idea that two variables vary together, such that knowing the values on one variable provides information about values found on another.
 * For example, people with higher values on the income variable are likely to have higher values on the life expectancy variable. Likewise, those with lower incomes have lower life expectancy. This is usually stated in a shorthand way by saying that income and life expectancy are related to each other, or covary. We could also say that knowing one's income tells us one's probable life expectancy, or that life expectancy depends on income.
 * Most researchers state hypotheses in terms of a causal relationship or expected covariation; if they use the null hypothesis, the hypothesis is that there is independence (no covariation). The null hypothesis is used in formal hypothesis testing and is frequently found in inferential statistics.
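
Covariation can also be quantified. The sketch below computes the covariance and the Pearson correlation for a hypothetical income and life expectancy series; the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical paired measurements: income (in thousands) and life
# expectancy (in years). The values are invented for illustration.
income = np.array([20, 35, 50, 65, 80, 95])
life_expectancy = np.array([70, 72, 75, 77, 79, 81])

# Covariance: do the two variables vary together?
print("covariance: ", np.cov(income, life_expectancy)[0, 1])

# The Pearson correlation rescales covariance to the -1..+1 range.
print("correlation:", np.corrcoef(income, life_expectancy)[0, 1].round(3))
```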

8. When can a researcher generalize from a scattergram to a percentaged table to find a relationship among variables? (p. 361)
 * The circle-the-largest-cell rule works, with one important caveat: the categories in the percentaged table must be ordinal or interval and in the same order as in a scattergram. In scattergrams the lowest variable categories begin at the bottom left. If the categories in a table are not ordered the same way, the rule does not work.
 * If there is no relationship in a table, the cell percentages look approximately equal across rows or columns.
 * A linear relationship looks like larger percentages in the diagonal cells.
 * If there is a curvilinear relationship, the largest percentages form a pattern across cells. For example, the largest cells might be the upper right, the bottom middle, and the upper left.
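
Here is a short pandas sketch of a column-percentaged table for hypothetical ordinal data. With the categories in the same low-to-high order as a scattergram, the linear pattern shows up as the largest percentages lying on the diagonal.

```python
import pandas as pd

# Hypothetical ordinal data: education level and attitude score.
df = pd.DataFrame({
    "education": ["low", "low", "medium", "medium", "high", "high",
                  "low", "medium", "high", "high"],
    "attitude":  ["low", "low", "medium", "low", "high", "high",
                  "medium", "medium", "medium", "high"],
})

order = ["low", "medium", "high"]

# Column-percentaged table: percentages within each education category.
table = pd.crosstab(df["attitude"], df["education"], normalize="columns") * 100
table = table.reindex(index=order, columns=order)
print(table.round(1))
# A linear relationship appears as the largest percentages in the
# diagonal cells.
```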

9. Discuss the concept of control as it is used in trivariate analysis. (p. 265-9)
 * In order to meet all the conditions needed for causality, researchers want to "control for," or see whether, an alternative explanation explains away a causal relationship. If an alternative explanation explains a relationship, then the bivariate relationship is spurious (false). Alternative explanations are operationalized as third variables, which are called control variables because they control for alternative explanations.
 * One way to take such third variables into consideration and see whether they influence the bivariate relationship is to statistically introduce control variables using trivariate or three-variable tables.
 * **Elaboration paradigm** A system for describing patterns evident among tables when the bivariate contingency table is compared with partials after the control variable has been added.
 * Different patterns can emerge from the elaboration paradigm:
 * **Replication pattern** A pattern in the elaboration paradigm in which the partials show the //same relationship// as in a bivariate contingency table of the independent and dependent variable alone.
 * **Specification pattern** A pattern in the elaboration paradigm in which the bivariate contingency table shows a relationship. //One of the partial tables shows the relationship, but other tables do not.//
 * **Interpretation pattern** A pattern in the elaboration paradigm in which the bivariate contingency table shows a relationship, but the partials show //no relationship// and the control variable is //intervening// in the causal explanation.
 * **Explanation pattern** A pattern in the elaboration paradigm in which the bivariate contingency table shows a relationship, but the partials show //no relationship// and the control variable //occurs prior// to the independent variable.
 * **Suppressor variable pattern** A pattern in the elaboration paradigm in which //no relationship// appears in a bivariate contingency table, but the //partials show a relationship// between the variables. The control variable is a suppressor variable because it suppresses the true relationship, which appears in the partials.
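
The mechanics of comparing a bivariate table with its partials can be sketched in a few lines of pandas. The data below are invented so that X and Y look related overall but the relationship vanishes within each category of the control variable Z, i.e., a spurious pattern.

```python
import pandas as pd

# Hypothetical data: X and Y appear related in the bivariate table, but
# the relationship disappears within each category of the control
# variable Z, suggesting the bivariate relationship is spurious.
df = pd.DataFrame({
    "Z": ["young"] * 8 + ["old"] * 8,
    "X": ["a"] * 6 + ["b"] * 2 + ["a"] * 2 + ["b"] * 6,
    "Y": ["yes"] * 8 + ["no"] * 8,
})

# Bivariate contingency table (X by Y), ignoring Z: X and Y look related.
print(pd.crosstab(df["X"], df["Y"]))

# Partial tables: the same table within each category of the control
# variable. Here Y does not vary within either partial, so the bivariate
# relationship disappears once Z is controlled for.
for level, part in df.groupby("Z"):
    print(f"\nPartial table for Z = {level}:")
    print(pd.crosstab(part["X"], part["Y"]))
```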

10. What does it mean to say "statistically significant at the 0.001 level," and what type of error is more likely: Type I or Type II? (p. 307-)
 * Statistical significance means that results are not likely to be due to chance factors.
 * Statistical significance tells only what is likely. It cannot prove anything with absolute certainty. It states that particular outcomes are more or less probable. Statistical significance is not the same as practical, substantive, or theoretical significance. Results can be statistically significant but theoretically meaningless or trivial.
 * If a researcher says that results are significant at the 0.001 level, this means the following:
 * Results like these are due to chance factors only 1 in 1,000 times.
 * There is a 99.9% chance that the sample results are not due to chance factors alone, but reflect the population accurately.
 * The odds of such results based on chance alone are 0.001, or 0.1%.
 * One can be 99.9% confident that the results are due to a real relationship in the population, not chance factors.
 * Such a high standard (0.1%) means the researcher attributes the results to chance unless they are quite rare, so the researcher is more likely to make a mistake by saying the results are due to chance when in fact they are not. He or she might falsely accept the null hypothesis when a relationship in fact exists (a Type II error).
 * **Type I error** occurs when the researcher says that a relationship exists when in fact none exists**.**The logical error of falsely rejecting the null hypothesis.
 * **Type II error** occurs when a researcher says that a relationship does not exist, when in fact it does. The logical error of falsely accepting the null hypothesis.
 * For example, the researcher might use the .0001 level. He or she attributes the results to chance unless they are so rare that they would occur by chance only 1 in 10,000 times. Such a high standard means that the researcher is more likely to err by saying results are due to chance when in fact they are not. He or she may falsely accept the null hypothesis when there is a causal relationship (a Type II error).
 * By contrast, a risk-taking researcher sets a low level of significance, such as .10. His or her results indicate a relationship would occur by chance 1 in 10 times. He or she is likely to err by saying that a causal relationship exists when in fact random factors (e.g., random sampling error) actually cause the results. The researcher is likely to falsely reject the null hypothesis (a Type I error).
 * In sum, the .05 level is a compromise between Type I and Type II errors.
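
A small simulation makes the meaning of a significance level tangible: when the null hypothesis is true, the proportion of "significant" results converges on the chosen level, and every one of those results is a Type I error. This standard-library-only sketch uses a rough two-sample z-test with an approximate critical value.

```python
import random

# Simulation: when the null hypothesis is true, "significant" results
# still occur at a rate equal to the chosen significance level, and
# every one of them is a Type I error.
random.seed(0)
trials = 10_000
false_rejections = 0

for _ in range(trials):
    # Two samples drawn from the SAME population, so any observed
    # difference is due to chance alone.
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    diff = abs(sum(a) / 30 - sum(b) / 30)
    # Approximate two-sided critical value for a z-test at the .05 level
    # with n = 30 per group and sd = 1: 1.96 * sqrt(1/30 + 1/30) ~ 0.506.
    if diff > 0.506:
        false_rejections += 1

print("Observed Type I error rate:", false_rejections / trials)  # ~ 0.05
```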


 * **Week 10 Collecting and analyzing qualitative data**

Neuman - Chapter 15

1. Identify four differences between quantitative and qualitative data analysis. (p. 458/459)
 * 1) Quantitative data analysis is more **standardized**; hypothesis testing and statistical methods are similar across different social research projects and across the natural and social sciences. By contrast, qualitative data analysis is **less standardized**: the wide variety in qualitative research is matched by the many approaches to data analysis.
 * 2) Quantitative researchers do not begin **data analysis** until they have **collected** all of the data and condensed them into numbers. They then manipulate the numbers in order to see patterns or relationships. Qualitative researchers look for patterns or relationships, early in a research project, while they are **still collecting data**. The results of early data analysis guide subsequent data collection. Thus, analysis is less a distinct final stage of research than a dimension of research that stretches across all stages.
 * 3) Another difference is the **relation to social theory**. Quantitative researchers manipulate numbers that represent empirical facts in order to **test an abstract hypothesis** with variable constructs. By contrast, qualitative researchers **create new concepts and theory** by blending together empirical evidence and abstract concepts. Instead of testing a hypothesis, a qualitative analyst may illustrate or color in evidence showing that a theory, generalization, or interpretation is plausible.
 * 4) In quantitative research, data analysis is clothed in statistics, hypotheses, and variables. Quantitative researchers assume that social life can be measured by using numbers, then manipulate the numbers with statistics to reveal features of social life. Qualitative analysis does **not** draw on a large, well-established body of formal knowledge **from mathematics and statistics**. The data are relatively imprecise, diffuse, and context-based, and can have more than one meaning. This is not seen as a disadvantage.

2. How does the process of conceptualization differ in qualitative and quantitative research? (p. 460)
 * Quantitative researchers conceptualize variables and refine concepts as part of the process of measuring variables.
 * Qualitative researchers form new concepts or refine concepts that are grounded in the data. Concept formation is an integral part of data analysis and begins during data collection. Thus, conceptualization is one way that a qualitative researcher organizes and makes sense of data.

3. How does data coding differ in quantitative and qualitative research, and what are the three kinds of coding used by a qualitative researcher? (p. 460)
 * Coding video on YouTube: //I've got some interview data! What next?//
 * When a quantitative researcher codes data, he or she arranges measures of variables into a machine-readable form for statistical analysis--a clerical data management task.
 * A qualitative researcher organizes the raw data into conceptual categories and creates themes or concepts. Coding is guided by the research question and leads to new questions. It frees a researcher from entanglement in the details of the raw data and encourages higher-level thinking about them. It also moves him or her toward theory and generalizations.
 * 1) **Open coding** A first coding of qualitative data in which a researcher examines the data to condense them into preliminary analytic categories or codes. (p. 461)
 * 2) **Axial coding** A second stage of coding of qualitative data in which a researcher organizes the codes, links them, and discovers key analytic categories. During axial coding, ask about causes and consequences, conditions and interactions, strategies and processes, and look for categories or concepts that cluster together. (p. 462)
 * You should ask questions such as:
 * Can I divide existing concepts into subdimensions or subcategories?
 * Can I combine several closely related concepts into one more general one?
 * Can I organize categories into a sequence (i.e., A, then B, then C), or by their physical location (i.e., where they occur), or their relationship to a major topic of interest?
 * 3) **Selective coding** A last stage in coding qualitative data in which a researcher examines previous codes to identify and select data that will support the conceptual coding categories that were developed. (p. 464) Selective coding is the process of choosing one category to be the core category and relating all other categories to that category. The essential idea is to develop a single storyline around which everything else is draped. There is a belief that such a core concept always exists.
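
Although qualitative coding is an interpretive task, the data-management side of it can be sketched in code. The following hypothetical example tags invented interview excerpts with preliminary open codes, then groups the segments by code--the kind of reorganization that prepares for axial coding.

```python
from collections import defaultdict

# A minimal sketch of the data-management side of open coding: hypothetical
# interview excerpts are tagged with preliminary analytic codes, then
# grouped by code so related segments can be compared.
segments = [
    ("I never felt listened to in class.", ["teacher-student relations"]),
    ("The online forum let me ask questions anytime.", ["access", "technology"]),
    ("My tutor replied within a day.", ["access", "teacher-student relations"]),
]

by_code = defaultdict(list)
for text, codes in segments:
    for code in codes:
        by_code[code].append(text)

for code, texts in by_code.items():
    print(f"{code}: {len(texts)} segment(s)")
    for t in texts:
        print("  -", t)
```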

4. What is the purpose of analytic memo writing in qualitative data analysis? (p. 464/5)
 * **Analytic memos** are notes that a qualitative researcher takes while developing more abstract ideas, themes, or hypotheses from an examination of details in the data.
 * Each coded theme or concept forms the basis of a separate memo, and the memo contains a discussion of the concept or theme. Rough theoretical ideas form the beginning of analytic memos.
 * The analytic memo forges a link between the concrete data or raw evidence and more abstract, theoretical thinking. It contains your reflections on and thinking about the data and coding. Add to the memo and use it as you pass through the data with each type of coding. The memos form the basis for analyzing data in the research report. In fact, rewritten sections from good quality analytic memos can become sections of the final report.

5. Describe successive approximation. (p. 469)
 * This method involves repeated iterations or cycling through steps, moving toward a final analysis. Over time, or after several iterations, a researcher moves from vague ideas and concrete details in the data toward a comprehensive analysis with generalizations. This is similar to coding, discussed earlier.
 * A researcher begins with research questions and a framework of assumptions and concepts. He or she then probes into the data, asking questions of the evidence to see how well the concepts fit the evidence and reveal features of the data. He or she also creates new concepts by abstracting from the evidence and adjusts concepts to fit the evidence better. The researcher then collects additional evidence to address unresolved issues that appeared in the first stage, and repeats the process. At each stage, the evidence and the theory shape each other. This is called successive approximation because the modified concepts and the model approximate the full evidence and are modified over and over to become successively more accurate.
 * Each pass through the evidence is provisional or incomplete. The concepts are abstract, but they are rooted in the concrete evidence and reflect the context. As the analysis moves toward generalizations that are subject to conditions and contingencies, the researcher refines generalizations and linkages to reflect the evidence better.

6. What are the empty boxes in the illustrative method and how are they used? (p 469)
 * Illustrative method is a method of qualitative data analysis in which a researcher takes the theoretical concepts and treats them as empty boxes to be filled with specific empirical examples and descriptions.
 * With the illustrative method, a researcher applies theory to a concrete historical situation or social setting, or organizes data on the basis of prior theory. Preexisting theory provides the **empty boxes**. The researcher sees whether evidence can be gathered to fill them. The evidence in the boxes confirms or rejects the theory, which he or she treats as a useful device for interpreting the social world. The theory can be in the form of a general model, an analogy, or a sequence of steps.
 * A single case study with the illustrative method does not permit a strong test or verification of an explanation. This is because data from one case can illustrate the empty boxes from several competing explanations. In addition, finding evidence to illustrate an empty box using one case does not build a generalized explanation. A general explanation requires evidence from numerous cases.
 * Case Clarification: The theoretical model illuminates or clarifies a specific case or single situation. The case becomes understandable by applying the theory to it.
 * Parallel Demonstration: A researcher juxtaposes multiple cases to show that the theory operates in multiple cases. The researcher can illustrate theory with specific material from multiple cases.
 * Pattern Matching: A researcher matches the observations from one case with the pattern or concepts derived from theory or other studies. It allows for partial theory falsification: it narrows the range of possible explanations by eliminating some ideas, variables, or patterns from consideration.

7. What is the difference between the method of agreement and the method of difference? Can a researcher use both together? Explain why or why not. (p. 471-473)
 * **The method of agreement**: If two cases of a phenomenon share only one feature, that feature is their cause or their effect. Example: Two persons in different places are asked to wear green spectacles all day. That night they are woken when they show REM sleep and asked what colour they are dreaming in. If both have green dreams, then the green glasses are the cause of the colour of their dreams.
 * **The method of difference**: If a case in which a phenomenon occurs and one in which it does not occur differ by only one feature, that feature is the cause or a necessary part of the cause of the phenomenon, or its effect. This is the method used in most experiments, where an attempt is made to make two groups as identical as possible, then to give one of the groups an experimental treatment, and then to look to see whether the experiment has made the groups different.
 * You can use the method of difference alone or in conjunction with the method of agreement. The method of difference is usually stronger and is a "double application" of the method of agreement.
 * First, locate cases that are similar in many respects but differ in a few crucial ways.
 * Next pinpoint features whereby a set of cases is similar with regard to an outcome and causal features, and another set whereby the cases differ on outcomes and causal features.
 * The method of difference reinforces information from positive cases (e.g., cases that have common causal features and outcomes) with negative cases (e.g., cases lacking the outcome and causal features). Thus, you look for cases that have many of the causal features of positive cases but lack a few key features and have a different outcome.
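
Treating cases as sets of features makes the logic of the two methods easy to see. In this hypothetical sketch (reusing the green-spectacles example above), the method of agreement intersects the features of positive cases, and the method of difference then removes anything also present in a negative case.

```python
# A minimal sketch of analytic comparison, assuming hypothetical cases
# described as sets of features plus an outcome.
cases = {
    "case1": ({"green glasses", "urban", "young"}, "green dreams"),
    "case2": ({"green glasses", "rural", "old"}, "green dreams"),
    "case3": ({"urban", "young"}, "no green dreams"),
}

# Method of agreement: features shared by all cases with the outcome.
positives = [f for f, o in cases.values() if o == "green dreams"]
agreement = set.intersection(*positives)
print("shared by positive cases:", agreement)  # {'green glasses'}

# Method of difference: features present in positive cases but absent
# from an otherwise similar negative case.
negatives = [f for f, o in cases.values() if o != "green dreams"]
difference = agreement - set.union(*negatives)
print("present only when outcome occurs:", difference)  # {'green glasses'}
```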

8. What are the parts of a domain and how are they used in domain analysis? (p. 470)
 * Cultural domains have three parts: a cover term, included terms, and a semantic relationship.
 * The //cover term// is simply the domain's name.
 * //Included terms// are the subtypes or parts of the domain.
 * A //semantic relationship// tells how the included terms fit logically within the domain.
 * For example, consider the domain of a witness in a judicial setting. The cover term is "witness." Two subtypes or included terms are "defense witness" and "expert witness." The semantic relationship is "is a kind of." Thus, an expert witness and a defense witness are kinds of witnesses.
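
The three parts of a domain map naturally onto a small data structure. Here is a minimal sketch using the witness example from the text:

```python
# A cultural domain as a data structure: a cover term, included terms,
# and the semantic relationship that links them (the witness example).
domain = {
    "cover_term": "witness",
    "semantic_relationship": "is a kind of",
    "included_terms": ["defense witness", "expert witness"],
}

for term in domain["included_terms"]:
    print(f'{term} {domain["semantic_relationship"]} {domain["cover_term"]}')
```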

9. What are the major features of a narrative? (p. 474)
 * **Narrative analysis** Both a type of historical writing that tells a story and a type of qualitative data analysis that presents a chronologically linked chain of events in which individual or collective social actors have an important role.
 * Despite the diversity of its uses, a narrative shares six core elements:
 * 1) telling a story or tale (i.e., presenting unfolding events from a point of view)
 * 2) a sense of movement or process (i.e., a before and after condition)
 * 3) interrelations or connections within a complex, detailed context
 * 4) an involved individual or collectivity that engages in action and makes choices
 * 5) coherence, or the whole holds together, and
 * 6) the temporal sequencing of a chain of events

10. Why is it important to look for negative evidence, or things that do not appear in the data, for a full analysis? (p. 478)
 * **Negative case method** A method of qualitative data analysis in which a researcher focuses on a case that does not conform to theoretical expectations and uses details from that case to refine theory. (To study what is //not// explicit in the data or what did //not// happen.)
 * At first studying what is not there may appear counterintuitive, but an alert observer who is aware of all the clues notices what is missing as well as what is there. When what was expected does not occur, it is important information.
 * Negative evidence takes many forms:
 * Events that do not occur.
 * Events of which the population is unaware.
 * Events the population wants to hide.
 * Overlooked commonplace events.
 * Effects of a researcher's preconceived notions.
 * Unconscious nonreporting.
 * Conscious nonreporting.

Summary of Analytic Strategies Used in Qualitative Data Analysis
 * Ideal Type (p. 467)
 * Successive Approximation (p.469)
 * Illustrative Method (p. 469)
 * Domain Analysis (p. 470)
 * Analytic Comparison (p. 471)
 * Narrative Analysis (p. 474)
 * Negative Case Method (p. 478)