Chapter 1: Setting the Statistical Scene

Introduction
1. Statistics is a set of mathematically based tools and techniques that transform raw (unprocessed) data into a few summary measures that represent useful and usable information to support effective decision making
2. The statistical analysis in management decision making is performed via a Management decision support system which is illustrated in
The Language of Statistics
1. A number of important terms, concepts and symbols are used extensively in statistics
  1. Random variable:
    1. Any attribute of interest on which data is collected and analysed
  2. Data:
    1. The actual values (numbers) or outcomes recorded on a random variable
  3. Information:
    1. The meaningful result of processing data
  4. Sample:
    1. A fraction or a subset of a population that is selected in order to carry out a survey. In many cases, researchers are compelled to use sample data instead of the full population
  5. Sampling unit:
    1. The object being measured, counted or observed with respect to the random variable under study
  6. Sample statistic:
    1. A measure that describes a characteristic of a sample
  7. Population:
    1. The collection of all possible data values that exist for the random variable under study
  8. Population parameter:
    1. A measure that describes a characteristic of a population
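The terms above can be tied together in a short sketch. It is an illustration only: the population of 1,000 employee ages is randomly generated and hypothetical, and the sample size of 50 is an arbitrary choice. The random variable is "age of an employee", each employee is a sampling unit, and the same measure (the mean) is a population parameter when computed on all values and a sample statistic when computed on the sample.

```python
import random
import statistics

# Hypothetical population: ages (years) of all 1,000 employees in a company.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(20, 65) for _ in range(1000)]

# Population parameter: a measure computed from every value in the population.
population_mean = statistics.mean(population)

# Sample: a subset of sampling units (employees) drawn from the population.
sample = rng.sample(population, k=50)

# Sample statistic: the same measure computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Population mean age (parameter): {population_mean:.1f}")
print(f"Sample mean age (statistic):     {sample_mean:.1f}")
```

The two means will usually be close but not identical, which is why sample findings need inferential methods before they are generalised to the population.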
Components of Statistics
1. Descriptive statistics: condenses sample data into a few summary descriptive measures
  1. When large quantities of data have been gathered, there is a need to organise, summarise and extract the essential information contained within this data for communication to management
  2. These summary measures allow a user to identify profiles, patterns, relationships and trends within the data
2. Inferential statistics: generalises sample findings to the broader population
  1. Descriptive statistics only describes the behaviour of a random variable in a sample
    1. However, management is mainly concerned about the behaviour and characteristics of random variables in the population from which the sample was drawn
      1. Inferential statistics is that area of statistics that allows managers to understand the population picture of a random variable based on the sample evidence
3. Statistical modelling: builds models of relationships between random variables
  1. Constructs equations between variables that are related to each other
    1. These equations (called models) are then used to estimate or predict values of one of these variables based on values of related variables
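A minimal sketch of the three components applied to one small data set follows. The advertising and sales figures are invented for illustration, the names `ad_spend`, `sales` and `predict_sales` are hypothetical, and the interval uses the normal value 1.96 as a simplifying assumption (a t-value would be more appropriate for a sample this small).

```python
import math
import statistics

# Hypothetical sample: advertising spend (R thousands) and sales (units)
# for eight stores. The numbers are invented for illustration only.
ad_spend = [10, 12, 15, 18, 20, 24, 27, 30]
sales = [25, 30, 33, 41, 44, 52, 58, 63]

# 1. Descriptive statistics: condense the sample into summary measures.
mean_sales = statistics.mean(sales)
sd_sales = statistics.stdev(sales)  # sample standard deviation (n - 1)

# 2. Inferential statistics: generalise the sample finding to the population.
# Rough 95% interval for the population mean, using the normal value 1.96
# (an assumption; a t-value suits a sample this small better).
margin = 1.96 * sd_sales / math.sqrt(len(sales))
interval = (mean_sales - margin, mean_sales + margin)

# 3. Statistical modelling: fit sales = intercept + slope * ad_spend
# by ordinary least squares.
mean_x = statistics.mean(ad_spend)
sxy = sum((x - mean_x) * (y - mean_sales) for x, y in zip(ad_spend, sales))
sxx = sum((x - mean_x) ** 2 for x in ad_spend)
slope = sxy / sxx
intercept = mean_sales - slope * mean_x


def predict_sales(spend):
    """Estimate sales for a given advertising spend using the fitted line."""
    return intercept + slope * spend


print(f"Mean sales: {mean_sales:.2f}, standard deviation: {sd_sales:.2f}")
print(f"Approximate 95% interval for mean sales: {interval[0]:.1f} to {interval[1]:.1f}")
print(f"Estimated sales at a spend of 22: {predict_sales(22):.1f}")
```

The descriptive step only summarises the eight stores; the interval and the fitted equation are what allow statements about stores outside the sample.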
Statistical Applications in Management
1. Finance
  1. At a company level, statistics is used to assess the viability of different investment projects
2. Marketing
  1. analysing consumer behaviour and purchasing patterns, identifying viable market segments, determining media effectiveness
3. Human Resources
  1. training, employee turnover, compensation planning, employee issues
4. Operations/ Logistics
  1. machine utilisation, labour utilisation
Data and Data Quality
1. Data Quality
  1. "Garbage in, garbage out"
  2. Data quality is influenced by four factors: data type, data source, the method of data collection and data preparation
2. Selection of Statistical Method
  1. Depends first on the management problem to be addressed and then on the type of data available
Data Types & Measurement Scales
1. Measurement Scales:
  1. The scale determines the extent to which the data can be manipulated and also which statistical methods are appropriate to use on the data to produce valid statistical results
    1. Nominal data
      1. Nominal data is associated with categorical data. If all the categories of a qualitative random variable are of equal importance, then this categorical data is termed ‘nominal-scaled’
      2. e.g. gender (1 = male; 2 = female)
      3. e.g. city of residence (1 = PTA; 2 = DBN)
    2. Ordinal data
      1. Ordinal data is also associated with categorical data, but has an implied ranking between the different categories of the qualitative random variable
      2. Each consecutive category possesses either more or less of a given characteristic than the previous category
      3. e.g. size of clothing (1 = small; 2 = medium)
      4. e.g. product usage level (1 = light; 2 = moderate; 3 = heavy)
    3. Interval data
      1. Interval data is associated with numeric data and quantitative random variables
      2. It is generated mainly from rating scales, which are used in survey questionnaires to measure respondents’ attitudes
    4. Ratio data
      1. Ratio data consists of all real numbers associated with quantitative random variables
      2. e.g. employee ages (years), customer income (R), distance travelled (km)
      3. Ratio data has all the properties of numbers (order, distance and an absolute origin of zero) that allow such data to be manipulated using all arithmetic operations
2. Data Types
  1. Qualitative random variables
    1. Generate categorical (non-numeric) response data. The data is represented by categories only
      1. gender of a consumer
      2. an employee’s highest qualification
  2. Quantitative random variables
    1. Generate numeric response data. These are real numbers that can be manipulated using arithmetic operations (add, subtract, multiply and divide)
      1. age of an employee
      2. machine downtime
      3. price of a product in different stores
  3. Discrete data: whole-number (integer) data
    1. number of students in a class, number of cars sold
  4. Continuous data: any numeric value that can occur within an interval
    1. the time needed to complete an assembly-line task, the volume of fuel in a car
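Because the measurement scale determines which statistical methods are valid, it can help to see the restriction written out. The sketch below is one common convention, not the only one: the mode suits every scale, the median needs at least ordinal data, and the mean needs at least interval data. The mapping `VALID_MEASURES`, the function `summarise` and the small data sets are hypothetical names and values made up for illustration.

```python
import statistics

# Summary measures that are meaningful at each measurement scale.
# Each scale inherits the measures of the scales before it. Ratio data
# additionally supports ratios (e.g. "twice as old"), not shown here.
VALID_MEASURES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}


def summarise(values, scale):
    """Return only the summary measures that suit the given scale."""
    functions = {
        "mode": statistics.mode,
        "median": statistics.median_low,  # always returns an observed value
        "mean": statistics.mean,
    }
    return {name: functions[name](values) for name in VALID_MEASURES[scale]}


# Nominal: city of residence coded 1 = PTA; 2 = DBN (codes are labels only).
city = [1, 2, 2, 1, 2]
# Ordinal: product usage coded 1 = light; 2 = moderate; 3 = heavy.
usage = [1, 3, 2, 2, 3, 2]
# Ratio: employee ages in years.
ages = [23, 35, 41, 29, 35]

print(summarise(city, "nominal"))
print(summarise(usage, "ordinal"))
print(summarise(ages, "ratio"))
```

Averaging the city codes would produce a number, but it would have no meaning, because the codes 1 and 2 are only labels. Keeping the scale alongside the data guards against that kind of error.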
Data Sources
1. Classification
  1. internal
    1. financial reports; departmental records or reports
  2. external
    1. internet; media
2. Primary data is data that is recorded for the first time at source and with a specific purpose in mind
  1. Advantage: primary-sourced data is generally of high quality
  2. Disadvantage: primary-sourced data can be time consuming and expensive to collect
3. Secondary data is data that already exists in a processed format
  1. Advantages: access time is relatively short (especially if the data is accessible through the internet) and the data is generally less expensive to acquire
  2. Disadvantages: the data may not be problem specific (i.e. it may lack relevance) and it may be out of date
Data Collection Methods
1. Observation
  1. Primary data can be collected by observing a respondent or a process in action
  2. Advantage: the respondent is unaware of being observed and therefore behaves more naturally or spontaneously
  3. Disadvantage: it is a passive form of data collection, so there is no opportunity to probe for reasons or to further investigate underlying causes
2. Surveys
  1. The direct questioning of respondents using questionnaires to structure and record the data collection
  2. Advantages: a higher response rate is generally achieved; it allows probing for reasons; the data is current and generally more accurate
  3. Disadvantages: personal interviews are time consuming and expensive to conduct
3. Telephone interviews
  1. Advantages: cost is relatively low; questions can be clarified by the interviewer
  2. Disadvantage: loss of respondent anonymity
4. Experimentation
  1. the analyst manipulates certain variables under controlled conditions
  2. Advantage: the data collected is of high quality
  3. Disadvantage: costly and time consuming
5. E-survey
  1. Used when the target population is geographically dispersed and it is not practical to conduct personal interviews
  2. Advantages: interviewer bias is eliminated; anonymity of each respondent is assured
  3. Disadvantage: low response rates

	Created by Zach Ryder over 7 years ago