One of the more difficult skills in data analysis is deciding which statistical models and tests to use in a particular situation. 

The choice of statistical model/test is affected by two things:

  1. The kind of question we are asking.

  2. The nature of the data we have:

    • what types of variables do we have: ratio, interval, ordinal or nominal?

    • are the assumptions of a particular model or test satisfied by the data?

The schematic key (below) provides an overview of the statistical models and tests we’ve covered in this book. The choices in the key are determined by a combination of the type of question being asked and the nature of the data under consideration.
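To make the idea of a key concrete, here is a minimal sketch in Python of how such choices could be written as a decision function. It is not the book's actual key: the function name `suggest_test`, the `parametric_ok` flag and the handful of tests it returns are assumptions chosen for illustration, and it only covers questions about two variables, with any nominal variable assumed to have two levels.

```python
def suggest_test(response_scale, predictor_scale, parametric_ok=True):
    """Suggest a commonly used test for a question about two variables.

    Scales are "ratio", "interval", "ordinal" or "nominal".
    parametric_ok says whether the assumptions of the parametric option
    (for example approximate normality) look reasonable for the data.
    This is a simplified, hypothetical key, not a complete one.
    """
    scales = {"ratio", "interval", "ordinal", "nominal"}
    if response_scale not in scales or predictor_scale not in scales:
        raise ValueError("scale must be one of: " + ", ".join(sorted(scales)))

    # Ordinal variables are usually analysed with rank-based methods,
    # so they always take the non-parametric branch in this sketch.
    if "ordinal" in (response_scale, predictor_scale):
        parametric_ok = False

    if response_scale == "nominal" and predictor_scale == "nominal":
        return "chi-square contingency table test"
    if response_scale == "nominal":
        # Binary outcome explained by a numeric or ordered predictor.
        return "logistic regression"
    if predictor_scale == "nominal":
        # Numeric or ordinal response compared between two groups.
        return "two-sample t-test" if parametric_ok else "Mann-Whitney U test"
    # Both variables are numeric or ordinal: a question about association.
    return "Pearson's correlation" if parametric_ok else "Spearman's rank correlation"


print(suggest_test("ratio", "nominal"))
print(suggest_test("ratio", "nominal", parametric_ok=False))
print(suggest_test("ordinal", "interval"))
```

The two inputs mirror the two things that affect the choice: the variable scales describe the data, while the branch taken depends on whether the question is about a difference between groups or an association between variables.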

This data set provides information on the passengers on the fatal maiden voyage of the ocean liner Titanic, summarized according to economic status (class), sex, age and survival. We are going to do a similar end-to-end modeling exercise, using logistic regression to predict whether a passenger survived the sinking of the Titanic.
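Because class, sex and age are categorical here, they have to be turned into numbers before a logistic regression can use them. The sketch below shows one common way to do this, dummy (one-hot) coding with the first level of each variable as the reference. The level names in `LEVELS`, the helper `encode_passenger` and the example record are assumptions for illustration and should be checked against the actual data set.

```python
# Assumed category levels; the first level of each variable is the reference.
LEVELS = {
    "class": ["1st", "2nd", "3rd", "Crew"],
    "sex": ["Male", "Female"],
    "age": ["Child", "Adult"],
}


def encode_passenger(record):
    """Turn a dict of categorical values into a list of 0/1 dummy variables."""
    features = []
    for variable, levels in LEVELS.items():
        value = record[variable]
        if value not in levels:
            raise ValueError(f"unknown level {value!r} for {variable!r}")
        # Skip the reference level so the dummies are not redundant
        # with the model's intercept.
        features.extend(1 if value == level else 0 for level in levels[1:])
    return features


# A hypothetical passenger, made up for this example.
example = {"class": "3rd", "sex": "Female", "age": "Adult"}
print(encode_passenger(example))  # [0, 1, 0, 1, 1]
```

The five columns stand for 2nd class, 3rd class, crew, female and adult, so a first-class male child is coded as all zeros and acts as the baseline against which the other coefficients are interpreted.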

PizzaOL's owner has assigned this task to you to help her build a more objective understanding of her customers. She also wants to plan for next year and allocate the budget optimally. To do so, you will need to perform several data analysis tasks (see sheet Requirements) that capture the effect of different factors and variables on other key indicators.


We have a data set that classifies patients as having heart disease or not, based on a set of features. We will use this data to build a model that predicts whether a patient has the disease, using the logistic regression (classification) algorithm.
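To show what the algorithm does, here is a minimal sketch of logistic regression fitted by batch gradient descent, using only the Python standard library. The function names, the learning rate and the number of epochs are assumptions for illustration, and the tiny data set is synthetic, not the heart disease data. In practice a statistics or machine learning library would normally do the fitting.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, row):
    """Estimated probability that the outcome is 1 for one row of features."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_rows, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            # Gradient of the log-loss for one row is (p - y) times the features.
            error = predict_proba(weights, bias, row) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_rows
    return weights, bias


# Synthetic toy data (NOT the heart disease data): one feature, binary outcome.
X_toy = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(X_toy, y_toy)
# Classify as 1 when the estimated probability is at least 0.5.
predicted = [1 if predict_proba(weights, bias, row) >= 0.5 else 0 for row in X_toy]
accuracy = sum(p == t for p, t in zip(predicted, y_toy)) / len(y_toy)
print("weights:", weights, "bias:", bias, "training accuracy:", accuracy)
```

The model outputs a probability for each patient-like row, and the 0.5 cut-off turns that probability into a class label. Accuracy is computed on the training rows here only to keep the sketch short; a real analysis would hold out separate test data.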

