PhD Course Basic Statistics for health researchers

This is the course homepage of the English version of the PhD course Basic statistics for health researchers. The course description, including the learning objectives, is available from the website of the PhD school, here:

Course Description

The webpage will be updated regularly during the course. Check the footnote at the bottom of the page for the date and time of the last update.

Place & Schedule

The course takes place at the Center for Sundhed og Samfund - CSS - Øster Farimagsgade 5, 1014 Copenhagen. Classes run from 8:00 to 15:00. The room numbers and schedule are given in the following table:

Date | Day | Room (8:00-15:00) | Topics | Teachers
17 April 2023 | Monday | CSS-7.0.40 | Overview, data, descriptive statistics, concept of statistical inference, confidence intervals | Paul Blanche, Carolin Herrmann
19 April 2023 | Wednesday | CSS-7.0.40 | Hypothesis testing, tests for continuous outcomes, multiple testing | Paul Blanche, Carolin Herrmann
24 April 2023 | Monday | CSS-7.0.40 | Univariate linear regression, correlation, regression to the mean | Paul Blanche, Zehao Su
26 April 2023 | Wednesday | CSS-7.0.40 | Analysis of Variance (One-way and Two-way ANOVA) | Paul Blanche, Zehao Su
3 May 2023 | Wednesday | CSS-7.0.40 | 2x2 tables, odds ratio, two sample tests for binary responses | Paul Blanche, Zehao Su
8 May 2023 | Monday | CSS-7.0.40 | Logistic regression | Paul Blanche, Zehao Su
10 May 2023 | Wednesday | CSS-7.0.40 | Multiple linear regression, confounding, interaction | Paul Blanche, Alessandra Meddis
15 May 2023 | Monday | CSS-7.0.40 | Repeated measurements | Brice Ozenne, Alessandra Meddis
17 May 2023 | Wednesday | CSS-7.0.40 | Survival analysis | Paul Blanche, Alessandra Meddis
24 May 2023 | Wednesday | CSS-7.0.40, CSS-7.0.06 | Presentation and discussion of homework assignments | Paul Blanche, Brice Ozenne

Lectures and R-demo

Lecture notes and accompanying R-demos should be available via the links in the table below no later than two days before the course day. Some R-demos use external data not already available in R packages. Those are also provided via the table below.

Course Day | Handout (1x1) | Handout (2x2) | R-demo | External Data
1 | Lecture-1 | Lecture-1-2x2 | Rdemo-1 | none
2 | Lecture-2 | Lecture-2-2x2 | Rdemo-2 | none
3 | Lecture-3 | Lecture-3-2x2 | Rdemo-3 | th, ckd
4 | Lecture-4 | Lecture-4-2x2 | Rdemo-4 | none
5 | Lecture-5 | Lecture-5-2x2 | Rdemo-5 | dalteparin, SCD
6 | Lecture-6 | Lecture-6-2x2 | Rdemo-6 | Framingham
7 | Lecture-7 | Lecture-7-2x2 | Rdemo-7 | VitaminD
8 | Lecture-8 | Lecture-8-2x2 | Rdemo-8 | none
9 | Lecture-9 | Lecture-9-2x2 | Rdemo-9 | subfertile, carcinoma, HFactionHosp

The course is not based on a single textbook, but the following book is an excellent reference: Regression with Linear Predictors, by Per Kragh Andersen and Lene Theil Skovgaard (Springer, 2010). It covers most of the topics of the course (often in more detail) and was written for an audience similar to that of the course.

Additional reading and video material

To complement the lectures and practicals, we recommend short videos and papers for each course day, to watch or read before or after that day as indicated in the table below. This material is independent of the lectures and practicals and was developed by teachers other than those of the course.

We recommend this material because we find it both entertaining and educational.

Watching or reading before the course day, where recommended, should help you follow the lecture. Watching or reading after the course day should help you revisit important concepts and learn more about a few selected topics.

Course Day | What to read/watch? | When to read/watch?
1 | StatQuest: The Normal Distribution, Clearly Explained!!! (5 mins) | Preferably Before
1 | Statistics Notes: Standard deviations and standard errors (10 mins) | Preferably Before
2 | Statistics Notes: Absence of evidence is not evidence of absence (10 mins) | Preferably Before
2 | StatQuest: p-hacking and power calculations (19 mins) | Before or After
3 | Statistics Notes: Logarithms (10 mins) | Preferably Before
3 | StatQuest: R-squared, Clearly Explained!!! (11 mins) | Preferably After
4 | Re-read Solution2, especially Exercise A, Question 3, to prepare for Exercise-4 | Preferably Before
5 | Statistics Notes: What is a percentage difference? (10 mins) | Preferably Before
5 | Statistics Notes: The odds ratio (10 mins) | Preferably Before
6 | Statistics Notes: Interaction 1: heterogeneity of effects (10 mins) | Preferably Before
6 | Statistics Notes: Interaction 2: compare effect sizes not P values (12 mins) | Preferably Before
7 | Statistics Notes: Interaction revisited: the difference between two estimates (12 mins) | Preferably Before
7 | Causal analyses of existing databases: no power calculations required (20 mins) | Before or After
8 | Statistics Notes: Analysing controlled trials with baseline and follow up measurements (13 mins) | Preferably Before
8 | Statistics Notes: Comparisons within randomised groups can be very misleading (10 mins) | Preferably Before
9 | Statistics Notes: Time to event (survival) data (10 mins) | Preferably Before
9 | Statistics Notes: Survival probabilities (the Kaplan-Meier method) (15 mins) | Preferably Before


Participants gain practical experience with data analysis and statistical methods, with an emphasis on understanding the output of the statistical software and interpreting the results. Solutions for the exercises are provided only for the statistical software R.

Course Day | Exercise | External Data | R code solution | Full Solution (R code & answers)
1 | Exercise-1 | SCD | Solution1.R | Solution1.pdf
2 | Exercise-2 | none | Solution2.R | Solution2.pdf
3 | Exercise-3 | SCD | Solution3.R | Solution3.pdf
4 | Exercise-4 | SCD | Solution4.R | Solution4.pdf
5 | Exercise-5 | smoking, sedative | Solution5.R | Solution5.pdf
6 | Exercise-6 | MI | Solution6.R | Solution6.pdf
7 | Exercise-7 | BW, Brain | Solution7.R | Solution7.pdf
8 | Exercise-8 | none | Solution8.R | Solution8.pdf
9 | Exercise-9 | colon2 | Solution9.R | Solution9.pdf

How to prepare for the course


You will work on your own laptop. You will need Internet access during the course, and it is your own responsibility to be able to connect.

Statistical software

The focus of this course is not on how to use statistical software, but statistical software is needed for all data analyses and examples that illustrate the statistical methods. The free statistical software R will be used throughout the course, via RStudio. Students are expected to learn the syntax and semantics of R by themselves, before and during the course. Note that this will often mean many extra hours of preparation and self-training in addition to the actual teaching hours.

All students are expected to start working with R syntax and semantics several weeks before the course starts. A minimum level corresponding to that obtained after completing our online introduction to R is considered a prerequisite.

This introduction guides you through installing R, loading data, manipulating data, and making simple calculations and plots. It takes an estimated 10-15 hours to complete, depending on your R and technical skills.
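
To give a feel for the level involved, here is a minimal sketch of the kind of steps the introduction covers. It is not taken from the introduction itself: the choice of the built-in `ToothGrowth` data, the temporary CSV file, and the variable names are assumptions made only for illustration.

```r
# Load a data set that ships with R (no download needed).
data(ToothGrowth)

# Write it to a temporary CSV file and read it back,
# to illustrate how external data are typically loaded.
csv_file <- tempfile(fileext = ".csv")
write.csv(ToothGrowth, csv_file, row.names = FALSE)
tg <- read.csv(csv_file)

# Data manipulation: add a derived column and take a subset.
tg$log_len <- log(tg$len)
high_dose <- subset(tg, dose == 2)

# Simple calculations: overall summary and mean length per supplement group.
summary(tg$len)
group_means <- tapply(tg$len, tg$supp, mean)
print(group_means)

# Simple plot: distribution of length by supplement group.
boxplot(len ~ supp, data = tg, xlab = "Supplement", ylab = "Tooth length")
```

If you can read this code and predict roughly what each line does, you are probably close to the expected starting level.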

The participants are expected to use their own laptops during the course, to have installed all relevant software and to have downloaded all data for use during the course.

R packages

To run the code of the R-demos and to solve the exercises with R, specific packages need to be installed. Below is the list of packages needed for each course day, in addition to those for the previous days. Please install the packages before the start of each course day. The list will be updated no later than two days before the course day.

Some of the packages are needed for specific functions; others are used to load example data.

Course Day | R packages
1 | DoseFinding, MESS, timereg, HistData
2 | nlme, coin
3 | none
4 | multcomp, sandwich, HSAUR2
5 | Publish
6 | none
7 | none
8 | reshape2, LMMstar, ggplot2, mice, nlmeU
9 | survival, prodlim, survRM2
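
The per-day lists above can be turned into a short check-and-install script. The following is a minimal sketch, not an official course script: the names `course_packages`, `missing_packages()` and `packages_up_to()` are illustrative, and it assumes that all packages can be installed from CRAN under the names shown in the table.

```r
# Packages per course day, copied from the table above
# (days with "none" are omitted).
course_packages <- list(
  "1" = c("DoseFinding", "MESS", "timereg", "HistData"),
  "2" = c("nlme", "coin"),
  "4" = c("multcomp", "sandwich", "HSAUR2"),
  "5" = c("Publish"),
  "8" = c("reshape2", "LMMstar", "ggplot2", "mice", "nlmeU"),
  "9" = c("survival", "prodlim", "survRM2")
)

# Return the packages in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# All packages needed up to and including a given course day.
packages_up_to <- function(day) {
  days <- as.integer(names(course_packages))
  unique(unlist(course_packages[days <= day], use.names = FALSE))
}

# Example: what is still missing for course day 4?
to_install <- missing_packages(packages_up_to(4))
print(to_install)

# Uncomment to install the missing packages (requires Internet access):
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that running the script only reports what is missing; uncomment it once you are happy with the list.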

How to pass the course

To pass the course you have to

  • attend 80% of all teaching units (we count the signatures).
  • turn in your homework assignment in due time.
  • present the results of your homework on the last course day.

Homework assignment

A homework assignment is handed out after lecture 5. Participants work with their own data or data related to their own research provided by their PhD supervisor. The homework assignment is turned in after lecture 9.

The homework assignment is now available here.

Created: 2023-05-17 Wed 13:53