The term "survival analysis
What we will discuss in this course
Data sets we will be using
GBSG2: time to death of 686 breast cancer patients
data(GBSG2, package = "TH.data")
UnempDur: time to re-employment of 3343 unemployed patients
data(UnempDur, package = "Ecdat")
Pro Tip: to learn about a dataset in R, use the help
function
help(UnempDur, package = "Ecdat")
# Check out the help page for this dataset
help(GBSG2, package = "TH.data")
## starting httpd help server ... done
# Load the data
data(GBSG2, package = "TH.data")
# Look at the summary of the dataset
summary(GBSG2)
## horTh age menostat tsize tgrade
## no :440 Min. :21.00 Pre :290 Min. : 3.00 I : 81
## yes:246 1st Qu.:46.00 Post:396 1st Qu.: 20.00 II :444
## Median :53.00 Median : 25.00 III:161
## Mean :53.05 Mean : 29.33
## 3rd Qu.:61.00 3rd Qu.: 35.00
## Max. :80.00 Max. :120.00
## pnodes progrec estrec time
## Min. : 1.00 Min. : 0.0 Min. : 0.00 Min. : 8.0
## 1st Qu.: 1.00 1st Qu.: 7.0 1st Qu.: 8.00 1st Qu.: 567.8
## Median : 3.00 Median : 32.5 Median : 36.00 Median :1084.0
## Mean : 5.01 Mean : 110.0 Mean : 96.25 Mean :1124.5
## 3rd Qu.: 7.00 3rd Qu.: 131.8 3rd Qu.: 114.00 3rd Qu.:1684.8
## Max. :51.00 Max. :2380.0 Max. :1144.00 Max. :2659.0
## cens
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4359
## 3rd Qu.:1.0000
## Max. :1.0000
If you forget the information on the help page, you can always go back to it later.
Why learn survival methods?
Digging into the GBSG2 dataset 1
In the previous exercise, we learned about the GBSG2
dataset. Let’s dig a bit deeper into it to understand the variables we will use in the following.
The cens
variable contains values that indicate whether or not a person in the study has died. In this exercise, you’ll explore these censored values.
# Count censored and uncensored data
num_cens <- table(GBSG2$cens)
num_cens
##
## 0 1
## 387 299
# Create barplot of censored and uncensored data
barplot(num_cens)
# Use help() to look at cens
help(GBSG2, package = "TH.data")
The convention is that the censoring indicator is 1 if the event of interest happened.
Using the Surv() function for GBSG2
In the video, we learned about the Surv()
function, which generates a Surv
object. Let’s look a little deeper into what a Surv
object actually is. We will use the GBSG2
data again.
The survival
package and GBSG2
data are loaded for you in this exercise.
library(pacman)
p_load(survival)
# Create Surv-Object
sobj <- Surv(GBSG2$time, GBSG2$cens)
# Look at 10 first elements
sobj[1:10]
## [1] 1814 2018 712 1807 772 448 2172+ 2161+ 471 2014+
# Look at summary
summary(sobj)
## time status
## Min. : 8.0 Min. :0.0000
## 1st Qu.: 567.8 1st Qu.:0.0000
## Median :1084.0 Median :0.0000
## Mean :1124.5 Mean :0.4359
## 3rd Qu.:1684.8 3rd Qu.:1.0000
## Max. :2659.0 Max. :1.0000
# Look at structure
str(sobj)
## 'Surv' num [1:686, 1:2] 1814 2018 712 1807 772 448 2172+ 2161+ 471 2014+ ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:2] "time" "status"
## - attr(*, "type")= chr "right"
The Surv
object allows us to specify that time and cens belong together. Notice that the elements of the object have a +
symbol if they are censored observations.
The UnempDur dataset
The UnempDur
dataset contains information on how long people stay unemployed. In this case, the event (finding a job) is something positive. This information is stored in the censor1
variable, which has a value of 1
if an individual was re-employed at a full-time job. The spell
variable indicates the length of time an individual was unemployed in number of two-week intervals.
In this exercise, you’ll explore these censored values and create a Surv
object, just as you did in the previous exercises with the GBSG2
dataset.
# Load the UnempDur data
p_load(Ecdat)
help(UnempDur, package = "Ecdat")
data(UnempDur, package = "Ecdat")
# Count censored and uncensored data
cens_employ_ft <- table(UnempDur$censor1)
cens_employ_ft
##
## 0 1
## 2270 1073
# Create barplot of censored and uncensored data
barplot(cens_employ_ft)
# Create Surv-Object
sobj <- Surv(UnempDur$spell, UnempDur$censor1)
# Look at 10 first elements
sobj[1:10]
## [1] 5 13 21 3 9+ 11+ 1+ 3 7 5+
Notice again that the elements of the Surv
object use a +
symbol to indicate if they are censored observations. If you are unaware of what the censored observations represent, take a look at the documentation using the help()
function.
Measures used in survival analysis
Survival analysis questions
Kaplan-Meier estimate
First Kaplan-Meier estimate
In this exercise, we will use the same data shown in the video. We will take a look at the survfit()
function and the object it generates. This exercise will help you explore the survfit
object.
The survival
package is loaded for you in this exercise.
# Create time and event data
time <- c(5, 6, 2, 4, 4)
event <- c(1, 0, 0, 1, 1)
# Compute Kaplan-Meier estimate
km <- survfit(Surv(time, event) ~ 1)
km
## Call: survfit(formula = Surv(time, event) ~ 1)
##
## n events median 0.95LCL 0.95UCL
## 5.0 3.0 4.5 4.0 NA
# Take a look at the structure
str(km)
## List of 16
## $ n : int 5
## $ time : num [1:4] 2 4 5 6
## $ n.risk : num [1:4] 5 4 2 1
## $ n.event : num [1:4] 0 2 1 0
## $ n.censor : num [1:4] 1 0 0 1
## $ surv : num [1:4] 1 0.5 0.25 0.25
## $ std.err : num [1:4] 0 0.5 0.866 0.866
## $ cumhaz : num [1:4] 0 0.5 1 1
## $ std.chaz : num [1:4] 0 0.354 0.612 0.612
## $ type : chr "right"
## $ logse : logi TRUE
## $ conf.int : num 0.95
## $ conf.type: chr "log"
## $ lower : num [1:4] 1 0.1877 0.0458 0.0458
## $ upper : num [1:4] 1 1 1 1
## $ call : language survfit(formula = Surv(time, event) ~ 1)
## - attr(*, "class")= chr "survfit"
# Create data.frame
data.frame(time = km$time, n.risk = km$n.risk, n.event = km$n.event,
n.censor = km$n.censor, surv = km$surv)
## time n.risk n.event n.censor surv
## 1 2 5 0 1 1.00
## 2 4 4 2 0 0.50
## 3 5 2 1 0 0.25
## 4 6 1 0 1 0.25
With this exercise, you learned how to extract relevant information from a survfit
object.
The survival function is the same as function, but we must think about censoring. Kaplan-Meier curves allow for this.
Understanding and visualizing Kaplan-Meier curves
The survfit function
survfit(object)
object
is a formula
: Kaplan Meier estimationobject
(see upcoming chapters):
coxph
survreg
Exercise ignoring censoring
You throw a party and at 1 a.m. guests suddenly start dancing. You are curious to analyze how long your guests will dance for and start collecting data. The problem is that you get tired and go to bed after a while.
You obtain the following right censored dancing times data given in dancedat
:
name
is the name of your friend. time
is the right-censored dancing time. obs_end
indicates if you observed the end of your friends dance (1) or if you went to sleep before they stopped dancing (0). You start analyzing the data in the morning, but you are tired and, at first, ignore the fact that you have censored observations. Then you remember this course on DataCamp and do it correctly.
The survival
package is loaded for you in this exercise.
p_load(survminer)
# Create dancedat data
dancedat <- data.frame(
name = c("Chris", "Martin", "Conny", "Desi", "Reni", "Phil",
"Flo", "Andrea", "Isaac", "Dayra", "Caspar"),
time = c(20, 2, 14, 22, 3, 7, 4, 15, 25, 17, 12),
obs_end = c(1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0))
# Estimate the survivor function pretending that all censored observations are actual observations.
km_wrong <- survfit(Surv(time) ~ 1, data = dancedat)
# Estimate the survivor function from this dataset via kaplan-meier.
km <- survfit(Surv(time, obs_end) ~ 1, data = dancedat)
# Plot the two and compare
ggsurvplot_combine(list(correct = km, wrong = km_wrong))
## Warning: Problem with `mutate()` input `survtable`.
## i `select_()` is deprecated as of dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## i Input `survtable` is `purrr::map2(...)`.
## Warning: `select_()` is deprecated as of dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: Vectorized input to `element_text()` is not officially supported.
## Results may be unexpected or may change in future versions of ggplot2.
## Warning: Vectorized input to `element_text()` is not officially supported.
## Results may be unexpected or may change in future versions of ggplot2.
## Warning: Vectorized input to `element_text()` is not officially supported.
## Results may be unexpected or may change in future versions of ggplot2.
See how ignoring censoring underestimates your friends’ dancing stamina? The correct analysis (red curve) shows that your friends actually dance longer than the incorrect blue curve suggests.
Estimating and visualizing a survival curve
Let’s take a look at the survival of breast cancer patients.
In this exercise, we work with the GBSG2
dataset again.
The survival
and survminer
packages and the GBSG2
data are loaded for you in this exercise.
# Kaplan-Meier estimate
km <- survfit(Surv(time, cens) ~ 1, data = GBSG2)
# plot of the Kaplan-Meier estimate
ggsurvplot(km)
# add the risk table to plot
ggsurvplot(km, risk.table = TRUE)
# add a line showing the median survival time
ggsurvplot(km, risk.table = TRUE, surv.median.line = "hv")
You are becoming an expert in visualizing survival curves! You can immediately spot the median survival and see how many women are still at risk of dying.
The Weibull model for estimating survival curves
Estimating median survival from a Weibull model
We can now estimate the survival of the breast cancer patients in the GBSG2
data using a Weibull model (function survreg()
). Remember, the Weibull model estimates a smooth survival function instead of a step function, which is what the Kaplan-Meier method estimates.
The predict()
function with type = "quantile"
allows us to compute the quantiles of the distribution function. We will use this to compute the median survival.
The survival
package and the GBSG2
data are loaded for you in this exercise.
# Weibull model
wb <- survreg(Surv(time, cens) ~ 1, data = GBSG2)
# Compute the median survival from the model
predict(wb, type = "quantile", p = 0.5, newdata = data.frame(1))
## 1
## 1693.93
Half the patients live longer than 1693.93 days and half die before.
Survival curve quantiles from a Weibull model
We can now estimate the survival of the breast cancer patients in the GBSG2
data using a Weibull model.
The predict()
function with type = "quantile"
allows us to compute the quantiles of the distribution function. As we learned in this course so far, the survival function is 1 - the distribution function , so we can easily compute the quantiles of the survival function using the predict()
function.
The survival
package and GBSG2
data are loaded for you in this exercise.
# Weibull model
wb <- survreg(Surv(time, cens) ~ 1, data = GBSG2)
# 70 Percent of patients survive beyond time point...
predict(wb, type = "quantile", p = 1 - 0.7, newdata = data.frame(1))
## 1
## 1004.524
70 out of 100 patients survive more than 1004.524 days.
Estimating the survival curve with survreg()
We can now estimate the survival of the breast cancer patients in the GBSG2
data using a Weibull model.
The Weibull distribution has two parameters, which determine the form of the survival curve.
The survival
package and the GBSG2
data are loaded for you in this exercise.
# Weibull model
wb <- survreg(Surv(time, cens) ~ 1, data = GBSG2)
# Retrieve survival curve from model probabilities
surv <- seq(.99, .01, by = -.01)
# Get time for each probability
t <- predict(wb, type = "quantile", p = 1 - surv, newdata = data.frame(1))
# Create data frame with the information
surv_wb <- data.frame(time = t, surv = surv)
# Look at first few lines of the result
head(surv_wb)
## time surv
## 1 60.6560 0.99
## 2 105.0392 0.98
## 3 145.0723 0.97
## 4 182.6430 0.96
## 5 218.5715 0.95
## 6 253.3125 0.94
See how the survival curve goes down with time?
Visualizing the results of Weibull models
Comparing Weibull model and Kaplan-Meier estimate
Let’s plot the survival curve we get from the Weibull model for the GBSG2
data!
The survival
and survminer
packages and the GBSG2
data are loaded for you in this exercise.
# Weibull model
wb <- survreg(Surv(time, cens) ~ 1, data = GBSG2)
# Retrieve survival curve from model
surv <- seq(.99, .01, by = -.01)
# Get time for each probability
t <- predict(wb, type = "quantile", p = 1 - surv, newdata = data.frame(1))
# Create data frame with the information needed for ggsurvplot_df
surv_wb <- data.frame(time = t, surv = surv,
upper = NA, lower = NA, std.err = NA)
# Plot
ggsurvplot_df(fit = surv_wb, surv.geom = geom_line)
See how the Weibull model estimates a smooth survival function?
Why use the Weibull model?
Interpreting coefficients
We have a dataset of lung cancer patients. In this exercise, we want to know if the sex of the patients is associated with their survival time.
The survival
package and the dataset are already loaded for you.
# Look at the data set
str(dat)
'data.frame': 228 obs. of 3 variables:
$ time : num 306 455 1010 210 883 ...
$ status: num 2 2 1 2 2 1 2 2 2 2 ...
$ sex : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 2 2 1 1 ...
# Estimate a Weibull model
wbmod <- survreg(Surv(time, status) ~ sex, data = dat)
coef(wbmod)
(Intercept) sexfemale
5.884162 0.395578
The sexfemale
coefficient is positive which means women tend to survive longer.
# Weibull model
wbmod <- survreg(Surv(time, cens) ~ horTh, data = GBSG2)
coef(wbmod)
## (Intercept) horThyes
## 7.6084486 0.3059506
# Retrieve survival curve from model
surv <- seq(.99, .01, by = -.01)
t_yes <- predict(wbmod, type = "quantile", p = 1 - surv,
newdata = data.frame(horTh = "yes"))
# Take a look at survival curve
str(t_yes)
## num [1:99] 76.4 131.4 180.9 227.2 271.4 ...
str()
shows the structure of the survival curve.
Visualizing Weibull models
Steps to produce visualization
data.frame
with survival curve informationComputing a Weibull model and the survival curves
In this exercise we will reproduce the example from the video using the following steps:
In this exercise, we will focus on the first three steps. The next exercise will cover the remaining steps.
The survival
, survminer
, and reshape2
packages and the GBSG2
data are loaded for you in this exercise.
p_load(reshape2)
# Weibull model
wbmod <- survreg(Surv(time, cens) ~ horTh + tsize, data = GBSG2)
# Imaginary patients
newdat <- expand.grid(
horTh = levels(GBSG2$horTh),
tsize = quantile(GBSG2$tsize, probs = c(0.25, 0.5, 0.75)))
newdat
## horTh tsize
## 1 no 20
## 2 yes 20
## 3 no 25
## 4 yes 25
## 5 no 35
## 6 yes 35
# Compute survival curves
surv <- seq(.99, .01, by = -.01)
t <- predict(wbmod, type = "quantile", p = 1 - surv,
newdata = newdat)
# How many rows and columns does t have?
dim(t)
## [1] 6 99
Each row of t
corresponds to one covariate combination (one imaginary patient) and each column to one value of surv
.
We will focus now on the last two steps.
The survival
, survminer
, and reshape2
packages and the GBSG2
data are loaded for you in this exercise. The Weibull model wbmod
and the imaginary patient data newdat you already computed are also available.
# Use cbind() to combine the information in newdat with t
surv_wbmod_wide <- cbind(newdat, t)
# Use melt() to bring the data.frame to long format
surv_wbmod <- melt(surv_wbmod_wide, id.vars = c("horTh", "tsize"), variable.name = "surv_id", value.name = "time")
# Use surv_wbmod$surv_id to add the correct survival probabilities surv
surv_wbmod$surv <- surv[as.numeric(surv_wbmod$surv_id)]
# Add columns upper, lower, std.err, and strata to the data.frame
surv_wbmod[, c("upper", "lower", "std.err", "strata")] <- NA
# Take a look at the structure of the object
str(surv_wbmod)
## 'data.frame': 594 obs. of 9 variables:
## $ horTh : Factor w/ 2 levels "no","yes": 1 2 1 2 1 2 1 2 1 2 ...
## $ tsize : num 20 20 25 25 35 35 20 20 25 25 ...
## $ surv_id: Factor w/ 99 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ time : num 65.9 90 62 84.6 54.9 ...
## $ surv : num 0.99 0.99 0.99 0.99 0.99 0.99 0.98 0.98 0.98 0.98 ...
## $ upper : logi NA NA NA NA NA NA ...
## $ lower : logi NA NA NA NA NA NA ...
## $ std.err: logi NA NA NA NA NA NA ...
## $ strata : logi NA NA NA NA NA NA ...
# Plot the survival curves
ggsurvplot_df(surv_wbmod, surv.geom = geom_line,
linetype = "horTh", color = "tsize", legend.title = NULL)
The visualization shows that patients with smaller tumors tend to survive longer and patients who receive hormonal therapy tend to survive longer.
Other distributions than Weibull
dist
is the argument to set the assumed distribution for y variable. You can check which options are possible when looking at the help of survreg.distributions
(?survreg.distributions
).
Computing a Weibull and a log-normal model
In this exercise, we want to compute a Weibull model and a log-normal model for the GBSG2 data. You will see that the process of computing the survival curve is the same. In the upcoming exercise, we will compare the results from the two models and see the differences.
The survival
, survminer
, and reshape2
packages and the GBSG2
data are loaded for you in this exercise.
# Weibull model
wbmod <- survreg(Surv(time, cens) ~ horTh, data = GBSG2)
# Log-Normal model
lnmod <- survreg(Surv(time, cens) ~ horTh, data = GBSG2, dist = "lognormal")
# Newdata
newdat <- data.frame(horTh = levels(GBSG2$horTh))
# Surv
surv <- seq(.99, .01, by = -.01)
# Survival curve from Weibull model and log-normal model
wbt <- predict(wbmod, type = "quantile", p = 1 - surv, newdata = newdat)
lnt <- predict(lnmod, type = "quantile", p = 1 - surv, newdata = newdat)
Comparing Weibull and Log-Normal Model I
In this exercise, we want to add the correct survival probabilities to a data frame. This data frame will be used to plot the survival curves. surv_wide
is a wide data frame containing hormonal therapy information and the survival curves for the Weibull and log-normal models.
The survival
, survminer
, and reshape2
packages and the GBSG2
data are loaded for you in this exercise.
surv_wide <- cbind(wbt, lnt)
# Melt the data.frame into long format.
surv_long <- melt(surv_wide, id.vars = c("horTh", "dist"), variable.name = "surv_id", value.name = "time")
# Add column for the survival probabilities
surv_long$surv <- surv[as.numeric(surv_long$surv_id)]
# Add columns upper, lower, std.err, and strata contianing NA values
surv_long[, c("upper", "lower", "std.err", "strata")] <- NA
Comparing Weibull and Log-Normal Model II
In this exercise, we want to compare the survival curves estimated by a Weibull model and by a log-normal model for the GBSG2 data. This exercise shows how the estimates change if you use a different distribution.
The survival
, survminer
, and reshape2
packages and the GBSG2
data are loaded for you in this exercise.
# Plot the survival curves
ggsurvplot_df(surv_long, surv.geom = geom_line,
linetype = "horTh", color = "dist", legend.title = NULL)
Notice that the survival probability is more drawn out for the log-normal distribution than the Weibull distribution.
The Cox model
Computing a Cox model
We have a dataset of lung cancer patients. We want to know if their performance score (variable performance
) is associated with their survival time. The performance score measures how well a patient can perform usual daily activities (bad=0, good=100).
The survival
package and the dat
dataset are already loaded for you.
# Compute Cox model
cxmod <- coxph(Surv(time, status) ~ performance, data = dat)
# Show model coefficient
coef(cxmod)
performance
-0.01644821
The performance
coefficient is negative which means that higher values of the performance score tend to go along with longer survival (interpretation is contrary to the Weibull model).
Visualizing the Cox model
Steps to visualize a Cox model
data.frame
with survival curve informationComputing the survival curve from a Cox model
In this exercise, we will reproduce the example from the video following the steps:
data.frame
with survival curve informationWe will focus now on the first three steps in this exercise and do the next two steps in the upcoming exercise.
The survival
and survminer
packages and the GBSG2
data are loaded for you in this exercise.
# Cox model
cxmod <- coxph(Surv(time, cens) ~ horTh + tsize, data = GBSG2)
# Imaginary patients
newdat <- expand.grid(
horTh = levels(GBSG2$horTh),
tsize = quantile(GBSG2$tsize, probs = c(0.25, 0.5, 0.75)))
rownames(newdat) <- letters[1:6]
# Inspect newdat
newdat
## horTh tsize
## a no 20
## b yes 20
## c no 25
## d yes 25
## e no 35
## f yes 35
# Compute survival curves
cxsf <- survfit(cxmod, data = GBSG2, newdata = newdat, conf.type = "none")
# Look at first 6 rows of cxsf$surv and time points
head(cxsf$surv)
## a b c d e f
## [1,] 1 1 1 1 1 1
## [2,] 1 1 1 1 1 1
## [3,] 1 1 1 1 1 1
## [4,] 1 1 1 1 1 1
## [5,] 1 1 1 1 1 1
## [6,] 1 1 1 1 1 1
head(cxsf$time)
## [1] 8 15 16 17 18 29
Visualizing a Cox model
In this exercise we will reproduce the example from the video following the steps:
We will focus now on the last two steps in this exercise.
The survival
and survminer
packages and the GBSG2
data are loaded for you in this exercise. The Cox model cxmod
, the imaginary patient data newdat
, and the survival curve information cxsf
from the previous exercise are also available.
# Compute data.frame needed for plotting
surv_cxmod0 <- surv_summary(cxsf)
# Look at the first few lines
head(surv_cxmod0)
## time n.risk n.event n.censor surv std.err upper lower strata
## 1 8 686 0 1 1 0 NA NA a
## 2 15 685 0 1 1 0 NA NA a
## 3 16 684 0 1 1 0 NA NA a
## 4 17 683 0 2 1 0 NA NA a
## 5 18 681 0 1 1 0 NA NA a
## 6 29 680 0 1 1 0 NA NA a
# Get a character vector of patient letters (patient IDs)
pid <- as.character(surv_cxmod0$strata)
# Multiple of the rows in newdat so that it fits with surv_cxmod0
m_newdat <- newdat[pid, ]
# Add patient info to data.frame
surv_cxmod <- cbind(surv_cxmod0, m_newdat)
head(surv_cxmod)
## time n.risk n.event n.censor surv std.err upper lower strata horTh tsize
## a 8 686 0 1 1 0 NA NA a no 20
## a.1 15 685 0 1 1 0 NA NA a no 20
## a.2 16 684 0 1 1 0 NA NA a no 20
## a.3 17 683 0 2 1 0 NA NA a no 20
## a.4 18 681 0 1 1 0 NA NA a no 20
## a.5 29 680 0 1 1 0 NA NA a no 20
# Plot
ggsurvplot_df(surv_cxmod, linetype = "horTh", color = "tsize",
legend.title = NULL, censor = FALSE)
The visualization shows that patients with smaller tumors tend to survive longer and patients who receive hormonal therapy tend to survive longer.
What we’ve learned
Capstone: The Cox model
To conclude the course, let’s take a look back at the lung cancer dataset we utilized briefly in these last 2 chapters. To recap, this dataset contains information on the survival of patients with advanced lung cancer from the North Central Cancer Treatment Group. The event is stored in the status
variable, which has a value of 2
if an individual did not survive. The performance score (variable performance
) measures how well a patient can perform usual daily activities (bad=0, good=100), rated by a physician. We want to know the association between specific performance scores and survival time.
# Compute Cox model and survival curves
cxmod <- coxph(Surv(time, status) ~ performance, data = lung)
new_lung <- data.frame(performance = c(60, 70, 80, 90))
cxsf <- survfit(cxmod, data = lung, newdata = new_lung, conf.type = "none")
# Use the summary of cxsf to take a vector of patient IDs
surv_cxmod0 <- surv_summary(cxsf)
pid <- as.character(surv_cxmod0$strata)
# Duplicate rows in newdat to fit with surv_cxmod0 and add them in
m_newdat <- new_lung[pid, , drop = FALSE]
surv_cxmod <- cbind(surv_cxmod0, m_newdat)
# Plot
ggsurvplot_df(surv_cxmod, color = "performance", legend.title = NULL, censor = FALSE)
Notice that individuals who had a higher performance score had a higher probability of surviving.
Capstone: Comparing survival curves
We saw from the last exercise that performance scores do have an effect on the survival probability. Now, let’s take a look at the survival curve of all individuals using the Kaplan-Meier estimate and compare it to the curve of a Cox model that takesperformance
into account. Note that for Cox models, you can just enter the survfit()
output into ggsurvplot()
instead of creating the needed data frame yourself and plugging it into ggsurvplot_df()
.
# Compute Kaplan-Meier curve
km <- survfit(Surv(time, status) ~ 1, data = lung)
# Compute Cox model
cxmod <- coxph(Surv(time, status) ~ performance, data = lung)
# Compute Cox model survival curves
new_lung <- data.frame(performance = c(60, 70, 80, 90))
cxsf <- survfit(cxmod, data = lung, newdata = new_lung, conf.type = "none")
# Plot Kaplan-Meier curve
ggsurvplot(km, conf.int = FALSE)
# Plot Cox model survival curves
ggsurvplot(cxsf, censor = FALSE)
You’ve finished the last exercise of the course! Move on to the final lesson to find out where you can go from here!
Good bye