Possible data structure problems

Question

Closed this issue 4 months ago · 12 comments

Dear Professor:
I have been using the msm package to analyze multi-state models, but msm does not work very well when there are many transition paths, so I want to use the mstate package instead. I ran into a problem: I followed an online tutorial to format the data and then ran the code below, which produced this error:
Error in msprep(time = c("Tstart", "Tstop"), status = "status", data = data307, : unequal dimensions of "time" and "status" data
I don't understand why, since I followed the tutorial when formatting my data, but there must be a problem somewhere, so I would like to ask you to review it. Thank you very much for your help. I have attached my data table and code below.
data307.csv
library(mstate)
library(survival)
tmat <- transMat(list(
  c(2, 3, 4, 9),
  c(5, 6, 9),
  c(5, 7, 9),
  c(6, 7, 9),
  c(8, 9),
  c(8, 9),
  c(8, 9),
  c(9),
  c()
), names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))
print(tmat)

mstate_data <- msprep(
  time = c("Tstart", "Tstop"),
  status = "status",
  data = data307,
  trans = tmat,
  id = "eid",
  keep = c("total", "age", "sex", "ethnicity", "education")
)

Answer 1 · 2024-06-27T08:16:10.000Z

Hi @26pan, thanks for reaching out. To help you I would ideally need a minimal reproducible example. Your data307 looks like something that has already been through something like msprep(), except it is not correct (many patients have rows with from = to).

I would recommend looking at simpler examples from help files (e.g. ?msfit) to see what data should look like before it goes into msprep(), and for your own example I would start first with a much simpler transition matrix to get things working.

Answer 2 · 2024-06-27T08:30:03.000Z

"it is not correct (many patients with from = to)": sorry, I don't quite understand what you mean. Do you mean that there is something wrong with the structure of my data?

Answer 3 · 2024-06-27T08:35:03.000Z

The dataset that comes with R is in this format. Does that mean I need to process my data into this format?

Answer 4 · 2024-06-27T08:48:38.000Z

This latest data screenshot is the correct format to start working with. You can then feed this into msprep() together with your transition matrix, and follow the remaining steps from the package's tutorial.
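For illustration, here is a minimal sketch of that workflow, adapted from the example in ?msprep. It uses a small made-up illness-death data set (`tg`, with illustrative columns `illt`/`ills` for the illness time and status and `dt`/`ds` for death) rather than data307. The input has one row per patient, and `time` and `status` each have one entry per state, with NA for the starting state.

```r
library(mstate)

# Illness-death transition matrix: healthy -> illness, healthy -> death, illness -> death
tmat <- trans.illdeath()

# Wide format: one row per patient, one time and one status column per non-initial state
tg <- data.frame(
  illt = c(1, 1, 6, 6, 8, 9),   # time of illness (or censoring for illness)
  ills = c(1, 0, 1, 1, 0, 1),   # 1 = illness observed
  dt   = c(5, 1, 9, 7, 8, 12),  # time of death (or censoring)
  ds   = c(1, 1, 1, 1, 1, 1),   # 1 = death observed
  x1   = c(1, 1, 1, 0, 0, 0),
  x2   = 6:1
)

# time and status have the same length as the number of states (3 here)
ms <- msprep(
  time   = c(NA, "illt", "dt"),
  status = c(NA, "ills", "ds"),
  data   = tg,
  trans  = tmat,
  keep   = c("x1", "x2")
)

head(ms)     # long format: one row per patient per transition at risk
events(ms)   # observed transition counts
```

The "unequal dimensions" error from the first post is what one would expect when `time` and `status` do not have the same number of entries, as with two time columns and a single status column.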

Answer 5 · 2024-06-27T09:36:22.000Z

Thank you!

Answer 6 · 2024-07-04T11:28:44.000Z

Hello Professor, I would like to ask how to use the mstate package to estimate the hazard ratio (HR) of a variable for each transition. With the code from the tutorial I only get a single overall HR, but I need the HR for each transition.

library(mstate)
library(survival)
library(dplyr)

tmat <- mstate::transMat(
  x = list(c(2, 3, 4, 9),
           c(5, 6, 9),
           c(5, 7, 9),
           c(6, 7, 9),
           c(8, 9),
           c(8, 9),
           c(8, 9),
           c(9),
           c()),
  names = c("state1", "state2", "state3", "state4",
            "state5", "state6", "state7", "state8", "state9"))
print(tmat)
msebmt <- msprep(data = data501, trans = tmat,
                 status = c(NA, "state2", "state3", "state4", "state5",
                            "state6", "state7", "state8", "state9"),
                 time = c(NA, "2date", "3date", "4date", "5date", "6date",
                          "7date", "8date", "9date"),
                 keep = c("age", "sex", "education", "ethnicity",
                          "total", "health_pa", "health_alcohol",
                          "health_sleep", "health_smoke", "health_whr"))
head(msebmt)
events(msebmt)

msdata <- expand.covs(msebmt, c("total", "age", "sex", "ethnicity", "education"))

cox_model <- coxph(Surv(Tstart, Tstop, status) ~ total + strata(trans), data = msebmt,
                   control = coxph.control(iter.max = 5000, eps = 1e-09, toler.inf = 1e+20,
                                           toler.chol = .Machine$double.eps^0.75))

summary(cox_model)

Answer 7 · 2024-07-04T11:40:37.000Z

> msdata <- expand.covs(msebmt, c("total", "age", "sex", "ethnicity", "education"))

By using expand.covs() you should now have transition-specific covariates, for example age.1, age.2 etc., where the number corresponds to the specific transition you are interested in modelling (this is also in the tutorial). You can use these in the model, fitted on the expanded data set (msdata here), for example simply as:

cox_model <- coxph(Surv(Tstart, Tstop, status) ~ age.2 + age.3 + strata(trans), data = msdata)

Answer 8 · 2024-07-04T11:48:46.000Z

I have just tried this code and found that it gives the effect for each path. May I ask whether I used it correctly?

expcovs <- expand.covs(msebmt, "total", append = TRUE)

cox_model <- coxph(Surv(Tstart, Tstop, status) ~
                     total.1 + total.2 + total.3 + total.4 +
                     total.5 + total.6 + total.7 + total.8 +
                     total.9 + total.10 + total.11 + total.12 +
                     total.13 + total.14 + total.15 + total.16 +
                     total.17 + total.18 + total.19 + total.20 +
                     age + education + sex + ethnicity +
                     strata(trans),
                   data = expcovs,
                   method = "breslow")

# Confidence intervals on the log-hazard scale, computed once for all terms
ci <- confint(cox_model)

# Print coefficient, hazard ratio and 95% CI of `total` for each transition
for (i in 1:20) {
  term <- paste0("total.", i)
  coef_value <- coef(cox_model)[term]
  cat("Path", i, ":\n")
  if (!is.na(coef_value)) {
    cat("Coefficient:", coef_value, "\n")
    cat("exp(Coefficient):", exp(coef_value), "\n")
    if (!any(is.na(ci[term, ]))) {
      cat("95% CI: [", exp(ci[term, 1]), ",", exp(ci[term, 2]), "]\n\n")
    } else {
      cat("95% CI: NA\n\n")
    }
  } else {
    # Coefficient could not be estimated for this transition
    cat("Coefficient: NA\n")
    cat("exp(Coefficient): NA\n")
    cat("95% CI: NA\n\n")
  }
}

Answer 9 · 2024-07-04T11:50:04.000Z

cox_model <- coxph(Surv(Tstart, Tstop, status) ~
                     total.1 + total.2 + total.3 + total.4 +
                     total.5 + total.6 + total.7 + total.8 +
                     total.9 + total.10 + total.11 + total.12 +
                     total.13 + total.14 + total.15 + total.16 +
                     total.17 + total.18 + total.19 + total.20 +
                     age + education + sex + ethnicity +
                     strata(trans),
                   data = expcovs,
                   method = "breslow")

This is very different from how the "msm" package is used, so I have not obtained results like these before.

Answer 10 · 2024-07-04T11:55:50.000Z

This is a picture from the tutorial. I would like to ask what the difference is between "dissub1.1", "dissub2.1", "dissub1.2" and "dissub2.2". Also, why do tcd.1, tcd.2 and tcd.3 have only one number after the name?

Answer 11 · 2024-07-04T20:48:17.000Z

> I have just tried this code and found that it gives the effect for each path. May I ask whether I used it correctly?

This depends on your research question. If you wanted the transition-specific hazard ratio for total for every transition, and you assume that the effects of age, education, sex and ethnicity are the same across all transitions, then yes, that is what you specified.
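As a hedged sketch of that kind of specification, using the ebmt3 example data shipped with mstate rather than the data in this thread: tcd gets one coefficient per transition, while age is assumed to have the same effect on all transitions. The `conf.int` component of `summary()` for a coxph fit returns hazard ratios with 95% confidence limits in one table, so no printing loop is needed. The choice of covariates here is purely illustrative.

```r
library(mstate)
library(survival)

data(ebmt3)

# Illness-death model: transplant -> platelet recovery -> relapse/death
tmat <- trans.illdeath(names = c("Tx", "PR", "RelDeath"))
msbmt <- msprep(
  time   = c(NA, "prtime", "rfstime"),
  status = c(NA, "prstat", "rfsstat"),
  data   = ebmt3,
  trans  = tmat,
  keep   = c("tcd", "age")
)

# Transition-specific copies of tcd only: tcd.1, tcd.2, tcd.3
msbmt <- expand.covs(msbmt, "tcd", append = TRUE, longnames = FALSE)

# tcd effect differs by transition; age effect is shared across transitions
fit <- coxph(
  Surv(Tstart, Tstop, status) ~ tcd.1 + tcd.2 + tcd.3 + age + strata(trans),
  data = msbmt, method = "breslow"
)

# Hazard ratios and 95% confidence limits for the transition-specific terms
ci <- summary(fit)$conf.int
hr <- ci[grep("^tcd\\.", rownames(ci)), c("exp(coef)", "lower .95", "upper .95")]
print(hr)
```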

> This is very different from how the "msm" package is used, so I have not obtained results like these before.

The two packages make different assumptions (read e.g. here), and the specification/syntax also differs between them. As I mentioned earlier in this thread, I would recommend you start with a much simpler model (with fewer states) and try to specify the same model structure with both {msm} and {mstate} to check what you are doing.

> This is a picture from the tutorial. I would like to ask what the difference is between "dissub1.1", "dissub2.1", "dissub1.2" and "dissub2.2". Also, why do tcd.1, tcd.2 and tcd.3 have only one number after the name?

The number before the dot is the factor level (only for categorical variables with more than two levels), and the one after the dot is the transition number.
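The naming can be checked directly on the ebmt3 data that ships with mstate. This is a small sketch assuming `longnames = FALSE`, which I believe is what the tutorial uses to get the short names; with the default `longnames = TRUE` the factor level labels appear in the names instead of numbers.

```r
library(mstate)

data(ebmt3)
tmat <- trans.illdeath(names = c("Tx", "PR", "RelDeath"))
msbmt <- msprep(
  time   = c(NA, "prtime", "rfstime"),
  status = c(NA, "prstat", "rfsstat"),
  data   = ebmt3,
  trans  = tmat,
  keep   = c("dissub", "tcd")
)

levels(ebmt3$dissub)  # three levels -> two dummy variables (dissub1, dissub2)
levels(ebmt3$tcd)     # two levels   -> one dummy variable (tcd)

ex <- expand.covs(msbmt, c("dissub", "tcd"), append = TRUE, longnames = FALSE)

# Expanded columns: <covariate><dummy number>.<transition number>
grep("\\.[0-9]+$", names(ex), value = TRUE)
```

With three transitions in this model, each dummy variable gets three transition-specific copies.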

Answer 12 · 2024-07-10T11:29:57.000Z

(Closing since this is not an issue with the package)