## Baseline Characteristics of the Participants

The first participant underwent randomization in July 2013, and the last participant underwent randomization in August 2017 (Fig. S4). The baseline characteristics of the 5047 participants, which were reported previously^{18} and are shown in Table S1, included a mean (±SD) age of 57.2±10.0 years. A total of 63.6% were men, which reflected the inclusion of 10 Veterans Affairs medical centers as trial sites, and 41.5% of the participants were at least 60 years of age. A total of 65.7% of the participants identified as White, 19.8% as Black, and 3.6% as Asian. Ethnic group was also reported by the participants: 18.6% identified as Hispanic or Latinx, 2.7% as American Indian or Alaska Native, and 0.6% as Native Hawaiian or Pacific Islander.

The mean duration of diabetes as reported by the participants was 4.2±2.7 years. The daily metformin dose was 1576±525 mg at initial screening and 1944±205 mg at randomization, and 92.3% of the participants received 2000 mg per day. The mean BMI was 34.3±6.8, and the mean glycated hemoglobin level was 7.5±0.5% (58.3±5.3 mmol per mole). There were no substantial differences in any baseline demographic characteristic or findings on physical examinations or laboratory measurements among the four treatment groups. The baseline characteristics of the recruited cohort resembled those in the U.S. population who had type 2 diabetes that was being treated with metformin, who were of a similar age, and who had a similar duration of diabetes and a similar glycated hemoglobin range (Table S2).

## Participant Retention and Adherence to Trial Visits and Assigned Medications

At the end of the trial in April 2021, the mean duration of follow-up was 5.0 years (range, 0 to 7.6), and 85.8% of the participants had been followed for at least 4 years. Retention and adherence were high; 94% of the participants completed a final visit, and they adhered to a mean of 92% of their expected trial visits (Table 1). A total of 27 of 5047 participants (0.5%) were lost to follow-up, and 153 died during the trial. During the Covid-19 pandemic, which overlapped with the trial closeout period, many visits were conducted by telephone and data on the glycated hemoglobin level were collected with the use of a validated mail-in kit.^{19} As a result, 89% of all expected visits were completed during the final year of the trial (May 1, 2020, through April 30, 2021).

No differences were observed across the four treatment groups with respect to the retention of participants or adherence to trial visits (Table 1). Slight differences were observed with respect to metformin use, with 8% of the participants overall discontinuing metformin during study follow-up. There were differences in adherence to randomly assigned medications, with a higher frequency of discontinuation in the glimepiride and liraglutide groups (23% of the participants in each group) than in the sitagliptin (19%) and glargine (14%) groups. In the liraglutide and sitagliptin groups, most participants received the maximum doses of their assigned treatment; the mean daily maximum doses in the glimepiride and glargine groups were 5.4 mg and 51.4 U, respectively (Table 1 and Table S3). The percentage of participants who received nontrial glucose-lowering medications was highest in the glimepiride group (17%) and the sitagliptin group (15%), with less use of nontrial glucose-lowering medications in the liraglutide (11%) and glargine (14%) groups (Table S4).

## Efficacy

Table 2 shows the numbers of participants in each treatment group with a glycated hemoglobin level of 7.0% or higher (the primary metabolic outcome), a glycated hemoglobin level greater than 7.5% (the secondary metabolic outcome), and the corresponding rates, as well as the pairwise hazard ratios among the treatment groups, hazard ratios for each treatment as compared with the other treatments combined, and the restricted mean survival time (time to event). Details regarding the tertiary-outcome results are provided in Table S5.

Figure 1. Shown are the cumulative incidences of a glycated hemoglobin level of 7.0% or higher (the primary metabolic outcome) (Panel A), a glycated hemoglobin level of greater than 7.5% (the secondary metabolic outcome) (Panel B), and a confirmed glycated hemoglobin level greater than 7.5% after the secondary outcome (the tertiary metabolic outcome) (Panel C), as well as the mean glycated hemoglobin levels (Panel D). In Panels A through C, the shaded bars along the x axis indicate the number of participants with data available for the analyses over time (i.e., the number of participants in whom a specified outcome event had not developed by that time). The vertical dashed lines indicate the results at 4 years, when 85.8% of the participants were undergoing follow-up. To convert values for glycated hemoglobin to millimoles per mole, multiply by 10.93 and then subtract 23.5.
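The unit conversion stated above can be written as a one-line function. The following is a minimal sketch; the function name is an assumption introduced here for illustration and is not part of the trial's analysis code.

```python
def hba1c_percent_to_mmol_per_mol(percent: float) -> float:
    """Convert a glycated hemoglobin value from percent to mmol per mole.

    Applies the rule given in the text: multiply by 10.93, then subtract 23.5.
    """
    return percent * 10.93 - 23.5


if __name__ == "__main__":
    # Bounds of the lowest baseline stratum reported in the subgroup analyses.
    for pct in (6.8, 7.2):
        print(f"{pct}% -> {hba1c_percent_to_mmol_per_mol(pct):.1f} mmol per mole")
```

Applied to 6.8% and 7.2%, the function returns values that round to the 50.8 and 55.2 mmol per mole reported for the lowest third of baseline glycated hemoglobin levels.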

Over the mean 5-year follow-up, 71% of the cohort had a primary metabolic outcome event, with the highest frequency in the sitagliptin group (77%), intermediate frequency in the glimepiride group (72%), and the lowest frequency in the liraglutide (68%) and glargine (67%) groups (Table 2). The between-group differences in the Kaplan–Meier estimates of the cumulative incidence of a primary-outcome event were significant (P<0.001 by the log-rank test) (Figure 1A). During the first year of the trial, 55% of the participants in the sitagliptin group had a primary metabolic outcome event, as compared with fewer than 40% in the other groups. The differences in cumulative incidences over the first 4 years of the trial translated into a mean of 697 days to a primary metabolic outcome event in the sitagliptin group, as compared with 809, 861, and 882 days in the glimepiride, glargine, and liraglutide groups, respectively.
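The mean days to an event reported here are restricted mean survival times, which can be read as the area under the event-free survival curve up to a fixed time horizon. The sketch below illustrates that idea with a Kaplan–Meier step function. The function name, the toy data, and the choice of estimator are assumptions made for illustration; the estimator actually used in the trial is not described in this section.

```python
def restricted_mean_survival_time(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau.

    times  : follow-up time for each participant (e.g., in days)
    events : 1 if the outcome event occurred at that time, 0 if censored
    tau    : restriction horizon, in the same units as times
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0   # current height of the Kaplan-Meier curve
    prev_time = 0.0
    area = 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all participants who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        # Add the rectangle under the curve since the previous time point.
        area += survival * (t - prev_time)
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
        n_at_risk -= n_leaving
        prev_time = t
    # Extend the last step of the curve out to the horizon.
    area += survival * (tau - prev_time)
    return area


if __name__ == "__main__":
    # Hypothetical toy data, not trial data.
    toy_times = [100, 200, 300, 400]
    toy_events = [1, 1, 1, 1]
    print(restricted_mean_survival_time(toy_times, toy_events, tau=400))
```

A group in which events occur earlier has a smaller area under its curve, which is why the sitagliptin group has the shortest mean time to a primary metabolic outcome event.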

The six pairwise comparisons among the groups (Table 2) showed that the risk of a primary-outcome event was significantly lower with glargine than with sitagliptin (hazard ratio, 0.71, or a 29% risk reduction) and than with glimepiride (0.89, or an 11% risk reduction), with P values of less than or equal to 0.001 and 0.02, respectively, protected with the use of a closed testing procedure. The difference between the glargine and liraglutide groups was not significant. The risk of a primary-outcome event was 41% higher in the sitagliptin group than in the glargine group (hazard ratio in the sitagliptin group as compared with the glargine group, 1.41, which was obtained by inverting the hazard ratio in the glargine group as compared with the sitagliptin group [0.71]), 45% higher than in the liraglutide group, and 26% higher than in the glimepiride group (P≤0.001 for all comparisons) (Table 2). The glimepiride group had a significantly lower risk of a primary-outcome event than the sitagliptin group and a higher risk than the glargine and liraglutide groups.
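The arithmetic linking hazard ratios to the percentages quoted above can be sketched as follows. The helper names are assumptions introduced for illustration; only the numbers 0.71 and 0.89 come from the text.

```python
def invert_hazard_ratio(hr: float) -> float:
    """Hazard ratio for B vs. A, given the hazard ratio for A vs. B."""
    return 1.0 / hr


def percent_risk_reduction(hr: float) -> float:
    """Relative risk reduction, in percent, implied by a hazard ratio below 1."""
    return (1.0 - hr) * 100.0


def percent_risk_increase(hr: float) -> float:
    """Relative risk increase, in percent, implied by a hazard ratio above 1."""
    return (hr - 1.0) * 100.0


if __name__ == "__main__":
    hr_glargine_vs_sitagliptin = 0.71
    hr_glargine_vs_glimepiride = 0.89

    print(round(percent_risk_reduction(hr_glargine_vs_sitagliptin)))
    print(round(percent_risk_reduction(hr_glargine_vs_glimepiride)))

    # Reverse the direction of the comparison: sitagliptin vs. glargine.
    hr_sitagliptin_vs_glargine = round(invert_hazard_ratio(hr_glargine_vs_sitagliptin), 2)
    print(hr_sitagliptin_vs_glargine)
    print(round(percent_risk_increase(hr_sitagliptin_vs_glargine)))
```

A 29% reduction in one direction corresponds to a 41% increase in the other because the inversion is multiplicative: 1/0.71 is approximately 1.41, not 1.29.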

The rates of secondary-outcome events (Table 2) and tertiary-outcome events followed a pattern that was similar to that for the primary outcome, with lower rates in the glargine and liraglutide groups, an intermediate rate in the glimepiride group, and the highest rate in the sitagliptin group. The Kaplan–Meier analyses of the cumulative incidences of the secondary-outcome events (Figure 1B) and tertiary-outcome events (Figure 1C) also resembled those of the primary-outcome events. Over a mean follow-up of 5 years, a secondary-outcome event occurred in 55% of the participants in the sitagliptin group, 50% in the glimepiride group, 46% in the liraglutide group, and 39% in the glargine group. The percentage of participants with a tertiary-outcome event increased slowly after the first year, reaching 27% at 4 years (Figure 1C), with small differences among the groups and with the highest risk of a tertiary-outcome event in the glimepiride and sitagliptin groups and slightly lower risks in the glargine and liraglutide groups.

The mean glycated hemoglobin levels reached a nadir at 6 months in the glargine group and at 3 months in the other groups, with increases thereafter (Figure 1D). At year 4, the absolute differences were small (Fig. S5), with mean glycated hemoglobin levels of 7.1% in the glargine and liraglutide groups as compared with 7.2% in the sitagliptin group and 7.3% in the glimepiride group.

## Subgroup Analyses

Figure 2. The risk reductions, calculated from the hazard ratios, with glargine, glimepiride, and liraglutide, as compared with sitagliptin, are shown with unadjusted 95% confidence intervals. To convert values for glycated hemoglobin to millimoles per mole, multiply by 10.93 and then subtract 23.5.

Prespecified subgroup analyses were performed to identify potential heterogeneity of the effects of the interventions. Table S6 describes the incidences of the primary and secondary metabolic outcome events in the treatment groups within subgroups. The risk of a primary-outcome event appeared to increase among the groups with increasing strata of baseline glycated hemoglobin level (Figure 2). Even among participants in the lowest third of baseline glycated hemoglobin levels (6.8 to 7.2% [50.8 to 55.2 mmol per mole]), a glycated hemoglobin level of less than 7.0% was not achieved or maintained in approximately 60% of the participants.

The hazard ratio in the glargine group as compared with the sitagliptin group differed among the strata of glycated hemoglobin levels. The risk reductions with glargine increased from the lowest to the highest stratum of baseline glycated hemoglobin levels, from 17% (95% confidence interval [CI], 3 to 29) to 32% (95% CI, 19 to 42) to 46% (95% CI, 36 to 55). In participants with higher baseline glycated hemoglobin levels, sitagliptin was progressively less effective than the other three medications in maintaining or achieving a glycated hemoglobin level of less than 7.0%. In each glycated hemoglobin stratum, the risk of treatment failure with sitagliptin increased at a rate that was faster than that in the other groups as the glycated hemoglobin level increased. There was no suggestion of heterogeneity among the subgroups with respect to the secondary outcome.

## Sensitivity Analyses

Results based on trial data from before the Covid-19 pandemic period (i.e., before March 15, 2020) were similar to those based on the entire trial period, which indicates that the pandemic had no effect on the trial results. The results of per-protocol analyses, excluding data collected after the first instance of discontinuation of a trial medication (except for discontinuation in accordance with the protocol after a tertiary-outcome event) or initiation of a nontrial glucose-lowering medication, were similar to those in the intention-to-treat analyses with respect to the primary and secondary metabolic outcomes.

## Serious and Targeted Adverse Events and Effects on Weight

Figure 3. Event rates were calculated as the number of events per 100 participant-years. Simple P values (not corrected for multiple tests among event types or treatment groups) were obtained from a Poisson regression model of the rates. The diagram denotes the range of the P values for the six pairwise group comparisons of each adverse event. The following were adjudicated events: serious adverse events or any targeted events, including severe hypoglycemia, lactic acidosis, pancreatitis, diabetic ketoacidosis or hyperosmolar hyperglycemic syndrome, revascularization (coronary, peripheral, or cerebral), congestive heart failure, or cancer. The first occurrence of weight gain of 10% or more was compared with the baseline weight. Gastrointestinal symptoms included any one of the following symptoms that occurred at least once per week in the 30 days before the quarterly visit and were reported by the participant: nausea, vomiting, bloating or stomach pain, or diarrhea.
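As an illustration of the rate definition above, the following sketch computes a crude event rate per 100 participant-years and a two-group rate ratio with an approximate Wald confidence interval on the log scale. The function names and the counts are hypothetical. The trial obtained its P values from a Poisson regression model, which this simplified calculation does not reproduce.

```python
import math


def rate_per_100_participant_years(n_events: int, participant_years: float) -> float:
    """Crude event rate expressed per 100 participant-years of follow-up."""
    return n_events / participant_years * 100.0


def rate_ratio_with_ci(events_a, years_a, events_b, years_b, z=1.96):
    """Rate ratio (A vs. B) with an approximate Wald interval on the log scale.

    Assumes event counts are Poisson distributed, so that the standard error
    of the log rate ratio is sqrt(1/events_a + 1/events_b).
    """
    ratio = (events_a / years_a) / (events_b / years_b)
    se_log = math.sqrt(1.0 / events_a + 1.0 / events_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not trial data.
    print(rate_per_100_participant_years(30, 1500.0))
    ratio, lower, upper = rate_ratio_with_ci(30, 1500.0, 15, 1500.0)
    print(f"rate ratio {ratio:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

Expressing events per participant-year rather than per participant accounts for unequal follow-up time across participants and groups.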

Data on serious adverse events and prespecified targeted adverse events are provided in Figure 3. The glimepiride and glargine groups had a significantly higher overall incidence of any adverse event (serious adverse events and prespecified adverse events) (in 38% and 37% of the participants, respectively) than the liraglutide group (in 34%) (uncorrected P=0.001 and P=0.02, respectively). Severe hypoglycemia was generally uncommon, but it affected more participants assigned to glimepiride (2.2%) than to sitagliptin (0.7%) (P≤0.001), liraglutide (1.0%) (P≤0.001), or glargine (1.3%) (P=0.02). There were no substantial differences among the treatment groups with respect to other targeted outcomes, including pancreatitis and pancreatic cancer (data not shown). Gastrointestinal symptoms were considerably more common with liraglutide than with the other three treatments. Over 4 years, the mean weight loss was substantially greater in the liraglutide and sitagliptin groups (3.5 and 2.0 kg, respectively) than in the glargine and glimepiride groups (0.61 and 0.73 kg, respectively).