Monday, 3 December 2012

Concordance and Discordance in Logistic Regression

If you run a logistic regression in SAS, the output includes a table that summarizes the association of predicted probabilities and observed responses. In that table SAS reports %Concordance, %Discordance, %Tied and Pairs. The question is how SAS calculates these numbers.

Let me explain with a simple example in R.
Consider the data 'admission' with 4 variables. 'admit' is the dependent variable, i.e. the variable
that we predict using the variables gre, gpa and rank.

#***R CODE FOR DATA CREATION***#
admit=c(0,0,1,0,0,1,0,0,0,1,0,1,0,0,1)
gre=c(636,660,800,640,520,760,487,890,765,345,456,675,666,546,786)
gpa=c(3.61,3.67,4,3.19,2.93,3,2.98,3.4,3.2,1.98,4,5.1,3.3,5.1,4.7)
rank=c(3,3,1,4,4,2,4,4,4,3,3,3,2,2,1)
admission=data.frame(admit,gre,gpa,rank)
#***R CODE FOR DATA CREATION ENDS***#

Fit a logistic regression model.

#***FITTING LOGISTIC REGRESSION***#
model=glm(admit~., family="binomial", data=admission)
#***FITTING LOGISTIC REGRESSION ENDS***#
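
Once the model is fitted, the predicted probabilities can be pulled out and placed next to the observed responses. The snippet below is only a minimal sketch: it recreates the data so that it runs on its own, and the name pred_data and its column names are made up for illustration.

```r
#***OBSERVED VS PREDICTED (ILLUSTRATIVE SKETCH)***#
admission=data.frame(
  admit=c(0,0,1,0,0,1,0,0,0,1,0,1,0,0,1),
  gre=c(636,660,800,640,520,760,487,890,765,345,456,675,666,546,786),
  gpa=c(3.61,3.67,4,3.19,2.93,3,2.98,3.4,3.2,1.98,4,5.1,3.3,5.1,4.7),
  rank=c(3,3,1,4,4,2,4,4,4,3,3,3,2,2,1))
model=glm(admit~., family="binomial", data=admission)

# type="response" returns probabilities; on the training data these are
# the same values that are stored in model$fitted.values
pred_data=data.frame(observed=model$y,
                     predicted=predict(model, type="response"))
print(head(pred_data))
#***OBSERVED VS PREDICTED ENDS***#
```

Each row of pred_data holds one observation: its observed 0/1 response and the probability of admission predicted by the model.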

STEPS TO CALCULATE %CONCORDANCE AND %DISCORDANCE

1) Predict the dependent variable in the dataset 'admission' using the model.
2) Create another dataset with only two columns: the observed dependent variable and the predicted probability.
3) Split the newly created dataset in two, so that one dataset contains all observations whose observed dependent variable is 1 (call it 'ones') and the other contains all observations whose observed dependent variable is 0 (call it 'zeros').
4) Compare each predicted value in 'ones' with each predicted value in 'zeros'. This gives a total of n*m pairs of type (x,y) to compare.
   n: Number of observations in 'ones'
   m: Number of observations in 'zeros'
   x: Predicted value of a candidate from 'ones'
   y: Predicted value of a candidate from 'zeros'
5) Pairs in which x is greater than y are concordant pairs.
6) Pairs in which x is less than y are discordant pairs.
7) Pairs in which x is equal to y are tied pairs.
8) % Concordance = 100 * #(concordant pairs)/Total # pairs
   % Discordance = 100 * #(discordant pairs)/Total # pairs
   % Tied = 100 * #(tied pairs)/Total # pairs
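
The steps above can also be sketched without explicit loops, using outer() to form all n*m differences at once. This is only an illustrative sketch: the helper name concordance_sketch and the toy vectors are made up for this example and are not taken from the admission data.

```r
#***VECTORIZED SKETCH OF THE STEPS ABOVE***#
# Hypothetical helper: 'observed' is a 0/1 vector,
# 'predicted' holds the matching predicted probabilities
concordance_sketch=function(observed, predicted)
{
  ones=predicted[observed == 1]    # predictions for observed 1
  zeros=predicted[observed == 0]   # predictions for observed 0
  diffs=outer(ones, zeros, "-")    # x - y for every (x,y) pair
  pairs=length(diffs)              # n*m
  c(concordant=100*sum(diffs > 0)/pairs,
    discordant=100*sum(diffs < 0)/pairs,
    tied=100*sum(diffs == 0)/pairs,
    pairs=pairs)
}

# Toy input, invented for illustration: 2 ones and 3 zeros give 6 pairs
toy_observed=c(1, 1, 0, 0, 0)
toy_predicted=c(0.8, 0.4, 0.4, 0.2, 0.9)
toy_result=concordance_sketch(toy_observed, toy_predicted)
print(toy_result)
#***VECTORIZED SKETCH ENDS***#
```

In the toy input, 3 of the 6 pairs are concordant, 2 are discordant and 1 is tied, so the three percentages are 50, about 33.3 and about 16.7.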

#***FUNCTION TO CALCULATE CONCORDANCE AND DISCORDANCE***#
Association=function(ModelName)
{
  # Column 1: observed response, column 2: predicted probability
  Con_Dis_Data = cbind(ModelName$y, ModelName$fitted.values)
  # drop=FALSE keeps a matrix even when a group has a single observation
  ones = Con_Dis_Data[Con_Dis_Data[,1] == 1, , drop=FALSE]
  zeros = Con_Dis_Data[Con_Dis_Data[,1] == 0, , drop=FALSE]
  # One cell per (zero, one) pair
  conc=matrix(0, dim(zeros)[1], dim(ones)[1])
  disc=matrix(0, dim(zeros)[1], dim(ones)[1])
  ties=matrix(0, dim(zeros)[1], dim(ones)[1])
  for (j in seq_len(dim(zeros)[1]))
  {
    for (i in seq_len(dim(ones)[1]))
    {
      if (ones[i,2]>zeros[j,2])
      {conc[j,i]=1}
      else if (ones[i,2]<zeros[j,2])
      {disc[j,i]=1}
      else
      {ties[j,i]=1}
    }
  }
  Pairs=dim(zeros)[1]*dim(ones)[1]
  PercentConcordance=(sum(conc)/Pairs)*100
  PercentDiscordance=(sum(disc)/Pairs)*100
  PercentTied=(sum(ties)/Pairs)*100
  return(list("Percent Concordance"=PercentConcordance,"Percent Discordance"=PercentDiscordance,"Percent Tied"=PercentTied,"Pairs"=Pairs))
}
#***FUNCTION TO CALCULATE CONCORDANCE AND DISCORDANCE ENDS***#

To call the function above, run: Association(model)
This returns a list with:
1) Percent Concordance
2) Percent Discordance
3) Percent Tied
4) Pairs

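Note:
There is also a relation between %concordance and the Area Under the ROC Curve (AUC). Each tied pair counts as half a concordant pair, and dividing by 100 puts the result on the usual 0 to 1 scale:
AUC = (%concordant + 0.5 * %tied) / 100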
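
As a sanity check of this relation, the AUC computed from pair counts can be compared with the AUC obtained from the rank-sum (Mann-Whitney) formula. The sketch below uses made-up toy vectors rather than the admission data, and all variable names in it are chosen for illustration.

```r
#***AUC FROM PAIRS VS AUC FROM RANKS (ILLUSTRATIVE SKETCH)***#
# Toy input, invented for illustration
toy_observed=c(1, 1, 0, 0, 0)
toy_predicted=c(0.8, 0.4, 0.4, 0.2, 0.9)

n=sum(toy_observed == 1)   # number of ones
m=sum(toy_observed == 0)   # number of zeros

# AUC from pair counts: concordant pairs plus half of the tied pairs
diffs=outer(toy_predicted[toy_observed == 1],
            toy_predicted[toy_observed == 0], "-")
auc_from_pairs=(sum(diffs > 0) + 0.5*sum(diffs == 0))/(n*m)

# AUC from ranks: rank() assigns average ranks to tied predictions
r=rank(toy_predicted)
auc_from_ranks=(sum(r[toy_observed == 1]) - n*(n+1)/2)/(n*m)

print(c(auc_from_pairs, auc_from_ranks))
#***AUC SKETCH ENDS***#
```

Both calculations give the same value on the toy input (3.5/6, about 0.583), which is what the relation above predicts.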