If you run a logistic regression in SAS, the output includes a table that summarizes the association of predicted probabilities and observed responses. In that table SAS reports %Concordance, %Discordance, %Tied and Pairs. The question is how SAS calculates these numbers.
Let me explain with a simple example in R.
Consider the data 'admission' with 4 variables. 'admit' is the dependent variable, that is, the variable
that we predict using the variables gre, gpa and rank.
#***R CODE FOR DATA CREATION***#
admit=c(0,0,1,0,0,1,0,0,0,1,0,1,0,0,1)
gre =c(636,660,800,640,520,760,487,890,765,345,456,675,666,546,786)
gpa=c(3.61,3.67,4,3.19,2.93,3,2.98,3.4,3.2,1.98,4,5.1,3.3,5.1,4.7)
rank=c(3,3,1,4,4,2,4,4,4,3,3,3,2,2,1)
admission=data.frame(admit,gre ,gpa,rank)
#***R CODE FOR DATA CREATION ENDS***#
Fit a logistic regression model.
#***FITTING LOGISTIC REGRESSION***#
model=glm(admit~., family="binomial", data=admission )
#***FITTING LOGISTIC REGRESSION ENDS***#
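Before counting pairs, it may help to look at the predicted probabilities themselves. The following is a minimal sketch, assuming the same 'admission' data as above; the names p_hat and scored are introduced here only for illustration. It shows that predict() with type = "response" returns the same probabilities that glm() stores in fitted.values.

```r
# Rebuild the example data so this block runs on its own
admission <- data.frame(
  admit = c(0,0,1,0,0,1,0,0,0,1,0,1,0,0,1),
  gre   = c(636,660,800,640,520,760,487,890,765,345,456,675,666,546,786),
  gpa   = c(3.61,3.67,4,3.19,2.93,3,2.98,3.4,3.2,1.98,4,5.1,3.3,5.1,4.7),
  rank  = c(3,3,1,4,4,2,4,4,4,3,3,3,2,2,1)
)
model <- glm(admit ~ ., family = "binomial", data = admission)

# Predicted probabilities on the training data (illustrative name: p_hat)
p_hat <- predict(model, type = "response")

# Two-column data: observed response and predicted probability
scored <- data.frame(observed = admission$admit, predicted = p_hat)
print(head(scored))

# Should print TRUE: both routes give the same fitted probabilities
print(isTRUE(all.equal(unname(p_hat), unname(model$fitted.values))))
```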
STEPS TO CALCULATE %CONCORDANCE AND %DISCORDANCE
1) Compute the predicted probability for each observation in dataset 'admission' using the model.
2) Create another dataset with only two columns. One column is the observed dependent variable and the other is the predicted probability.
3) Split the newly created dataset in two, such that one dataset contains all observations whose observed dependent variable is 1 (call it One) and the other contains all observations whose observed dependent variable is 0 (call it Zero).
4) Compare each predicted value in dataset One with each predicted value in dataset Zero. So you have a total of n*m pairs of type (x,y) to compare.
n: Number of observations in dataset One
m: Number of observations in dataset Zero
x: Predicted value of an observation from dataset One
y: Predicted value of an observation from dataset Zero
5) Pairs in which x is greater than y are concordant pairs
6) Pairs in which x is less than y are discordant pairs; pairs in which x equals y are tied pairs
7) % Concordance = (#(concordant pairs)/Total # pairs)*100
% Discordance = (#(discordant pairs)/Total # pairs)*100
% Tied = (#(tied pairs)/Total # pairs)*100
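The steps above can be tried on a tiny example by hand. This is only a sketch: the probabilities in p_one and p_zero are made up for illustration and do not come from the 'admission' model. outer() builds the full n*m grid of comparisons in one call.

```r
# Hypothetical predicted probabilities, chosen by hand for illustration
p_one  <- c(0.8, 0.4)        # observations whose observed response is 1 (n = 2)
p_zero <- c(0.3, 0.4, 0.6)   # observations whose observed response is 0 (m = 3)

# outer() compares every x in p_one with every y in p_zero
n_conc  <- sum(outer(p_one, p_zero, ">"))    # x > y  : concordant
n_disc  <- sum(outer(p_one, p_zero, "<"))    # x < y  : discordant
n_tied  <- sum(outer(p_one, p_zero, "=="))   # x == y : tied
n_pairs <- length(p_one) * length(p_zero)    # n * m

# Here: 4 concordant, 1 discordant, 1 tied out of 6 pairs
pct <- 100 * c(concordant = n_conc, discordant = n_disc, tied = n_tied) / n_pairs
print(n_pairs)
print(round(pct, 2))
```

The three percentages always add up to 100, because every pair falls into exactly one of the three groups.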
#***FUNCTION TO CALCULATE CONCORDANCE AND DISCORDANCE***#
Association=function(ModelName)
{
# Column 1: observed response (0/1), column 2: fitted probability.
# Use the argument ModelName, not a global object called 'model'.
Con_Dis_Data = cbind(ModelName$y, ModelName$fitted.values)
# drop=FALSE keeps a matrix even when a group has only one row
ones = Con_Dis_Data[Con_Dis_Data[,1] == 1, , drop=FALSE]
zeros = Con_Dis_Data[Con_Dis_Data[,1] == 0, , drop=FALSE]
conc=matrix(0, nrow(zeros), nrow(ones))
disc=matrix(0, nrow(zeros), nrow(ones))
ties=matrix(0, nrow(zeros), nrow(ones))
# Compare every observation in zeros with every observation in ones
for (j in seq_len(nrow(zeros)))
{
for (i in seq_len(nrow(ones)))
{
if (ones[i,2]>zeros[j,2])
{conc[j,i]=1}
else if (ones[i,2]<zeros[j,2])
{disc[j,i]=1}
else
{ties[j,i]=1}
}
}
Pairs=nrow(zeros)*nrow(ones)
PercentConcordance=(sum(conc)/Pairs)*100
PercentDiscordance=(sum(disc)/Pairs)*100
PercentTied=(sum(ties)/Pairs)*100
return(list("Percent Concordance"=PercentConcordance,"Percent Discordance"=PercentDiscordance,"Percent Tied"=PercentTied,"Pairs"=Pairs))
}
#***FUNCTION TO CALCULATE CONCORDANCE AND DISCORDANCE ENDS***#
Code to call above function: Association(model)
This will give you
1) Percent Concordance
2) Percent Discordance
3) Percent Tied
4) Pairs
Note:
There is also a relation between %concordance and the Area Under the ROC Curve (AUC). With AUC on a 0 to 1 scale and the percentages on a 0 to 100 scale:
AUC=(%concordant +(0.5 * %tied))/100
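The relation in this note can be checked numerically. The sketch below is one possible way to do it, assuming the same 'admission' data and model as above. It computes the AUC from the pair counts and compares it with the rank-based (Mann-Whitney) form of the AUC; the variable names are introduced here only for illustration.

```r
# Rebuild the example data and model so this block runs on its own
admission <- data.frame(
  admit = c(0,0,1,0,0,1,0,0,0,1,0,1,0,0,1),
  gre   = c(636,660,800,640,520,760,487,890,765,345,456,675,666,546,786),
  gpa   = c(3.61,3.67,4,3.19,2.93,3,2.98,3.4,3.2,1.98,4,5.1,3.3,5.1,4.7),
  rank  = c(3,3,1,4,4,2,4,4,4,3,3,3,2,2,1)
)
model <- glm(admit ~ ., family = "binomial", data = admission)

p <- model$fitted.values   # predicted probabilities
y <- model$y               # observed 0/1 response
p_one  <- p[y == 1]
p_zero <- p[y == 0]
pairs  <- length(p_one) * length(p_zero)

# Percentages from the pairwise comparison
pct_conc <- 100 * sum(outer(p_one, p_zero, ">"))  / pairs
pct_tied <- 100 * sum(outer(p_one, p_zero, "==")) / pairs
auc_from_pairs <- (pct_conc + 0.5 * pct_tied) / 100

# Rank-based AUC; rank() assigns average ranks to ties by default
n1 <- length(p_one)
n0 <- length(p_zero)
auc_from_ranks <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(c(pairs = auc_from_pairs, ranks = auc_from_ranks))
```

Both numbers should agree, since a tied pair contributes one half to the rank-based statistic, exactly as in the formula above.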
Concordance test is seen as evaluation of the existence of a positive reason supporting "x is at least as good as y", while discordance test is seen as evaluation of the existence of a negative reason
Thanks Vaibhav.. its good info..
Hey I created mylogit1 as an output of glm() function. When I apply this function for my model, I get the following error: Error in cbind(model$y, model$fitted.values) : object 'model' not found
Hello Vaibhav,
I am working on an R video project. It is supposed to have R video tutorials. Could I please use your codes in the videos with proper citation? Please let me know. I shall be grateful.
Thanks and regards,
Sayantee
Sayantee, you can use codes in your videos with proper citation