Logistic regression
Latest revision as of 05:59, 3 March 2024
Logistic regression, abbreviated LR, is a type of curve fitting used when the dependent (outcome) variable is categorical. Examples of categorical dependent variables are: (1) pass/fail, (2) dead/alive.
Types of logistic regression
Multinomial LR
If the dependent variable is categorical and the categories are mutually exclusive (e.g. benign, suspicious, malignant, insufficient), multinomial logistic regression is used.
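A minimal sketch of fitting such a model in R (the nnet package's multinom() is one common choice; the score and diagnosis values below are invented for illustration):

```r
# Multinomial logistic regression - a sketch with invented data
library(nnet)

# Hypothetical score for cases in three mutually exclusive categories
score     <- c(1.0, 1.5, 2.0, 2.5, 3.0,   # benign
               2.2, 2.8, 3.2, 3.5, 4.0,   # suspicious
               3.8, 4.5, 4.8, 5.0, 6.0)   # malignant
diagnosis <- factor(rep(c("benign", "suspicious", "malignant"), each = 5),
                    levels = c("benign", "suspicious", "malignant"))

fit <- multinom(diagnosis ~ score)

# One row of coefficients per non-reference category
coef(fit)
# Predicted class probabilities; each row sums to 1
predict(fit, type = "probs")
```

Each category other than the reference ("benign" here) gets its own intercept and slope, and the predicted probabilities across the three categories sum to 1 for every observation.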
Ordered LR
If the dependent variable is categorical and can be ordered in a meaningful way (e.g. Grade 1, Grade 2, Grade 3), ordered logistic regression is used.[1]
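The cited UCLA example uses MASS::polr() for this; a minimal sketch with invented data (the size and grade values are hypothetical):

```r
# Ordered logistic regression - a sketch with invented data
library(MASS)

# Hypothetical tumour size (cm) versus an ordered grade
size  <- c(0.5, 0.8, 1.0, 1.2, 1.5, 1.6, 2.0, 2.2, 2.5, 2.8, 3.0, 3.5)
grade <- factor(c(1, 1, 1, 2, 1, 2, 2, 3, 2, 3, 3, 3),
                levels = c(1, 2, 3), ordered = TRUE)

fit <- polr(grade ~ size, Hess = TRUE)

# One slope for size plus k-1 = 2 cutpoints between the 3 ordered grades
summary(fit)
predict(fit, type = "probs")
```

Unlike the multinomial model, the ordered model fits a single slope and uses the ordering: only the cutpoints between adjacent grades differ.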
Convergence
If a predictor variable (or one of its levels) perfectly determines the outcome (e.g. one pathologist never calls a diagnosis), the data has complete separation and the model fit will not converge.
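A base-R sketch of the problem, with invented data in which x perfectly separates the outcomes:

```r
# Complete separation: x below 3 always gives y = 0, above 3 always y = 1
x <- c(1, 2, 2.5, 3.5, 4, 5)
y <- c(0, 0, 0,   1,   1, 1)

# glm() warns that fitted probabilities numerically 0 or 1 occurred;
# the slope estimate diverges and its standard error explodes
fit <- glm(y ~ x, family = binomial)
summary(fit)$coefficients
```

The maximum-likelihood estimate does not exist here: the likelihood keeps improving as the slope grows, so the iterative fit stops at an arbitrarily large coefficient with a huge standard error.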
GNU/Octave example
% -------------------------------------------------------------------------------------------------
% TO RUN THIS FROM WITHIN OCTAVE
%    run logistic_regression_example.m
%
% TO RUN FROM THE COMMAND LINE
%    octave logistic_regression_example.m > tmp.txt
% -------------------------------------------------------------------------------------------------
clear all;
close all;

% Data from example on 'Logistic regression' page of Wikipedia - https://en.wikipedia.org/wiki/Logistic_regression
y = [ 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 ];
x = [ 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 ];
y=transpose(y);
x=transpose(x);

[theta, beta, dev, dl, d2l, gamma] = logistic_regression(y,x,1)

% calculate standard error
se = sqrt (diag (inv (-d2l)))

% create array
% [[ alpha    se_alpha    zvalue_alpha    zvalue_alpha^2    P_Wald_alpha  ]
%  [ beta_1   se_beta_1   zvalue_beta_1   zvalue_beta_1^2   P_Wald_beta_1 ]
%  [ beta_2   se_beta_2   zvalue_beta_2   zvalue_beta_2^2   P_Wald_beta_2 ]
%  [ ...      ...         ...             ...               ...           ]
%  [ beta_n   se_beta_n   zvalue_beta_n   zvalue_beta_n^2   P_Wald_beta_n ]]
logit_arr=zeros(size(beta,1)+1,5);
logit_arr(1,1)=theta;
logit_arr(2:end,1)=beta;
logit_arr(1:end,2)=se;
logit_arr(1:end,3)=logit_arr(1:end,1)./logit_arr(1:end,2);  % zvalue_i = coefficient_i/se_coefficient_i
logit_arr(1:end,4)=logit_arr(1:end,3).*logit_arr(1:end,3);  % zvalue_i^2
logit_arr(1:end,5)=1-chi2cdf(logit_arr(1:end,4),1)          % Wald statistic is calculated as per formula in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/

% calculate the probability for a range of values
studytime_arr= [ 1, 2, 3, 4, 5]
len_studytime_arr=size(studytime_arr,2);

% As per the manual ( https://www.gnu.org/software/octave/doc/v4.0.3/Correlation-and-Regression-Analysis.html ) GNU/Octave function fits:
% logit (gamma_i (x)) = theta_i - beta' * x,   i = 1 ... k-1
for ctr=1:len_studytime_arr
  P_logit(ctr)=1/(1+exp(theta-studytime_arr(ctr)*beta));
end

% print to screen output
P_logit
logit_arr

% create plot
len_x = size(x,1)
for ctr=1:len_x
  P_fitted(ctr)=1/(1+exp(theta-x(ctr)*beta));
end
plot(x,y,'o')
hold on;
grid on;
plot(x,P_fitted,'-')
xlabel('Study time (hours)');
ylabel('Probability of passing the exam (-)');
R example
noquote ("---------------------------------------------------------------------------------------------------")
noquote ("A script in R to learn about logistic regression")
noquote ("---------------------------------------------------------------------------------------------------")

# Load library
library(rms)

# Data from example on 'Logistic regression' page of Wikipedia - https://en.wikipedia.org/wiki/Logistic_regression
y=c( 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 )
x=c( 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 )

# Make dataframe
mydata=data.frame(x,y)

# Create logistic regression model
mylrm=lrm(y~x, mydata)
z=predict(mylrm)

# Define the logistic function
logit <- function(t) {
  sigma=1/(1+exp(-t))
  return(sigma)
}

# Get the probabilities from the z values (the linear predictor)
logit(z)
ypred=logit(z)

# Get the coefficients from the LR model
options(digits=20)
coef(mylrm)

# Calculate the probabilities directly from the fitted coefficients
beta0=-4.0777133660750397581
beta1=1.5046454013690906404
ycalc=1/(1+exp(-(beta0+beta1*x)))

# Check the calculation
noquote("The check...")
ycalc-ypred

# Add ypred to dataframe
mydata=data.frame(x,y,ypred)

# Plot result
plot(x,ypred, main="Probability of Passing versus Time Studied", ylab="Probability of Passing", xlab="Time Studied in Hours")

# Show results
options(digits=7)
noquote("The model...")
mylrm
noquote("The input values and predicted values...")
mydata
noquote("Done calculation!")
Output
[1] ---------------------------------------------------------------------------------------------------
[1] A script in R to learn about logistic regression
[1] ---------------------------------------------------------------------------------------------------
[1] The check...
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1] The model...

Logistic Regression Model

lrm(formula = y ~ x, data = mydata)

                       Model Likelihood     Discrimination    Rank Discrim.
                          Ratio Test           Indexes           Indexes
Obs            20    LR chi2     11.67     R2       0.589    C       0.895
 0             10    d.f.            1     R2(1,20) 0.413    Dxy     0.790
 1             10    Pr(> chi2) 0.0006    R2(1,15) 0.509    gamma   0.798
max |deriv| 1e-07                         Brier    0.137    tau-a   0.416

          Coef    S.E.   Wald Z Pr(>|Z|)
Intercept -4.0777 1.7610 -2.32  0.0206
x          1.5046 0.6287  2.39  0.0167

[1] The input values and predicted values...
      x y      ypred
1  0.50 0 0.03471034
2  0.75 0 0.04977295
3  1.00 0 0.07089196
4  1.25 0 0.10002862
5  1.50 0 0.13934447
6  1.75 0 0.19083651
7  1.75 1 0.19083651
8  2.00 0 0.25570318
9  2.25 1 0.33353024
10 2.50 0 0.42162653
11 2.75 1 0.51501086
12 3.00 0 0.60735864
13 3.25 1 0.69261733
14 3.50 0 0.76648083
15 4.00 1 0.87444750
16 4.25 1 0.91027764
17 4.50 1 0.93662366
18 4.75 1 0.95561071
19 5.00 1 0.96909707
20 5.50 1 0.98519444
[1] Done calculation!
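As a cross-check (not part of the script above), base R's glm() with family = binomial fits the same model by maximum likelihood and should reproduce the lrm() coefficients:

```r
# Same Wikipedia study-time data as in the script above
y <- c( 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 )
x <- c( 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
        2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 )

fit <- glm(y ~ x, family = binomial)

# Should match lrm: Intercept -4.0777, x 1.5046
coef(fit)
```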
See also
References
- ↑ Ordinal Logistic Regression | R Data Analysis Examples. UCLA. URL: https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/. Accessed on: March 24, 2018