Difference between revisions of "Logistic regression"

From Libre Pathology
Jump to navigation Jump to search
(create)
 
Line 37: Line 37:
logit_arr(1:end,3)=logit_arr(1:end,1)./logit_arr(1:end,2); % zvalue_i = coefficient_i/se_coefficient_i
logit_arr(1:end,3)=logit_arr(1:end,1)./logit_arr(1:end,2); % zvalue_i = coefficient_i/se_coefficient_i
logit_arr(1:end,4)=logit_arr(1:end,3).*logit_arr(1:end,3); % zvalue_i^2
logit_arr(1:end,4)=logit_arr(1:end,3).*logit_arr(1:end,3); % zvalue_i^2
logit_arr(1:end,5)=1-chi2cdf(logit_arr(1:end,4),1) % Wald statistic is calcated as per formula in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/
logit_arr(1:end,5)=1-chi2cdf(logit_arr(1:end,4),1) % Wald statistic is calculated as per formula in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/


% calculate the probability for a range of values
% calculate the probability for a range of values

Revision as of 02:59, 18 March 2017

Logistic regression is a type of curve fitting. It used for discrete outcome variables, e.g. dead or alive.

GNU/Octave example

% -------------------------------------------------------------------------------------------------
% TO RUN THIS FROM WITHIN OCTAVE
% 	run logistic_regression_example.m
%
% TO RUN FROM THE COMMAND LINE
% 	octave logistic_regression_example.m > tmp.txt
% -------------------------------------------------------------------------------------------------
clear all;
close all;

% Data from example on 'Logistic regression' page of Wikipedia - https://en.wikipedia.org/wiki/Logistic_regression
y = [ 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 ];
x = [ 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 ];

y=transpose(y);
x=transpose(x);

[theta, beta, dev, dl, d2l, gamma] = logistic_regression(y,x,1)

% calculate standard error
se = sqrt (diag (inv (-d2l)))

% create array  
%   [[ alpha 	se_alpha	zvalue_alpha 	zvalue_alpha^2		P_Wald_alpha	]
%    [ beta_1	se_beta_1	zvalue_beta_1	zvalue_beta_1^2		P_Wald_beta_1	]
%    [ beta_2	se_beta_2	zvalue_beta_2	zvalue_beta_1^2		P_Wald_beta_1	]
%    [ ...	...		...		...			...		]
%    [ beta_n	se_beta_n	zvalue_beta_n	zvalue_beta_1^2		P_Wald_beta_n  	]]
logit_arr=zeros(size(beta,1)+1,5);
logit_arr(1,1)=theta;
logit_arr(2:end,1)=beta;
logit_arr(1:end,2)=se;
logit_arr(1:end,3)=logit_arr(1:end,1)./logit_arr(1:end,2);	% zvalue_i = coefficient_i/se_coefficient_i
logit_arr(1:end,4)=logit_arr(1:end,3).*logit_arr(1:end,3);	% zvalue_i^2
logit_arr(1:end,5)=1-chi2cdf(logit_arr(1:end,4),1)		% Wald statistic is calculated as per formula in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/

% calculate the probability for a range of values
studytime_arr= [ 1, 2, 3, 4, 5]
len_studytime_arr=size(studytime_arr,2);

% As per the manual ( https://www.gnu.org/software/octave/doc/v4.0.3/Correlation-and-Regression-Analysis.html ) GNU/Octave function fits:
%	logit (gamma_i (x)) = theta_i - beta' * x,   i = 1 ... k-1

for ctr=1:len_studytime_arr
  P_logit(ctr)=1/(1+exp(theta-studytime_arr(ctr)*beta));
end

% print to screen output
P_logit
logit_arr

% create plot
len_x = size(x,1)

for ctr=1:len_x
  P_fitted(ctr)=1/(1+exp(theta-x(ctr)*beta));
end

plot(x,y,'o')
hold on;
grid on;
plot(x,P_fitted,'-')
xlabel('Study time (hours)');
ylabel('Probability of passing the exam (-)');

See also

External links