Proofs you probably weren't taught: Why is the error function minimized in logistic regression convex?

We want to prove that the error/objective function of logistic regression:

$J\left(\theta\right) = \sum\limits_{i = 1}^m {{y^i}\left[ { - \log \left( {{h_\theta }\left( x^i \right)} \right)} \right] + \left( {1 - {y^i}} \right)\left[ { - \log \left( {1 - {h_\theta }\left( x^i \right)} \right)} \right]} \;\;\;\; (1)$ $\text{where} \;\; h_\theta \left( x \right) = \frac {1}{1+e^{-\theta^Tx}}$

is convex.

Proof:

Before beginning the proof, I would first like to review a few definitions and results related to convex functions:

Definition of a convex function: A function f(x) is said to be convex if the following inequality holds true:

$f\left( \alpha x + \left( 1 - \alpha\right) y \right) \leq \alpha f\left( x \right) + \left( 1 - \alpha\right) f\left( y \right)$ $\forall x,y \in \text{Domain}(f) \;\; \text{and} \;\; \alpha \in \left[ 0, 1 \right]$

First-order condition of convexity: A differentiable function f(x) is convex if and only if the following inequality holds:

$f\left(y\right) \; \geq \; f\left(x\right) + \nabla _x^T f\left(x\right) \left(y - x\right) ; \;\;\;\; \forall x,y \in Domain(f)$

Intuitively, this condition says that the tangent (the first-order Taylor approximation of f at any point x) is a global under-estimator of f.

Second-order condition of convexity: A twice-differentiable function f(x) is convex if and only if its Hessian matrix (the matrix of second-order partial derivatives) is positive semi-definite at every point x of its domain, i.e.

$\forall z: \;\; {z^T} \; \nabla _x^2 \; f\left( x \right) \; z \ge 0 \;\; \text{where} \;\; \nabla _x^2\; f\left( x \right) \;\; \text{is the Hessian}$

Sum/non-negative linear combination of two or more convex functions is also convex: Let f(x) and g(x) be two convex functions and let $\lambda_1, \lambda_2 \geq 0$. Then the combination

$(\lambda_1 f + \lambda_2 g)(x) = \lambda_1 f(x) + \lambda_2 g(x)$

is also a convex function (this can be easily proved using the definition of a convex function). The weights must be non-negative: a negative multiple of a convex function is concave, so an arbitrary linear combination need not be convex.
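For completeness, here is a short sketch of that proof from the definition. It assumes the weights satisfy $\lambda_1, \lambda_2 \geq 0$, which is what allows the inequality in the second line to be multiplied through without flipping its direction:

$\begin{array}{lcl} (\lambda_1 f + \lambda_2 g)\left( \alpha x + (1 - \alpha) y \right) & = & \lambda_1 f\left( \alpha x + (1 - \alpha) y \right) + \lambda_2 g\left( \alpha x + (1 - \alpha) y \right) \\ & \leq & \lambda_1 \left[ \alpha f(x) + (1 - \alpha) f(y) \right] + \lambda_2 \left[ \alpha g(x) + (1 - \alpha) g(y) \right] \\ & = & \alpha \, (\lambda_1 f + \lambda_2 g)(x) + (1 - \alpha) \, (\lambda_1 f + \lambda_2 g)(y) \end{array}$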

Now notice that if we can prove that the two functions

${ - \log \left( {{h_\theta }\left( x^i \right)} \right)} \;\; \text{and} \;\; { - \log \left( {1 - {h_\theta }\left( x^i \right)} \right)}$

are convex, then our objective function

$J\left(\theta\right) = \sum\limits_{i = 1}^m {{y^i}\left[ { - \log \left( {{h_\theta }\left( x^i \right)} \right)} \right] + \left( {1 - {y^i}} \right)\left[ { - \log \left( {1 - {h_\theta }\left( x^i \right)} \right)} \right]}$

must also be convex, since each weight $y^i$ and $1 - y^i$ is non-negative (every label $y^i$ is either 0 or 1) and a non-negative linear combination of convex functions is convex.

Let us now try to prove that

${ - \log \left( {{h_\theta }\left( x \right)} \right)} = { - \log \left( {\frac{1}{1+e^{-\theta^Tx}}} \right)} = { \log \left( {1+e^{-\theta^Tx}} \right)}$

is a convex function of theta. In order to do this, we will use the second-order condition of convexity described above. Let us first compute the hessian matrix:

$\begin{array}{lcl} \text{Gradient:} && \\ && \\ \nabla _\theta \; \left[ { - \log \left( {{h_\theta }\left( x \right)} \right)} \right] & = & \nabla _\theta \; \left[ { \log \left( {1+e^{-\theta^Tx}} \right)} \right] \\ & = & \left( {\frac{-e^{-\theta^Tx}}{1 + e^{-\theta^Tx}} }\right)\; x \\ & = & \left( {\frac{1}{1 + e^{-\theta^Tx}} - 1}\right)\; x \\ & = & \left( {h_\theta\left( x \right) - 1}\right)x \end{array}$

$\begin{array}{lcl} \text{Hessian:} && \\ && \\ \nabla _\theta^2 \; \left[ { - \log \left( {{h_\theta }\left( x \right)} \right)} \right] & = & \nabla _\theta \left( \nabla _\theta \; \left[ { - \log \left( {{h_\theta }\left( x \right)} \right)} \right] \right) \\ & = & \nabla _\theta \left( \left( {h_\theta\left( x \right) - 1}\right)\; x \right) \\ & = & h_\theta\left( x \right) \left( 1 - h_\theta\left( x \right) \right) xx^T \end{array}$
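A quick numerical sanity check of these two formulas can catch sign or algebra slips. The sketch below compares the closed-form gradient and Hessian against central finite differences at one random point. The helper names (`loss_pos`, `grad_pos`, `hess_pos`, `numeric_grad`, `numeric_hess`), the dimension and the random seed are all assumptions made for illustration, not part of the original derivation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_pos(theta, x):
    # -log(h_theta(x)) written as log(1 + exp(-theta^T x))
    return np.log1p(np.exp(-theta @ x))

def grad_pos(theta, x):
    # Closed-form gradient: (h_theta(x) - 1) x
    return (sigmoid(theta @ x) - 1.0) * x

def hess_pos(theta, x):
    # Closed-form Hessian: h (1 - h) x x^T
    h = sigmoid(theta @ x)
    return h * (1.0 - h) * np.outer(x, x)

def numeric_grad(f, theta, eps=1e-6):
    # Central differences, one coordinate at a time
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

def numeric_hess(grad, theta, eps=1e-6):
    # Column j is the central difference of the gradient along coordinate j
    n = theta.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        H[:, j] = (grad(theta + e) - grad(theta - e)) / (2 * eps)
    return H

rng = np.random.default_rng(0)  # arbitrary seed
theta = rng.normal(size=3)
x = rng.normal(size=3)

grad_ok = np.allclose(grad_pos(theta, x),
                      numeric_grad(lambda t: loss_pos(t, x), theta), atol=1e-6)
hess_ok = np.allclose(hess_pos(theta, x),
                      numeric_hess(lambda t: grad_pos(t, x), theta), atol=1e-6)
print(grad_ok, hess_ok)
```

Agreement at a single point does not prove the formulas, but a mismatch would point to an error in the algebra.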

The following shows that this Hessian matrix is positive semi-definite:

$\begin{array}{lcl} \forall z: \;\; z^T \, \nabla_\theta^2 \left[ -\log \left( {{h_\theta }\left( x \right)} \right) \right] z & = & z^T \left[ h_\theta\left( x \right) \left( 1 - h_\theta\left( x \right) \right) xx^T \right] z \\ & = & h_\theta\left( x \right) \left( 1 - h_\theta\left( x \right) \right) \left( x^Tz \right)^2 \geq 0 \;\;\;\; (2) \end{array}$

The inequality holds because $0 < h_\theta(x) < 1$, so $h_\theta(x)\left(1 - h_\theta(x)\right) > 0$, and $\left( x^Tz \right)^2 \geq 0$.
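The positive semi-definiteness can also be checked numerically. The sketch below builds the rank-one Hessian $h_\theta(x)(1 - h_\theta(x))xx^T$ for one random $\theta$ and $x$, evaluates the quadratic form for many random vectors z, and looks at the smallest eigenvalue. The function name `hessian_neg_log_h`, the dimension, the seed and the tolerances are illustrative assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def hessian_neg_log_h(theta, x):
    # Hessian of -log(h_theta(x)) with respect to theta: h (1 - h) x x^T
    h = sigmoid(theta @ x)
    return h * (1.0 - h) * np.outer(x, x)

rng = np.random.default_rng(1)  # arbitrary seed
theta, x = rng.normal(size=4), rng.normal(size=4)
H = hessian_neg_log_h(theta, x)

# Quadratic form z^T H z for 1000 random directions z
zs = rng.normal(size=(1000, 4))
quad = np.einsum("ij,jk,ik->i", zs, H, zs)

# Closed form of the same quantity: h (1 - h) (x^T z)^2
h = sigmoid(theta @ x)
closed = h * (1.0 - h) * (zs @ x) ** 2

# Eigenvalues of a symmetric PSD matrix are non-negative (up to round-off)
min_eig = np.linalg.eigvalsh(H).min()
print(np.allclose(quad, closed), quad.min() >= -1e-10, min_eig >= -1e-10)
```

Because the matrix has rank one, all but one of its eigenvalues are zero up to floating-point round-off, which is why the comparison uses a small negative tolerance instead of an exact zero.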

Let us now try to prove that

$\begin{array}{lcl} {-\log \left( {1 - {h_\theta }\left( x \right)} \right)} & = & { - \log \left( {1 - \frac{1}{1+e^{-\theta^Tx}}} \right)} \\ & = & { - \log \left( {\frac{e^{-\theta^Tx}}{1+e^{-\theta^Tx}}} \right)} \\ & = & { \theta^Tx + \log \left( {1+e^{-\theta^Tx}} \right)} \end{array}$

is a convex function of theta. In order to do this, we will again use the second-order condition of convexity described above. Let us first compute its hessian matrix:

$\begin{array}{lcl} \text{Gradient:} && \\ && \\ \nabla _\theta \; \left[ { - \log \left( {1 - {h_\theta }\left( x \right)} \right)} \right] & = &\nabla _\theta \; \left[ { \theta^Tx + \log \left( {1+e^{-\theta^Tx}} \right)} \right] \\ & = & x + \nabla _\theta \; \left[ { \log \left( {1+e^{-\theta^Tx}} \right)} \right] \end{array}$

$\begin{array}{lcl} \text{Hessian:} & & \\ &&\\ \nabla _\theta^2 \; \left[ { - \log \left( {1 - {h_\theta }\left( x \right)} \right)} \right] & = & \nabla _\theta \left( \nabla _\theta \; \left[ { - \log \left( {1 - {h_\theta }\left( x \right)} \right)} \right] \right) \\ & = & \nabla_\theta \left( x + \nabla _\theta \; \left[ { \log \left( {1+e^{-\theta^Tx}} \right)} \right] \right) \\ & = & \nabla _\theta^2 \; \left[ { - \log \left( {{h_\theta }\left( x \right)} \right)} \right] \\ && \text{(shown in Eq. (2) above to be} \\ && \text{positive semi-definite)} \end{array}$
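The key observation here is that the two per-example losses differ only by the linear term $\theta^Tx$, which has zero second derivative. The sketch below checks that numerically, and also checks the simplified gradient $x + (h_\theta(x) - 1)x = h_\theta(x)\,x$ by central differences. The helper names, the dimension and the seed are assumptions for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_pos(theta, x):
    # -log(h_theta(x)) = log(1 + exp(-theta^T x))
    return np.log1p(np.exp(-theta @ x))

def loss_neg(theta, x):
    # -log(1 - h_theta(x)), evaluated directly from the definition
    return -np.log(1.0 - sigmoid(theta @ x))

def numeric_grad(f, theta, eps=1e-6):
    # Central differences, one coordinate at a time
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

rng = np.random.default_rng(3)  # arbitrary seed
theta, x = rng.normal(size=3), rng.normal(size=3)

# The two losses should differ by exactly the linear term theta^T x
gap = loss_neg(theta, x) - loss_pos(theta, x)
gap_ok = np.isclose(gap, theta @ x)

# Gradient of the second loss: x + (h - 1) x = h x
grad_neg = numeric_grad(lambda t: loss_neg(t, x), theta)
grad_ok = np.allclose(grad_neg, sigmoid(theta @ x) * x, atol=1e-6)
print(gap_ok, grad_ok)
```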

Above, we have proved that both

${ - \log \left( {{h_\theta }\left( x^i \right)} \right)} \;\; \text{and} \;\; { - \log \left( {1 - {h_\theta }\left( x^i \right)} \right)}$

are convex functions. The error/objective function of logistic regression

$J\left(\theta\right) = \sum\limits_{i = 1}^m {{y^i}\left[ { - \log \left( {{h_\theta }\left( x^i \right)} \right)} \right] + \left( {1 - {y^i}} \right)\left[ { - \log \left( {1 - {h_\theta }\left( x^i \right)} \right)} \right]}$

is a linear combination of several such convex functions with non-negative weights $y^i$ and $1 - y^i$ (each label $y^i$ is either 0 or 1). Since a non-negative linear combination of two or more convex functions is convex, we conclude that the objective function of logistic regression is convex.
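To see the conclusion at work on the full objective, the sketch below evaluates $J(\theta)$ on synthetic data and tests the defining inequality of convexity on random pairs of parameter vectors. It also forms the Hessian of $J$, which by the per-example results is $\sum_i h_\theta(x^i)(1 - h_\theta(x^i))\,x^i (x^i)^T$ because the weights $y^i$ and $1 - y^i$ add up to 1. The data, the sizes `m` and `n`, the seed and the tolerances are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def J(theta, X, y):
    # Objective (1): sum of per-example cross-entropy terms
    h = sigmoid(X @ theta)
    return np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h))

def hessian_J(theta, X):
    # Sum over examples of h_i (1 - h_i) x_i x_i^T, written as X^T diag(w) X
    h = sigmoid(X @ theta)
    return X.T @ (X * (h * (1 - h))[:, None])

rng = np.random.default_rng(2)  # arbitrary seed
m, n = 50, 3                    # synthetic problem size
X = rng.normal(size=(m, n))
y = rng.integers(0, 2, size=m).astype(float)  # labels in {0, 1}

# Smallest Hessian eigenvalue over a few random parameter vectors
min_eig = min(np.linalg.eigvalsh(hessian_J(rng.normal(size=n), X)).min()
              for _ in range(20))

# Count violations of J(a*t1 + (1-a)*t2) <= a*J(t1) + (1-a)*J(t2)
violations = 0
for _ in range(1000):
    t1, t2 = rng.normal(size=n), rng.normal(size=n)
    alpha = rng.uniform()
    lhs = J(alpha * t1 + (1 - alpha) * t2, X, y)
    rhs = alpha * J(t1, X, y) + (1 - alpha) * J(t2, X, y)
    violations += int(lhs > rhs + 1e-9)

print(min_eig, violations)
```

A random search like this cannot prove convexity, but any violation would contradict the proof above.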

Hence proved …

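Following the same line of argument, it can be shown that the objective function of logistic regression remains convex when a convex regularization term (for example, an L2 or L1 penalty on $\theta$ with a non-negative weight) is added, since the sum of two convex functions is convex.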
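As one concrete instance, assume an L2 penalty $\frac{\lambda}{2}\theta^T\theta$ with $\lambda \geq 0$ (the post does not say which regularizer is meant, so this choice and the name $J_{reg}$ are assumptions). The penalty adds $\lambda I$ to the Hessian, which keeps every quadratic form non-negative:

$\begin{array}{lcl} J_{reg}\left(\theta\right) & = & J\left(\theta\right) + \frac{\lambda}{2} \, \theta^T\theta \\ \nabla_\theta^2 \, J_{reg}\left(\theta\right) & = & \nabla_\theta^2 \, J\left(\theta\right) + \lambda I \\ z^T \, \nabla_\theta^2 \, J_{reg}\left(\theta\right) \, z & = & z^T \, \nabla_\theta^2 \, J\left(\theta\right) \, z + \lambda \, z^Tz \;\; \geq \;\; 0 \end{array}$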

Comments:

Unknown, November 2, 2015 at 3:42 PM
Just to correct a little mistake: only a positive linear combination of convex functions is guaranteed to be convex again. However, in the logistic regression case y_i are positive, so it works indeed.
Unknown, February 13, 2016 at 11:09 AM
I think for equation(2) you lost a z in the left part.
Unknown, March 9, 2018 at 9:26 AM
Awesome post, thank you.

Agree with above that (2) missing a z on L.H.S.
Effesian, March 12, 2018 at 5:22 AM
I think in Eq.(2) you should write grad_{\theta} and not w.r.t. x.
This is of course just a typo.
Nice post btw.!
Javier T, August 13, 2020 at 3:10 PM
Thank you a lot! It has been very useful to me :)
Anonymous, March 23, 2021 at 8:48 PM
Does anyone know of a paper that proves this loss function is convex?

Friday, October 21, 2011
