Volume 32, Number 2, June 2022
Revisiting strategies for fitting logistic regression for positive and unlabeled data
Adam Wawrzeńczyk, Jan Mielniczuk
Abstract
Positive unlabeled (PU) learning is an important problem motivated by the occurrence of this type of partial observability
in many applications. The present paper reconsiders recent advances in parametric modeling of PU data based on empirical
likelihood maximization and argues that they can be significantly improved. The proposed approach is based on the fact
that the likelihood for the logistic fit and an unknown labeling frequency can be written explicitly as the sum of a convex
and a concave function. This allows methods such as the concave-convex procedure (CCCP) or its
variant, disciplined convex-concave programming (DCCP), to be applied. By analyzing real data sets, we show that using
DCCP to solve the optimization problem yields significant improvements in posterior probability and label frequency
estimation over the best available competitors.
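The convex-concave structure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard selected-completely-at-random (SCAR) setting, in which the observed label satisfies P(s = 1 | x) = c·σ(xᵀβ), with σ the logistic function and c the labeling frequency. Unlike the paper, it treats c as known and fixed. Writing t = xᵀβ and softplus(z) = log(1 + e^z), the negative log-likelihood sums softplus(-tᵢ) - log c over labeled points and softplus(tᵢ) - softplus(tᵢ + log(1 - c)) over unlabeled points. This is a convex function g(β) minus a convex function h(β), plus a constant. The plain CCCP repeatedly linearizes h at the current iterate and minimizes the resulting convex surrogate. Here the inner minimization is ordinary gradient descent with step 1/L, chosen only to keep the example self-contained; it is not the DCCP solver used in the paper. The function names (`neg_log_lik`, `cccp_fit`) and the simulated data are hypothetical.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def neg_log_lik(beta, X, s, c):
    """Negative PU log-likelihood under the assumed model P(s=1 | x) = c * sigmoid(x . beta)."""
    total = 0.0
    log_1mc = math.log(1.0 - c)
    for x, si in zip(X, s):
        t = dot(x, beta)
        if si == 1:
            # -log(c * sigmoid(t)) = softplus(-t) - log(c)
            total += softplus(-t) - math.log(c)
        else:
            # -log(1 - c * sigmoid(t)) = softplus(t) - softplus(t + log(1 - c))
            total += softplus(t) - softplus(t + log_1mc)
    return total


def cccp_fit(X, s, c, outer_iters=10, inner_iters=100):
    """Plain CCCP for a fixed c: NLL(beta) = g(beta) - h(beta) + const, where
    g = sum_lab softplus(-t) + sum_unl softplus(t) and
    h = sum_unl softplus(t + log(1 - c)) are both convex in beta."""
    d = len(X[0])
    beta = [0.0] * d
    log_1mc = math.log(1.0 - c)
    # grad g is L-Lipschitz with L <= 0.25 * sum ||x||^2, so step 1/L decreases the surrogate.
    step = 1.0 / (0.25 * sum(dot(x, x) for x in X))
    history = [neg_log_lik(beta, X, s, c)]
    for _ in range(outer_iters):
        # Linearize h at the current iterate: grad h = sum_unl sigmoid(t + log(1 - c)) * x.
        grad_h = [0.0] * d
        for x, si in zip(X, s):
            if si == 0:
                w = sigmoid(dot(x, beta) + log_1mc)
                for j in range(d):
                    grad_h[j] += w * x[j]
        # Minimize the convex surrogate g(b) - grad_h . b by gradient descent.
        for _ in range(inner_iters):
            grad = [-gh for gh in grad_h]
            for x, si in zip(X, s):
                t = dot(x, beta)
                # d/dt softplus(-t) = -sigmoid(-t); d/dt softplus(t) = sigmoid(t)
                w = -sigmoid(-t) if si == 1 else sigmoid(t)
                for j in range(d):
                    grad[j] += w * x[j]
            beta = [b - step * gj for b, gj in zip(beta, grad)]
        history.append(neg_log_lik(beta, X, s, c))
    return beta, history


# Usage on simulated (hypothetical) SCAR data with labeling frequency c = 0.5.
rng = random.Random(0)
true_beta = [-0.5, 2.0, -1.0]  # intercept and two slopes
c_true = 0.5
X, s = [], []
for _ in range(200):
    x = [1.0, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    y = 1 if rng.random() < sigmoid(dot(x, true_beta)) else 0
    s.append(1 if y == 1 and rng.random() < c_true else 0)
    X.append(x)

beta_hat, history = cccp_fit(X, s, c=c_true)
```

Each surrogate upper-bounds the negative log-likelihood and coincides with it at the current iterate. The values in `history` should therefore not increase from one outer iteration to the next. Estimating c as well, as the paper does, is left out of this sketch.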
Keywords
positive and unlabeled learning, empirical risk, logistic regression, concave-convex optimization