Gautam Kunapuli's Publications
Book Chapters |
Journal Publications |
Conference Publications |
Workshops |
Dissertation
Book Chapters
G. Kunapuli, J.-S. Pang and K. P. Bennett.
Bilevel Cross-validation-based Model Selection.
Hands-On Pattern Recognition: Challenges in Data
Representation, Model Selection and Performance Prediction.
Isabelle Guyon, Gavin Crawley, Gideon
Dror, Amir Saffari and Nicola Talbot, Editors,
pp. 345–370.
To appear; manuscript in preparation.
K. P. Bennett, G. Kunapuli, J. Hu and J.-S. Pang.
Bilevel Optimization and Machine Learning.
Computational Intelligence: Research Frontiers.
Lecture Notes in Computer Science.
Volume 5050 (2008), pp. 25-47.
IEEE World Congress on Computational Intelligence, WCCI 2008, Hong Kong, China,
June 1-6, 2008, Plenary/Invited Lectures.
[pdf, preprint]
[Abstract]
We examine the interplay of optimization and machine learning.
Great progress has been made in machine learning by cleverly
reducing machine learning problems to convex optimization problems
with one or more hyper-parameters. The availability of powerful
convex programming theory and algorithms has enabled a flood of new
research in machine learning models and methods. But many of the
steps necessary for successful machine learning models fall outside
of the convex machine learning paradigm. Thus we now propose framing
machine learning problems as Stackelberg games. The resulting
bilevel optimization problem allows for efficient systematic search
of large numbers of hyper-parameters. We discuss recent progress in
solving these bilevel problems and the many interesting optimization
challenges that remain. Finally, we investigate the intriguing
possibility of novel machine learning models enabled by bilevel
programming.
G. Kunapuli, K. P. Bennett, J. Hu and J.-S. Pang.
Bilevel Model Selection for Support Vector Machines.
CRM Proceedings and Lecture Notes. Volume 45 (2008), pp. 129-158.
American Mathematical Society. Pierre Hansen and Panos Pardolos, Editors.
[pdf, preprint]
[Abstract]
The successful application of Support Vector Machines (SVMs), kernel methods and other
statistical machine learning methods requires selection of model parameters based on estimates of
the generalization error. This paper presents a novel approach to systematic model selection through
bilevel optimization. We show how modelling tasks for widely used machine learning methods can
be formulated as bilevel optimization problems and describe how the approach can address a broad
range of tasks — among which are parameter, feature and kernel selection. In addition, we also discuss
the challenges in implementing these approaches and enumerate opportunities for future work in this
emerging research area.
Journal Publications
G. Kunapuli, K. P. Bennett, J. Hu and J.-S. Pang.
Classification Model Selection via Bilevel Programming.
Optimization Methods and Software. Volume 23, Issue 4 (2008), pp. 475-489,
Special Issue on Mathematical Programming in Data Mining and Machine Learning,
Guest Editors: Katya Scheinberg and Jiming Peng
[pdf, preprint]
[Abstract]
Cross validation is a well-known and widely used method used for model
selection which involves searching a discretized grid for the combination
of model hyper-parameters that minimizes the out-of-sample validation
error — an estimate of the generalization error. This grid-search procedure
effectively limits the size of the parameter space since many convex
optimization problems must be solved at each grid point to evaluate its
effectiveness. This paper proposes a novel formulation of cross-validation
as a bilevel program which can systematically search the continuous hyper-parameter
space. Also discussed are computational methods for solving a bilevel cross-validation
program and numerical results that demonstrate the practicability of this
approach for model selection in machine learning.
J. Hu, J. E. Mitchell, J.-S. Pang, K. P. Bennett and G. Kunapuli.
On the Global Solution of Linear Programs with Linear Complementarity Constraints.
SIAM Journal on Optimization. Volume 19, Number 1 (2008), pp. 445-471.
[pdf, preprint]
[Abstract]
This paper presents a parameter-free integer-programming based algorithm for the global
resolution of a linear program with linear complementarity constraints (LPEC). The cornerstone
of the algorithm is a minimax integer program formulation that characterizes and provides
certificates for the three outcomes---infeasibility, unboundedness, or solvability---of an LPEC.
An extreme point/ray generation scheme in the spirit of Benders decomposition is developed, from
which valid inequalities in the form of satisfiability constraints are obtained. The feasibility
problem of these inequalities and the carefully guided linear programming relaxations of the LPEC
are the workhorse of the algorithm, which also employs a specialized procedure for the sparsification
of the satisfiability cuts. We establish the finite termination of the algorithm and report computational
results using the algorithm for solving randomly generated LPECs of reasonable sizes. The results
establish that the algorithm can handle infeasible, unbounded, and solvable LPECs effectively.
Conference Publications
G. Kunapuli, R. Maclin and J. W. Shavlik.
Advice Refinement in Knowledge-Based Support Vector Machines.
Twenty-Fifth Annual Conference on Neural Information Processing Systems (accepted).
To appear in NIPS Proceedings, December 2011.
[pdf]
[Abstract]
Knowledge-based support vector machines (KBSVMs) incorporate advice
from domain experts, which can improve generalization significantly.
A major limitation that has not been fully addressed occurs when the
expert advice is imperfect, which can lead to poorer models. We
propose a model that extends KBSVMs and is able to not only learn
from data and advice, but also simultaneously improve the advice.
The proposed approach is particularly effective for knowledge
discovery in domains with few labeled examples. The proposed model
contains bilinear constraints, and is solved using two iterative
approaches: successive linear programming and a constrained
concave-convex approach. Experimental results demonstrate that these
algorithms yield useful refinements to expert advice, as well as
improve the performance of the learning algorithm overall.
T. Walker, G. Kunapuli, N. Larsen, D. Page & J. W. Shavlik.
Integrating Knowledge Capture and Supervised Learning Through a Human-Computer Interface.
Proceedings of the Sixth International Conference on Knowledge Capture, Banff, Canada,
June 25-29, 2011.
[pdf]
[Abstract]
Some supervised-learning algorithms can make effective use of domain
knowledge in addition to the input-output pairs commonly used in
machine learning. However, formulating this additional information
often requires an indepth understanding of the specific knowledge
representation used by a given learning algorithm. The requirement to
use a formal knowledge-representation language means that most domain
experts will not be able to articulate their expertise, even when a
learning algorithm is capable of exploiting such valuable information.
We investigate a method to ease this knowledge acquisition through
the use of a graphical, human-computer interface. Our interface
allows users to easily provide advice about specific examples, rather
than requiring them to provide general rules; we leave the task of
properly generalizing such advice to the learning algorithms. We
demonstrate the effectiveness of our approach using the Wargus real-
time strategy game, comparing learning with no advice to learning with
concrete advice provided through our interface, as well as comparing
to using generalized advice written by an AI expert. Our results show
that our approach of combining a GUI-based advice language with an
advice-taking learning algorithm is an effective way to capture
domain knowledge.
G. Kunapuli, K. P. Bennett, R. Maclin and J. W. Shavlik.
The Adviceptron: Giving Advice To The Perceptron.
Twentieth Conference on Artificial Neural Networks In Engineering, November 1-3, 2010.
[pdf]
[Abstract]
We propose a novel approach for incorporating prior knowledge into the
perceptron. The goal is to update the hypothesis taking into account
both label feedback and prior knowledge, in the form of soft polyhedral
advice, so as to make increasingly accurate predictions on subsequent
rounds. Advice helps speed up and bias learning so that good
generalization can be obtained with less data. The updates to the
hypothesis use a hybrid loss that takes into account the margins of
both the hypothesis and advice on the current point.
Analysis of the algorithm via mistake bounds and experimental results
demonstrate that advice can speed up learning
G. Kunapuli, K. P. Bennett, A. Shabbeer, R. Maclin and J. W. Shavlik.
Online Knowledge-Based Support Vector Machines..
Twenty-Third European Conference on Machine Learning, Barcelona, Spain, September 20-24, 2010.
[pdf]
[Abstract]
Prior knowledge, in the form of simple advice rules, can
greatly speed up convergence in learning algorithms. Online learning
methods predict the label of the current point and then receive the correct
label (and learn from that information). The goal of this work is to
update the hypothesis taking into account not just the label feedback,
but also the prior knowledge, in the form of soft polyhedral advice, so
as to make increasingly accurate predictions on subsequent examples.
Advice helps speed up and bias learning so that generalization can be
obtained with less data. Our passive-aggressive approach updates the
hypothesis using a hybrid loss that takes into account the margins of
both the hypothesis and the advice on the current point. Encouraging
computational results and loss bounds are provided.
S. Natarajan, G. Kunapuli, K. Judah, P. Tadepalli, K. Kersting and J. W. Shavlik.
Multi-Agent Inverse Reinforcement Learning.
Ninth International Conference on Machine Learning and Applications, Washington D. C., USA,
December 12-14, 2010.
[pdf]
[Abstract]
Learning the reward function of an agent by observing its behavior
is termed inverse reinforcement learning and
has applications in learning from demonstration or apprenticeship
learning. We introduce the problem of multiagent
inverse reinforcement learning, where reward functions
of multiple agents are learned by observing their uncoordinated
behavior. A centralized controller then learns
to coordinate their behavior by optimizing a weighted sum
of reward functions of all the agents. We evaluate our
approach on a traffic-routing domain, in which a controller
coordinates actions of multiple traffic signals to regulate
traffic density. We show that the learner is not only able
to match but even significantly outperform the expert.
T. Walker, C. Reilly, G. Kunapuli, S. Natarajan, R. Maclin, D. Page and J. W. Shavlik.
Automating the ILP Setup Task: Converting User Advice about Specific Examples into General Background Knowledge.
Twentieth International Conference on Inductive Logic Programming, Firenze, Italy,
June 27-30, 2010.
[pdf]
[Abstract]
Inductive Logic Programming (ILP) provides an effective method
of learning logical theories given a set of positive examples,
a set of negative examples, a corpus of background knowledge
and specification of a search space (e.g., by mode definitions)
from which to compose the theories. While specifying positive and
negative examples is relatively straightforward, composing effective
background knowledge and search space definition requires detailed
understanding of many aspects of the ILP process and limits the
usability of ILP. This paper introduces a number of techniques to
automate the use of ILP for a non-ILP expert. These techniques
include automatic generation of background knowledge from
user-supplied information in the form of a simple relevance language,
utilization of type hierarchies to constrain search, automatic
generation of negative examples, and an iterative-deepening style
search process.
S. Natarajan, P. Tadepalli, G. Kunapuli and J. W. Shavlik.
Learning Parameters for Relational Probabilistic Models with Noisy-Or Combining Rule.
Eighth International Conference on Machine Learning and Applications, Miami Beach, FL, USA
December 13-15 2009.
[pdf]
[Abstract]
Languages that combine predicate logic with
probabilities are needed to succinctly represent knowledge in many real-world domains.
We consider a formalism based on universally
quantified conditional influence statements
that capture local interactions between object attributes. The effects of different
conditional influence statements can be combined
using combining rules such as Noisy-OR. To
combine multiple instantiations of the same
rule we need other combining rules at a lower
level. In this paper we derive and implement algorithms based on gradient-descent
and EM for learning the parameters of these
multi-level combining rules. We compare our
approaches to learning in Markov Logic networks and show superior performance in
multiple domains.
S. Natarajan, G. Kunapuli, C. O' Reilly, R. Maclin, T. Walker, D. Page and J. W. Shavlik.
ILP for Bootstrapped Learning: A Layered Approach to Automating the ILP Setup Problem.
Ninth International Conference on Inductive Logic Programming, Leuven, Belgium, July 2-4, 2009
[pdf]
[Abstract]
This paper introduces a new type of application for ILP called Bootstrapped Learning (BL).
BL brings several challenges to ILP, including the need to automate the “ILP Set-Up”
problem; small numbers of examples, in some cases no negative examples; and the need to “
bootstrap” or to automatically base learning in part on the results of earlier learning
sessions. The paper introduces BL, discusses the general challenges it raises for ILP,
and presents initial results in ILP for BL.
K. P. Bennett, X. Ji, J. Hu, G. Kunapuli and J.-S. Pang.
Model Selection Via Bilevel Programming.
Proceedings of the 2006 IEEE World Congress on Computational Intelligence,
Vancouver, BC, Canada, July 16-21, 2006.
[pdf]
[Abstract]
A key step in many statistical learning methods
used in machine learning involves solving a convex optimization
problem containing one or more hyper-parameters that must
be selected by the users. While cross validation is a commonly
employed and widely accepted method for selecting these
parameters, its implementation by a grid-search procedure in
the parameter space effectively limits the desirable number
of hyper-parameters in a model, due to the combinatorial
explosion of grid points in high dimensions. This paper proposes
a novel bilevel optimization approach to cross validation
that provides a systematic search of the hyper-parameters.
The bilevel approach enables the use of the state-of-the-art
optimization methods and their well-supported softwares. After
introducing the bilevel programming approach, we discuss
computational methods for solving a bilevel cross-validation
program, and present numerical results to substantiate the
viability of this novel approach as a promising computational
tool for model selection in machine learning.
Workshops
G. Kunapuli, R. Maclin and J. W. Shavlik.
Advice Refinement in Knowledge-Based Support Vector Machines..
ICML 2011 Workshop on Combining Learning Strategies to Reduce Label Cost, July 2011.
[pdf]
[poster]
[CLS ICML '11]
S. Natarajan, G. Kunapuli, R. Maclin, D. Page, C. O'Reilly, T. Walker and J. W. Shavlik.
Learning from Human Teachers: Issues and Challenges for ILP in Bootstrap Learning.
AAMAS 2010 Workshop on Agents Learning Interactively from Human Teachers, May 2010.
[pdf]
[ALIHT '10]
S. Natarajan, G. Kunapuli, K. Judah, P. Tadepalli, K. Kersting and J. W. Shavlik.
Multi-Agent Inverse Reinforcement Learning.
The Learning Workshop at Snowbird, UT, April 2010.
S. Natarajan, P. Tadepalli, G. Kunapuli and J. W. Shavlik.
Knowledge Intensive Learning: Directed vs. Undirected SRL Models.
International Workshop in SRL at Leuven, Belgium, July 2009.
[pdf]
[SRL '09]
G. Kunapuli, K. P. Bennett, R. Maclin and J. W. Shavlik.
Online Learning With Knowledge-Based SVMs.
The Learning Workshop at Clearwater Beach, FL, April 2009.
[poster]
[Snowbird '09]
Dissertation
A Bilevel Optimization Approach to Machine Learning.
February 2008.
[pdf]
[Abstract]
A key step in many statistical learning methods used in
machine learning involves solving a convex optimization problem
containing one or more hyper-parameters that must be selected by the
users. While cross validation is a commonly employed and widely
accepted method for selecting these parameters, its implementation
by a grid-search procedure in the parameter space effectively limits
the desirable number of hyper-parameters in a model, due to the
combinatorial explosion of grid points in high dimensions. A novel
paradigm based on bilevel optimization approach is proposed and
gives rise to a unifying framework within which issues such as model
selection can be addressed.
The machine learning problem is formulated as a bilevel
program--a mathematical program that has constraints which are
functions of optimal solutions of another mathematical program
called the inner-level program. The bilevel program is transformed
to an equivalent mathematical program with equilibrium constraints
(MPEC). Two alternative bilevel optimization algorithms are
developed to optimize the MPEC and provide a systematic search of
the hyper-parameters.
In the first approach, the equilibrium constraints of the
MPEC are relaxed to form a nonlinear program with linear objective
and non-convex quadratic inequality constraints, which is then
solved using a general purpose nonlinear programming solver. In the
second approach, the equilibrium constraints are treated as penalty
terms in the objective, and the resulting non-convex quadratic
program with linear constraints is solved using a successive
linearization algorithm.
The flexibility of the bilevel approach to deal with
multiple hyper-parameters, makes it powerful approach to problems
such as parameter and feature selection (model selection). In this
thesis, three problems are studied: model selection for support
vector (SV) classification, model selection for SV regression and
missing value-imputation for SV regression. Extensive computational
results establish that both algorithmic approaches find solutions
that generalize as well or better than conventional approaches and
are much more computationally efficient.