Machine Learning: Active Learning & Covariance Matrix Estimation

TITLE: Research in Machine Learning: Active Learning and Covariance Matrix Estimation

SPEAKER: Xinwei Deng

ABSTRACT:

This talk has two parts. The first part covers active learning via sequential design, with an application to the detection of money laundering. Money laundering is the process of concealing the true origin of funds derived from illegal activities. Detecting it is difficult because of the huge number of transactions that take place each day. The usual approach adopted by financial institutions is to extract summary statistics from the transaction history and then conduct a thorough, time-consuming investigation of the accounts that appear suspicious. In this work, we propose an active learning method based on sequential design that prioritizes accounts to improve the process of money laundering detection. The method uses a combination of stochastic approximation and D-optimal designs to judiciously select the accounts for investigation. The sequential nature of the method helps determine the optimal prioritization criterion with minimal time and effort. A case study with real banking data demonstrates the performance of the proposed method.
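
To make the D-optimal selection idea concrete, here is a minimal Python sketch of a generic greedy D-optimal rule. It is not the speaker's method: it omits the stochastic approximation component entirely, and the function name `greedy_d_optimal`, the `ridge` parameter, and the simulated `accounts` matrix are all hypothetical, introduced only for illustration. At each step the rule picks the candidate row x that most increases the determinant of the information matrix M, using the identity det(M + x x^T) = det(M) (1 + x^T M^{-1} x).

```python
import numpy as np


def greedy_d_optimal(candidates, n_select, ridge=1e-6):
    """Greedily pick rows that maximize the determinant of the information matrix.

    Generic illustration only; `ridge` is an assumed regularizer that keeps
    the information matrix invertible before any rows have been selected.
    """
    n, p = candidates.shape
    info = ridge * np.eye(p)
    chosen = []
    remaining = list(range(n))
    for _ in range(n_select):
        inv = np.linalg.inv(info)
        pool = candidates[remaining]
        # Matrix determinant lemma: the gain of adding row x is x^T M^{-1} x.
        gains = np.einsum("ij,jk,ik->i", pool, inv, pool)
        best = remaining[int(np.argmax(gains))]
        chosen.append(best)
        remaining.remove(best)
        x = candidates[best]
        info = info + np.outer(x, x)  # rank-one update of the information matrix
    return chosen, info


# Hypothetical data: 200 accounts, each described by 3 summary statistics.
rng = np.random.default_rng(0)
accounts = rng.normal(size=(200, 3))
picked, info = greedy_d_optimal(accounts, n_select=10)
```

In a sequential setting, the selected accounts would be investigated, the model updated with the outcomes, and the selection repeated. That feedback loop is outside the scope of this sketch.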

The second part covers Gaussian covariance matrix estimation with Markov structures. A fact often overlooked in covariance matrix estimation is that the random variables are frequently observed with certain temporal or spatial structures. Effectively accounting for such structures not only yields more accurate estimates but also leads to more interpretable models. In this work, we propose shrinkage estimators of the covariance matrix designed specifically to address this issue. The proposed methods exploit sparsity in the inverse covariance matrix in a systematic fashion, so that the estimate conforms to models with Markov structure and is amenable to subsequent stochastic modeling. This approach complements existing work in this direction, which deals exclusively with temporal orderings, and provides a more general and flexible alternative for exploring potential Markov properties. We show that the estimation procedure can be formulated as a semidefinite program and computed efficiently. The merits of these methods are illustrated through simulation and the analysis of a real data example.
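
The link between Markov structure and sparsity in the inverse covariance matrix can be checked numerically. The following Python sketch is an illustration under assumed values, not the estimator from the talk: it builds the covariance of a first-order autoregressive (Markov) Gaussian sequence, with the dimension `p` and correlation `rho` chosen arbitrarily, and confirms that its inverse is tridiagonal, so that variables more than one step apart are conditionally independent given the rest.

```python
import numpy as np

# Assumed illustrative values: 6 variables, lag-one correlation 0.7.
p, rho = 6, 0.7
idx = np.arange(p)
lag = np.abs(idx[:, None] - idx[None, :])

# Covariance of a stationary AR(1) sequence: Sigma[i, j] = rho ** |i - j|.
sigma = rho ** lag

# The inverse covariance (precision) matrix encodes conditional dependence.
precision = np.linalg.inv(sigma)

# Entries more than one step off the diagonal should vanish up to round-off.
off_band = lag > 1
max_off_band = np.abs(precision[off_band]).max()
```

The covariance matrix itself is dense here, while the precision matrix is sparse, which is why estimators that target Markov structure work with the inverse.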

Status

  • Workflow Status: Published
  • Created By: Anita Race
  • Created: 10/12/2009
  • Modified By: Fletcher Moore
  • Modified: 10/07/2016