PhD Defense by Srinivas Eswar

Event Details
  • Date/Time:
    • Friday July 1, 2022
      2:00 pm - 4:00 pm
  • Location: Coda C1215 Midtown
  • URL: Zoom
  • Fee(s):
    N/A
Summaries

Summary Sentence: Scalable Data Mining via Constrained Low Rank Approximation

Title: Scalable Data Mining via Constrained Low Rank Approximation

Date: Friday, July 1st, 2022

Time: 2pm - 4pm ET

Physical Location: Coda C1215 Midtown

Virtual Location: https://gatech.zoom.us/j/92347767822

 

Srinivas Eswar

School of Computational Science and Engineering

Georgia Institute of Technology

 

Committee:

Dr. Richard Vuduc (Advisor, School of Computational Science and Engineering, Georgia Institute of Technology)

Dr. Haesun Park (Co-Advisor, School of Computational Science and Engineering, Georgia Institute of Technology)

Dr. Ümit V. Çatalyürek (School of Computational Science and Engineering, Georgia Institute of Technology)

Dr. Edmond Chow (School of Computational Science and Engineering, Georgia Institute of Technology)

Dr. Grey Ballard (Department of Computer Science, Wake Forest University)

 

------------------------ 

 

Abstract:

Matrix and tensor approximation methods are recognised as foundational tools for modern data analytics. Their strength lies in a long history of rigorous and principled theoretical foundations, judicious formulations via various constraints, and the availability of fast computer programs. Multiple constrained low rank approximation (CLRA) formulations exist for commonly encountered tasks such as clustering, dimensionality reduction, and anomaly detection, among others. The primary challenge in modern data analytics is the sheer volume of data to be analysed, which often requires multiple machines just to hold the dataset in memory. This dissertation presents CLRA as a key enabler of scalable data mining on distributed-memory parallel machines.

 

Nonnegative Matrix Factorisation (NMF) is the primary CLRA method studied in this dissertation. NMF imposes nonnegativity constraints on the factor matrices and is popular for its interpretability and clustering prowess. The major bottleneck in most NMF algorithms is a distributed matrix-multiplication kernel. We develop the PLANC software package, which includes efficient matrix-multiplication and matricised-tensor times Khatri-Rao product (MTTKRP) kernels tailored to the CLRA case. It employs carefully designed parallel algorithms and data distributions to avoid unnecessary computation and communication. With these key kernels in place, we can extend PLANC to a variety of cases, including symmetry constraints, second-order methods, and multiple data modalities. We demonstrate the effectiveness of PLANC via scaling studies on the supercomputers at the Oak Ridge Leadership Computing Facility.
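
For readers unfamiliar with NMF, the following is a minimal single-node sketch in NumPy. It assumes the classic multiplicative-update rule, which is used here only for illustration and is not claimed to be the algorithm or API that PLANC implements; the function name `nmf_multiplicative` and all parameters are hypothetical. It shows where the matrix multiplications against the data matrix appear, which is the kind of kernel the abstract identifies as the bottleneck.

```python
import numpy as np


def nmf_multiplicative(A, rank, iters=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix A (m x n) as W @ H with W, H >= 0.

    Illustrative only: multiplicative updates on one machine, not the
    distributed algorithms described in the abstract.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank))  # nonnegative initial factors
    H = rng.random((rank, n))
    for _ in range(iters):
        # W.T @ A and A @ H.T are the only products that touch the full
        # data matrix; the remaining products involve small rank-sized terms.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


# Synthetic nonnegative data with exact rank 4 (an assumption for the demo).
rng = np.random.default_rng(1)
A = rng.random((60, 4)) @ rng.random((4, 40))

W, H = nmf_multiplicative(A, rank=4)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because every update multiplies nonnegative quantities, the factors stay nonnegative throughout. At scale, the two products involving `A` dominate the cost, which is why their parallel data distribution matters.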

 

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Faculty/Staff, Public, Undergraduate students
Categories
Other/Miscellaneous
Keywords
PhD Defense
Status
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: Jul 6, 2022 - 2:44pm
  • Last Updated: Jul 6, 2022 - 2:44pm