Ph.D. Proposal Oral Exam - Zoe Fowler

Title: Combating and Exploiting Non-IID Data in Machine Learning

Committee:

Dr. AlRegib, Advisor

Dr. Calhoun, Chair

Dr. Liu

The objective of the proposed research is to develop techniques that improve the performance of machine learning algorithms when faced with non-IID data, i.e., data that is not independent and identically distributed. Non-IID data is abundant in real-life settings and originates from a variety of sources. Despite this abundance, many machine learning algorithms assume that data is IID, leading to suboptimal and misleading results. Hence, this proposal first explores how non-IID data can be exploited to enhance performance, using clinical trial data, which exhibits dependencies across time. Notably, active learning, a subset of machine learning that uses query strategies to select the most informative unlabeled samples for labeling by an external annotator, mirrors this data collection process and can in turn be leveraged to process such non-IID data. Additional types of non-IID data beyond clinical trials are then explored. In particular, data collected from various sources presents another case of non-IID data. Federated learning, a distributed training paradigm in which distinct clients collaborate to train a global model without sharing data, can be leveraged to process such data. However, federated learning generally suffers from performance deterioration when faced with non-IID data and therefore requires additional optimization to mitigate it. In summary, this proposal discusses the consequences of non-IID data in machine learning and proposes methods to overcome them, centered on exploiting the data itself and optimizing machine learning algorithms to enhance performance.
