PhD Defense by Tonya Woods

Event Details
  • Date/Time:
    • Friday, May 1, 2015
      10:00 am - 12:00 pm
  • Location: ISyE Groseclose, Room 226A
  • Fee(s):
    N/A
Summaries

Summary Sentence: Extracting Meaningful Statistics for the Characterization and Classification of Biological, Medical, and Financial Data


Title:  Extracting Meaningful Statistics for the Characterization and Classification of Biological, Medical, and Financial Data

 

Advisor:  Professor Brani Vidakovic

 

Committee members:  Professor Yajun Mei, Professor Kamran Paynabar, Professor Mirjana Milosevic-Brockett (School of Biology), Dr. Scott Nickleach (Equifax Inc.)

 

Date and time:  Friday, May 1, 2015, 10:00 AM

 

Location:  ISyE Groseclose, Room 226A

 

Abstract:

 

This thesis focuses on extracting meaningful statistics for the characterization and classification of biological, medical, and financial data and contains four chapters.  The first chapter provides theoretical background on scaling and wavelets, which supports the work in chapters two and three.

 

In the second chapter, we outline a methodology for representing sequences of DNA nucleotides as numeric matrices in order to analytically investigate important structural characteristics of DNA.  This methodology involves assigning unit vectors to nucleotides, placing the vectors into columns of a matrix, and accumulating across the rows of this matrix.  Encoding the DNA in this way allows us to compute the 2-D wavelet transform and assess regularity characteristics of the sequence via the slope of the wavelet spectra.  In addition to computing a global slope measure for a sequence, we can apply our methodology to overlapping sections of nucleotides to obtain an evolutionary slope.
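The matrix construction described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: the particular assignment of unit vectors to A, C, G, and T is hypothetical, "accumulating across the rows" is read here as a running sum along each row, and the function names are made up for illustration.

```python
# Hypothetical assignment: the abstract does not say which unit vector
# belongs to which nucleotide.
UNIT_VECTORS = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def dna_to_matrix(sequence):
    """Return a 4 x N matrix (list of rows); column j is the unit vector of nucleotide j."""
    columns = [UNIT_VECTORS[base] for base in sequence.upper()]
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


def accumulate_rows(matrix):
    """Running sum along each row (assumed reading of 'accumulating across the rows')."""
    accumulated = []
    for row in matrix:
        total = 0
        running = []
        for value in row:
            total += value
            running.append(total)
        accumulated.append(running)
    return accumulated


def dna_walk(sequence):
    """Numeric representation of a DNA sequence: one cumulative count row per nucleotide."""
    return accumulate_rows(dna_to_matrix(sequence))


if __name__ == "__main__":
    for base, row in zip("ACGT", dna_walk("ACGTAC")):
        print(base, row)
```

Under this reading, entry (i, j) of the result counts how often nucleotide i occurs in positions 0 through j, so each row is a non-decreasing path that a wavelet transform can then analyze.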

 

In the third chapter, we describe various ways wavelet-based scaling may be used for cancer diagnostics.  There were nearly half a million new cases of ovarian, breast, and lung cancer in the United States last year.  Breast and lung cancer have the highest prevalence, while ovarian cancer has the lowest survival rate of the three.  Early detection is critical for all of these diseases, but substantial obstacles to early detection exist in each case.  In this work, we use wavelet-based scaling on metabolic data and radiography images to produce meaningful features for classifying cases and controls.  Computer-aided detection (CAD) algorithms for detecting lung and breast cancer often focus on select features in an image and make a priori assumptions about the nature of a nodule or a mass.  In contrast, our approach to analyzing breast and lung images makes no such a priori assumptions and captures information contained in the background tissue of images as well as information about specific features.
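As a rough illustration of how a wavelet-based scaling feature might be computed, the sketch below estimates a spectral slope for a 1-D signal. It is a simplification under stated assumptions: the thesis works with metabolic data and 2-D images, while this example uses a 1-D Haar transform (the abstract does not name a wavelet), a signal length that is a power of two, and an ordinary least squares fit. The function names are hypothetical, and levels are numbered from finest (1) to coarsest.

```python
import math


def haar_detail_levels(signal):
    """Orthonormal Haar transform; return detail coefficients per level, finest first."""
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two and at least 2")
    approx = list(signal)
    levels = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return levels


def wavelet_spectrum(signal):
    """(level, log2 of mean squared detail coefficient) pairs; zero-energy levels are skipped."""
    points = []
    for level, details in enumerate(haar_detail_levels(signal), start=1):
        energy = sum(d * d for d in details) / len(details)
        if energy > 0:
            points.append((level, math.log2(energy)))
    return points


def spectral_slope(signal):
    """Least squares slope of log2 energy against level (level 1 is the finest scale)."""
    points = wavelet_spectrum(signal)
    if len(points) < 2:
        raise ValueError("need at least two levels with nonzero energy")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator


if __name__ == "__main__":
    # A linear ramp is very regular; its slope under this convention is close to 3.
    print(spectral_slope(list(range(8))))
```

With this numbering, a larger slope means energy is concentrated at coarser scales, which corresponds to a smoother signal. A slope of this kind, computed per case, is the sort of feature that could be passed to a classifier separating cases from controls.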

 

In the fourth chapter, we investigate the value of social media data in building commercial default and activity credit models.  This question is of interest, as some entities are beginning to build credit scores based on this type of publicly available online data alone.  We use random forest modeling, which has been shown in many instances to achieve better predictive accuracy than logistic regression in modeling credit data.  Our work shows that adding social media data does not improve model accuracy over the bureau-only models.  However, the social media data on its own does have some limited predictive power.

 

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Public
Categories
Other/Miscellaneous
Keywords
defense, graduate students, PhD
Status
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: Apr 8, 2015 - 7:33am
  • Last Updated: Oct 7, 2016 - 10:11pm