PhD Thesis Defense - Tonya Woods

Event Details
  • Date/Time:
    • Friday May 1, 2015
      10:00 am
  • Location: Groseclose 226A
  • Phone:
  • URL:
  • Email:
  • Fee(s):
    N/A
  • Extras:
Contact
No contact information submitted.
Summaries

Summary Sentence: PhD Thesis Defense - Tonya Woods

Full Summary: No summary paragraph submitted.

TITLE:  Extracting Meaningful Statistics for the Characterization and Classification of Biological, Medical, and Financial Data

ABSTRACT:

This thesis is focused on extracting meaningful statistics for the characterization and classification of biological, medical, and financial data and contains four chapters. The first chapter contains theoretical background on scaling and wavelets, which supports the work in chapters two and three.

In the second chapter, we outline a methodology for representing sequences of DNA nucleotides as numeric matrices in order to analytically investigate important structural characteristics of DNA. This methodology involves assigning unit vectors to nucleotides, placing the vectors into columns of a matrix, and accumulating across the rows of this matrix. Transcribing the DNA in this way allows us to compute the 2-D wavelet transformation and assess regularity characteristics of the sequence via the slope of the wavelet spectra. In addition to computing a global slope measure for a sequence, we can apply our methodology for overlapping sections of nucleotides to obtain an evolutionary slope.

In the third chapter, we describe various ways wavelet-based scaling may be used for cancer diagnostics. There were nearly half of a million new cases of ovarian, breast, and lung cancer in the United States last year. Breast and lung cancer have highest prevalence, while ovarian cancer has the lowest survival rate of the three. Early detection is critical for all of these diseases, but substantial obstacles to early detection exist in each case. In this work, we use wavelet-based scaling on metabolic data and radiography images in order to produce meaningful features to be used in classifying cases and controls. Computer-aided detection (CAD) algorithms for detecting lung and breast cancer often focus on select features in an image and make a priori assumptions about the nature of a nodule or a mass. In contrast, our approach to analyzing breast and lung images captures information contained in the background tissue of images as well as information about specific featu res and makes no such a priori assumptions.

In the fourth chapter, we investigate the value of social media data in building commercial default and activity credit models. We use random forest modeling, which has been shown in many instances to achieve better predictive accuracy than logistic regression in modeling credit data. This result is of interest, as some entities are beginning to build credit scores based on this type of publicly available online data alone. Our work has shown that the addition of social media data does not provide any improvement in model accuracy over the bureau only models. However, the social media data on its own does have some limited predictive power.

Additional Information

In Campus Calendar
No
Groups

H. Milton Stewart School of Industrial and Systems Engineering (ISYE)

Invited Audience
Undergraduate students, Faculty/Staff, Graduate students
Categories
Other/Miscellaneous
Keywords
No keywords were submitted.
Status
  • Created By: Anita Race
  • Workflow Status: Published
  • Created On: Apr 14, 2015 - 6:00am
  • Last Updated: Apr 13, 2017 - 5:19pm