CSE/Seminar Announcement

The Locally Weighted Bag of Words Framework for Document Representation and Visualization

The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation discards all sequential information. We present an effective sequential document representation that goes beyond the bag of words representation and its n-gram extensions. This representation uses local smoothing to embed documents as smooth curves in the multinomial simplex, thereby preserving valuable sequential information. In contrast to bag of words or n-grams, the new representation robustly captures medium- and long-range sequential trends in the document. We discuss the representation and its geometric properties and demonstrate its applicability to various text processing tasks.
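
For readers unfamiliar with the idea, the following is a minimal sketch of a locally smoothed word histogram, not the speaker's implementation. It assumes a Gaussian kernel over normalized token positions, a bandwidth `sigma`, and a small additive `smoothing` constant; the function name `lowbow_curve` and all parameter names are hypothetical and chosen only for illustration.

```python
import numpy as np

def lowbow_curve(tokens, vocab, positions, sigma=0.1, smoothing=0.01):
    """Return one locally weighted word histogram per requested position.

    Each row is a point in the multinomial simplex (non-negative, sums to 1).
    The Gaussian kernel and the default values are illustrative assumptions.
    """
    index = {w: i for i, w in enumerate(vocab)}
    n = len(tokens)
    # Place token j at the normalized location (j + 0.5) / n in [0, 1].
    locations = (np.arange(n) + 0.5) / n
    # Smoothed one-hot row per token, so every word keeps nonzero mass.
    onehot = np.full((n, len(vocab)), smoothing)
    for j, w in enumerate(tokens):
        onehot[j, index[w]] += 1.0
    onehot /= onehot.sum(axis=1, keepdims=True)
    curve = []
    for mu in positions:
        # Kernel weights centered at mu, normalized to sum to 1.
        weights = np.exp(-0.5 * ((locations - mu) / sigma) ** 2)
        weights /= weights.sum()
        # Convex combination of per-token distributions stays in the simplex.
        curve.append(weights @ onehot)
    return np.array(curve)

tokens = "a a a b b b".split()
curve = lowbow_curve(tokens, ["a", "b"], positions=[0.1, 0.9])
```

In this sketch, a small `sigma` makes each histogram reflect only nearby words, while a very large `sigma` weights all tokens almost equally and so recovers the ordinary (smoothed) bag of words histogram at every position.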

You are cordially invited to attend a reception that will follow the seminar to chat informally with faculty and students. Refreshments will be provided.

Status

  • Workflow Status: Published
  • Created By: Louise Russo
  • Created: 02/11/2010
  • Modified By: Fletcher Moore
  • Modified: 10/07/2016
