CSE/Seminar Announcement
The Locally Weighted Bag of Words Framework for Document Representation and Visualization
The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation cannot retain any sequential information. We present an effective sequential document representation that goes beyond the bag of words representation and its n-gram extensions. This representation uses local smoothing to embed documents as smooth curves in the multinomial simplex, thereby preserving valuable sequential information. In contrast to bag of words or n-grams, the new representation robustly captures medium- and long-range sequential trends in the document. We discuss the representation and its geometric properties and demonstrate its applicability to various text processing tasks.
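As a rough illustration of the local smoothing idea in the abstract, the following minimal sketch computes one kernel-weighted word histogram at each of several locations along a document, so that the sequence of histograms traces a path of points in the multinomial simplex. It is not the speaker's implementation: the function name `local_histograms`, the Gaussian kernel, and the `bandwidth` and `smoothing` parameters are all assumptions chosen for illustration, and the input is assumed to be a non-empty token list whose words all appear in `vocab`.

```python
import math


def local_histograms(tokens, vocab, num_points=5, bandwidth=0.2, smoothing=0.01):
    """Return one smoothed word distribution per sample location in [0, 1].

    Hypothetical helper: the Gaussian kernel and the additive smoothing
    constant are illustrative assumptions, not taken from the talk.
    """
    n = len(tokens)
    index = {word: i for i, word in enumerate(vocab)}
    # Map each token position to the unit interval.
    positions = [(j + 0.5) / n for j in range(n)]

    curve = []
    for k in range(num_points):
        # Sample location along the document, from 0 (start) to 1 (end).
        mu = k / (num_points - 1) if num_points > 1 else 0.5
        # Tokens near mu get large weights, distant tokens small ones.
        weights = [math.exp(-0.5 * ((p - mu) / bandwidth) ** 2) for p in positions]
        total = sum(weights)

        # Start from a small constant so every word has nonzero probability.
        hist = [smoothing] * len(vocab)
        for token, weight in zip(tokens, weights):
            hist[index[token]] += weight / total

        # Normalize so the histogram is a point in the simplex.
        norm = sum(hist)
        curve.append([h / norm for h in hist])
    return curve


# Usage: word "a" dominates early in the document, word "b" late.
curve = local_histograms(["a", "a", "a", "b", "b", "b"], vocab=["a", "b"])
```

With a very large `bandwidth` every location sees nearly the same weights, so the curve collapses toward a single point that corresponds to the ordinary bag of words histogram; a smaller `bandwidth` keeps more of the word order.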
You are cordially invited to attend a reception that will follow the seminar to chat informally with faculty and students. Refreshments will be provided.
Status
- Workflow Status: Published
- Created By: Louise Russo
- Created: 02/11/2010
- Modified By: Fletcher Moore
- Modified: 10/07/2016