Movie Review Data

Posted 2017-08-23 by 磐创AI_聊天机器人

=======

Introduction

This README v2.0 (June, 2004) for the v2.0 polarity dataset comes from
the URL http://www.cs.cornell.edu/people/pabo/movie-review-data .

=======

What's New -- June, 2004

This dataset represents an enhancement of the review corpus v1.0
described in README v1.1: it contains more reviews, and labels were
created with an improved rating-extraction system.

=======

Citation Info

This data was first used in Bo Pang and Lillian Lee,
``A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization
Based on Minimum Cuts'', Proceedings of the ACL, 2004.

@InProceedings{Pang+Lee:04a,
author = {Bo Pang and Lillian Lee},
title = {A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts},
booktitle = "Proceedings of the ACL",
year = 2004
}

=======

Data Format Summary

=======

Rating Decision (Appendix A)

This section describes how we determined whether a review was positive
or negative.

The original html files do not have consistent formats -- a review may
not have the author's rating with it, and when it does, the rating can
appear at different places in the file in different forms. We only
recognize some of the more explicit ratings, which are extracted via a
set of ad-hoc rules. In essence, a file's classification is determined
based on the first rating we were able to identify.
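
As a rough illustration of the first-rating rule, the sketch below scans
a review for explicit ratings and keeps the earliest one. The two
patterns and the name first_rating are hypothetical; the actual ad-hoc
rules are not listed in this README.

```python
import re

# Hypothetical patterns for explicit ratings; the real rule set is not
# given in this README and was likely broader.
RATING_PATTERNS = [
    # e.g. "3 out of 5", "2.5 out of 4"
    re.compile(r"(\d+(?:\.\d+)?)\s*out of\s*(\d+)", re.IGNORECASE),
    # e.g. "3/5", "2.5/4"
    re.compile(r"(\d+(?:\.\d+)?)\s*/\s*(\d+)"),
]


def first_rating(text):
    """Return (score, scale) for the earliest explicit rating, or None."""
    best = None
    for pattern in RATING_PATTERNS:
        match = pattern.search(text)
        # Keep whichever match starts earliest in the file.
        if match and (best is None or match.start() < best.start()):
            best = match
    if best is None:
        return None  # no recognizable rating: the review gets no label
    return float(best.group(1)), int(best.group(2))
```

A review with no recognizable rating returns None here, which mirrors
the fact that only some of the more explicit ratings are recognized.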

We attempted to recognize half stars, but they are written in
especially varied ways, which makes them difficult to recognize. Hence,
we may occasionally miss a half star; the only effect is that a review
rated 2.5 stars in a five-star system is categorized as negative, which
is still reasonable.
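
To illustrate the effect of a lost half star, the following sketch
classifies a five-star rating. The thresholds (3.5 stars and up
positive, 2 stars and below negative, anything in between left
unlabeled) and the name classify_five_star are assumptions made for this
example; this section does not state them.

```python
def classify_five_star(stars):
    """Label a five-star rating under assumed thresholds."""
    # Assumed thresholds, for illustration only.
    if stars >= 3.5:
        return "positive"
    if stars <= 2.0:
        return "negative"
    return None  # middle ratings are left unlabeled


intended = 2.5    # rating the reviewer actually gave
recognized = 2.0  # what is extracted if the half star is missed

print(classify_five_star(intended))    # label for the intended rating
print(classify_five_star(recognized))  # label after losing the half star
```

Under these assumed thresholds, a missed half star moves a 2.5-star
review from unlabeled to negative, which is the case described above.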
