Data
Other modules in the data package are provided by Cornac and are documented here.
Sentiment Analysis
- class cornac.data.lexicon.SentimentAnalysis(input, sep='\t', usecols=['user_id', 'book_id', 'rating', 'review_text'], min_frequency=1)
Process raw data, such as text reviews, to generate lexicons in the form (feature:opinion:+/-1).
- Parameters:
input (string or DataFrame, required) – either the path to a CSV/TXT file or a DataFrame. A file must have the column names on its first line, including at least ['user_id', 'book_id', 'rating', 'review_text'], consistent with the usecols parameter. A DataFrame must have the columns named in usecols.
sep (string, optional, default '\t') – field separator of the file; the default is a tab character.
usecols (list, optional, default ['user_id', 'book_id', 'rating', 'review_text']) – the column names to read from the file. Order matters: [name of user id, name of item id, name of rating, name of review].
min_frequency (int, optional, default 1) – drop users who have fewer than min_frequency reviews.
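A minimal sketch of an input file in the expected format, built with Python's standard library. The rows are made-up examples, and filter_min_frequency is a hypothetical helper that only illustrates the documented min_frequency rule; it is not the class's own implementation.

```python
import csv
import io
from collections import Counter

# Column names matching the default usecols of SentimentAnalysis.
COLUMNS = ["user_id", "book_id", "rating", "review_text"]

# Made-up example rows, for illustration only.
ROWS = [
    ["u1", "b1", "5", "The plot is gripping and the characters are great."],
    ["u1", "b2", "2", "The ending was dull."],
    ["u2", "b1", "4", "Beautiful writing."],
]


def write_reviews(rows, sep="\t"):
    """Serialize rows as a header line followed by one review per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


def filter_min_frequency(rows, min_frequency=1):
    """Hypothetical helper: keep only users with at least min_frequency reviews."""
    counts = Counter(row[0] for row in rows)
    return [row for row in rows if counts[row[0]] >= min_frequency]


text = write_reviews(ROWS)
print(repr(text.splitlines()[0]))  # 'user_id\tbook_id\trating\treview_text'
print(len(filter_min_frequency(ROWS, min_frequency=2)))  # 2 (u2 has only one review)
```

Saved to disk under some path (for example reviews.tsv, a name assumed here), such a file could then be passed as SentimentAnalysis('reviews.tsv') with the default sep and usecols.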
- build_lexicons()
Build the lexicons from the review text.
- Returns:
df – a DataFrame with columns ['user_id', 'item_id', 'rating', 'lexicon']
- Return type:
DataFrame
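The serialization of the lexicon column is not specified above. As a sketch, assuming each entry is a string of the form feature:opinion:+/-1 and that several entries in one field are comma-separated, the hypothetical helpers below turn a field into typed (feature, opinion, polarity) triples.

```python
from typing import List, Tuple


def parse_lexicon(entry: str) -> Tuple[str, str, int]:
    """Split one 'feature:opinion:+/-1' entry into a typed tuple."""
    # rsplit so that only the last two colons are treated as separators.
    feature, opinion, polarity = entry.rsplit(":", 2)
    value = int(polarity)  # int() accepts a leading '+', e.g. '+1' -> 1
    if value not in (1, -1):
        raise ValueError(f"polarity must be +1 or -1, got {polarity!r}")
    return feature, opinion, value


def parse_lexicons(field: str, sep: str = ",") -> List[Tuple[str, str, int]]:
    """Parse a field holding several entries separated by sep (assumed ',')."""
    return [parse_lexicon(e.strip()) for e in field.split(sep) if e.strip()]


print(parse_lexicons("plot:gripping:+1,ending:dull:-1"))
# [('plot', 'gripping', 1), ('ending', 'dull', -1)]
```

Splitting from the right keeps the polarity and opinion intact even if a feature name itself contains a colon; opinions containing colons would still be ambiguous under this assumed format.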
- save_to_file(lexicon_path, rating_path)
Save the processed data to two files: one for the lexicons and one for the ratings.
- Parameters:
lexicon_path (string, required) – path of the file to which the lexicons are saved, with fields [user_id, item_id, lexicons]
rating_path (string, required) – path of the file to which the ratings are saved, with fields [user_id, item_id, rating]
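The exact on-disk layout written by save_to_file is not documented here. The sketch below only illustrates the documented split of one record set into a lexicon file and a rating file; save_split is a hypothetical helper, and the comma separator and colon-joined lexicon entries are assumptions.

```python
import csv
import os
import tempfile
from typing import Iterable, List, Tuple

# (user_id, item_id, rating, lexicon entries)
Record = Tuple[str, str, float, List[str]]


def save_split(records: Iterable[Record], lexicon_path: str, rating_path: str) -> None:
    """Hypothetical helper: write [user, item, lexicons] and [user, item, rating] files."""
    with open(lexicon_path, "w", newline="") as lex_f, open(rating_path, "w", newline="") as rat_f:
        lex_w = csv.writer(lex_f)
        rat_w = csv.writer(rat_f)
        for user_id, item_id, rating, lexicons in records:
            # One row per review; lexicon entries become trailing fields.
            lex_w.writerow([user_id, item_id, *lexicons])
            rat_w.writerow([user_id, item_id, rating])


records = [
    ("u1", "b1", 5.0, ["plot:gripping:+1", "characters:great:+1"]),
    ("u1", "b2", 2.0, ["ending:dull:-1"]),
]

with tempfile.TemporaryDirectory() as tmp:
    lex_path = os.path.join(tmp, "lexicons.csv")
    rat_path = os.path.join(tmp, "ratings.csv")
    save_split(records, lex_path, rat_path)
    with open(lex_path) as f:
        print(f.read().splitlines())
    # ['u1,b1,plot:gripping:+1,characters:great:+1', 'u1,b2,ending:dull:-1']
    with open(rat_path) as f:
        print(f.read().splitlines())
    # ['u1,b1,5.0', 'u1,b2,2.0']
```

Keeping user and item ids in both files means the two outputs can be joined back on (user_id, item_id) if needed.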