Data

Other modules in the data package provided by Cornac are documented here.

Sentiment Analysis

class cornac.data.lexicon.SentimentAnalysis(input, sep='\t', usecols=['user_id', 'book_id', 'rating', 'review_text'], min_frequency=1)[source]

Processes raw data, such as text reviews, to generate lexicons in the form (feature:opinion:+/-1).

Parameters:
  • input (string/dataframe, required) – either the path to a csv/txt file or a DataFrame. For a file, the first line must contain the column names and include at least ['user_id', 'book_id', 'rating', 'review_text'], consistent with the usecols parameter. For a DataFrame, the column names must be those specified by usecols.

  • sep (string, optional, default '\t') – separator used in the file

  • usecols (list, required) – must specify the column names within the file; order matters: [name of user id, name of item id, name of rating, name of review]

  • min_frequency (int, optional, default 1) – drop users who have fewer than min_frequency reviews
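As an illustration of the parameters above, the following is a minimal sketch, using only the Python standard library, of how an input file in the expected format might be prepared and how the min_frequency rule can be read. The sample reviews and the helper names write_reviews and filter_by_min_frequency are hypothetical and are not part of Cornac; the filter only mirrors the documented rule and is not the library's own code. The commented-out call at the end uses the constructor signature shown above.

```python
import csv
import os
import tempfile
from collections import Counter

# Column names expected by default, in the order required by usecols.
COLUMNS = ["user_id", "book_id", "rating", "review_text"]

# Hypothetical sample reviews, for illustration only.
rows = [
    ("u1", "b1", 5, "The plot is great and the characters are vivid."),
    ("u1", "b2", 2, "The pacing is slow."),
    ("u2", "b1", 4, "Lovely prose."),
]


def write_reviews(path, rows, sep="\t"):
    """Write a header line with the column names, then one review per line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=sep)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


def filter_by_min_frequency(rows, min_frequency=1):
    """Keep only reviews from users with at least min_frequency reviews."""
    counts = Counter(row[0] for row in rows)
    return [row for row in rows if counts[row[0]] >= min_frequency]


input_path = os.path.join(tempfile.mkdtemp(), "reviews.tsv")
write_reviews(input_path, rows)

# With min_frequency=2, user u2 (a single review) would be dropped.
kept = filter_by_min_frequency(rows, min_frequency=2)

# Assuming Cornac is installed, the file could then be passed on:
# from cornac.data.lexicon import SentimentAnalysis
# sa = SentimentAnalysis(input_path, sep="\t", usecols=COLUMNS, min_frequency=2)
```

Writing the header explicitly keeps the file consistent with usecols, which is what the input parameter description requires.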

build_lexicons()[source]

Build the lexicons

Returns:

df – ['user_id', 'item_id', 'rating', 'lexicon']

Return type:

dataframe
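The lexicon column holds entries in the (feature:opinion:+/-1) form mentioned in the class description. The sketch below shows one way such a triple could be read into a small record. It assumes a single entry is written as colon-separated text such as "plot:great:+1"; the exact string layout Cornac emits, and how several entries for one review are joined, are not specified here. LexiconEntry and parse_entry are hypothetical names, not Cornac API.

```python
from typing import NamedTuple


class LexiconEntry(NamedTuple):
    feature: str   # the aspect being discussed, e.g. "plot"
    opinion: str   # the opinion word attached to it, e.g. "great"
    polarity: int  # +1 for positive sentiment, -1 for negative


def parse_entry(text: str) -> LexiconEntry:
    """Parse one 'feature:opinion:polarity' triple (assumed layout)."""
    feature, opinion, polarity = text.strip().split(":")
    value = int(polarity)  # int() accepts "+1", "1" and "-1"
    if value not in (1, -1):
        raise ValueError(f"polarity must be +1 or -1, got {polarity!r}")
    return LexiconEntry(feature.strip(), opinion.strip(), value)


entry = parse_entry("plot:great:+1")
negative = parse_entry("pacing:slow:-1")
```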

save_to_file(lexicon_path, rating_path)[source]

Save the processed data to two files: one for lexicons and one for ratings.

Parameters:
  • lexicon_path (string, required) – path of the file in which to save the lexicons, with columns [user_id, item_id, lexicons]

  • rating_path (string, required) – path of the file in which to save the ratings, with columns [user_id, item_id, rating]
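To make the two-file layout concrete, here is a rough sketch of splitting processed records into a lexicon file and a rating file with the standard library. It is not Cornac's implementation: the tab separator, the header row, the sample records, and the helper save_columns are all assumptions made for illustration, and the lexicon column is named lexicon here to follow the build_lexicons() return value.

```python
import csv
import os
import tempfile

# Hypothetical processed records with the columns returned by build_lexicons().
records = [
    {"user_id": "u1", "item_id": "b1", "rating": 5, "lexicon": "plot:great:1"},
    {"user_id": "u1", "item_id": "b2", "rating": 2, "lexicon": "pacing:slow:-1"},
]


def save_columns(path, records, columns, sep="\t"):
    """Write only the requested columns of each record to path."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=columns, delimiter=sep, extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(records)


out_dir = tempfile.mkdtemp()
lexicon_path = os.path.join(out_dir, "lexicons.txt")
rating_path = os.path.join(out_dir, "ratings.txt")

# One file keeps the lexicons, the other keeps the ratings;
# both share the user and item identifiers.
save_columns(lexicon_path, records, ["user_id", "item_id", "lexicon"])
save_columns(rating_path, records, ["user_id", "item_id", "rating"])
```

Keeping the user and item identifiers in both files lets the lexicons and the ratings be joined again later.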