Datasets
Below is the list of built-in datasets added by ROSE. Other datasets built by Cornac can be found in the Cornac documentation.
Goodreads
This dataset is derived from the GoodReads dataset.
- cornac.datasets.goodreads.load_feedback(fpath, fmt='UIR', sep=',', skip_lines=0, reader: Reader = None) -> List
Load the user-item ratings, on a scale of [1, 5].
- Parameters:
fpath – File path to xx-rating.txt.
reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
- Returns:
data – Data in the form of a list of tuples (user, item, rating).
- Return type:
array-like
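To illustrate the shape of the returned data, the following is a minimal sketch in plain Python. It is not the library's loader: the helper name `parse_uir` and the sample rows are hypothetical, and it assumes one `user,item,rating` triple per line.

```python
import csv
import io


def parse_uir(text, sep=",", skip_lines=0):
    """Parse 'user<sep>item<sep>rating' lines into (user, item, rating) tuples.

    Hypothetical helper for illustration only; it mimics the documented
    output format, not the actual implementation of load_feedback.
    """
    rows = csv.reader(io.StringIO(text), delimiter=sep)
    triples = []
    for line_no, row in enumerate(rows):
        if line_no < skip_lines or not row:
            continue  # skip leading header lines and blank lines
        user, item, rating = row[0], row[1], float(row[2])
        triples.append((user, item, rating))
    return triples


# Made-up sample content, with ratings on the documented 1-5 scale.
sample = "u1,b10,5\nu1,b42,3\nu2,b10,4\n"
data = parse_uir(sample)
print(data)
```

Each element is a `(user, item, rating)` tuple, which matches the 'UIR' format named in the signature above.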
- cornac.datasets.goodreads.load_sentiment(reader: Reader = None) -> List
Load the user-item-sentiment data. The dataset was constructed by the method described in the reference paper [1].
- Parameters:
reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
- Returns:
data – Data in the form of a list of tuples (user, item, [(aspect, opinion, sentiment), (aspect, opinion, sentiment), …]).
- Return type:
array-like
References
[1] Gao, J., Wang, X., Wang, Y., & Xie, X. (2019). Explainable Recommendation Through Attentive Multi-View Learning. AAAI.
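The nested structure returned by load_sentiment can be easier to follow with a small example. The sketch below uses made-up records in the documented shape and assumes that sentiment is encoded as +1 or -1, which this page does not state. The helper `aspect_polarity` is hypothetical and not part of the library.

```python
from collections import Counter

# Made-up records in the documented shape:
# (user, item, [(aspect, opinion, sentiment), ...])
# Assumption: sentiment is +1 (positive) or -1 (negative).
sentiment_data = [
    ("u1", "b10", [("plot", "gripping", 1), ("ending", "rushed", -1)]),
    ("u2", "b10", [("plot", "predictable", -1)]),
    ("u2", "b42", [("characters", "vivid", 1)]),
]


def aspect_polarity(records):
    """Sum the sentiment scores per aspect across all records."""
    totals = Counter()
    for _user, _item, triples in records:
        for aspect, _opinion, sentiment in triples:
            totals[aspect] += sentiment
    return dict(totals)


polarity = aspect_polarity(sentiment_data)
print(polarity)
```

Iterating in this way, one record per user-item pair and one inner tuple per aspect mention, is a common pattern for consuming this kind of data.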
- cornac.datasets.goodreads.prepare_data(data_name='goodreads', test_size=0.2, dense=False, verbose=False, seed=42, item=True, user=False, sample_size=0.1)
Prepare the GoodReads dataset and generate its data split.
- Parameters:
data_name (str, default: 'goodreads') –
Name of the dataset to be prepared.
Options: ‘goodreads’, ‘goodreads_uir’, ‘goodreads_uir_1000’, ‘goodreads_limers’
‘goodreads’: user-item-rating with sentiment data.
‘goodreads_uir’: user-item-rating data in the whole dataset.
‘goodreads_uir_1000’: user-item-rating data with 1000 samples.
‘goodreads_limers’: user-item-rating data with item genres and user aspects.
test_size (float, default: 0.2) – The proportion of the dataset to include in the test split.
dense (bool, default: False) – If True, use the dense version of the dataset.
verbose (bool, default: False) – If True, print out messages.
seed (int, default: 42) – Random seed.
item (bool, default: True) – If True, include item genres when preparing ‘goodreads_limers’.
user (bool, default: False) – If True, include user aspects when preparing ‘goodreads_limers’.
sample_size (float, default: 0.1) – The proportion of the dataset to include in the split.
- Returns:
rs – The data split.
- Return type:
obj:cornac.eval_methods.RatioSplit
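As a rough illustration of what test_size and seed control, here is a minimal plain-Python sketch of a seeded ratio split. It is not how cornac.eval_methods.RatioSplit is implemented; the function `ratio_split` and the sample ratings are hypothetical.

```python
import random


def ratio_split(data, test_size=0.2, seed=42):
    """Shuffle with a fixed seed, then hold out a test_size fraction.

    Returns (train, test). Illustrative only; not the library's logic.
    """
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up (user, item, rating) tuples on a 1-5 scale.
ratings = [(f"u{i % 5}", f"b{i}", float(1 + i % 5)) for i in range(20)]
train, test = ratio_split(ratings, test_size=0.2, seed=42)
print(len(train), len(test))
```

Because the generator is seeded, calling the function again with the same arguments yields the same partition, which is the usual reason for exposing a seed parameter.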