Datasets

Below is the list of built-in datasets added by ROSE. Datasets provided by Cornac itself are described in the Cornac documentation.

Goodreads

This data is derived from the GoodReads dataset.

cornac.datasets.goodreads.load_feedback(fpath, fmt='UIR', sep=',', skip_lines=0, reader: Reader = None) → List

Load the user-item ratings (scale: [1, 5]).

Parameters:

fpath – File path to xx-rating.txt.
reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, rating).

Return type:

array-like
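To make the returned structure concrete, the following is a minimal standard-library sketch of reading a comma-separated user-item-rating file into (user, item, rating) tuples. It is not the implementation behind load_feedback; the helper name read_uir, the three-column layout, and the sample IDs are assumptions made for illustration only.

```python
import csv
import os
import tempfile


def read_uir(fpath, sep=",", skip_lines=0):
    """Parse a delimited text file into (user, item, rating) tuples.

    Hypothetical helper: it only mirrors the documented output shape.
    """
    triples = []
    with open(fpath, newline="", encoding="utf-8") as f:
        rows = csv.reader(f, delimiter=sep)
        for line_no, row in enumerate(rows):
            if line_no < skip_lines or len(row) < 3:
                continue  # skip header lines and blank or malformed rows
            user, item, rating = row[0], row[1], float(row[2])
            triples.append((user, item, rating))
    return triples


# Hypothetical sample contents; real user and item IDs will differ.
sample = "u1,b10,5\nu1,b11,3\nu2,b10,4\n"
with tempfile.NamedTemporaryFile("w", suffix="-rating.txt", delete=False) as tmp:
    tmp.write(sample)

feedback = read_uir(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(feedback)
```

Each tuple keeps the user and item identifiers as strings and converts the rating to a float, so the ratings can be checked against the documented [1, 5] scale.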

cornac.datasets.goodreads.load_sentiment(reader: Reader = None) → List

Load the user-item sentiments. The dataset was constructed by the method described in the reference paper [1].

Parameters:

reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, [(aspect, opinion, sentiment), (aspect, opinion, sentiment), …]).

Return type:

array-like

References

[1] Gao, J., Wang, X., Wang, Y., & Xie, X. (2019). Explainable Recommendation Through Attentive Multi-View Learning. AAAI.
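The nested tuple structure returned by load_sentiment can be easier to follow with a small example. The sketch below builds two hypothetical records in the documented shape and aggregates them per aspect. The IDs, aspect and opinion words, and the +1/-1 sentiment encoding are all assumptions for illustration; the actual values in the dataset may differ.

```python
from collections import Counter

# Hypothetical records in the documented shape:
# (user, item, [(aspect, opinion, sentiment), ...])
# The +1 / -1 sentiment encoding is an assumption, not confirmed by the docs.
sentiment_data = [
    ("u1", "b10", [("plot", "gripping", 1), ("pacing", "slow", -1)]),
    ("u2", "b10", [("plot", "predictable", -1)]),
]

aspect_counts = Counter()  # how often each aspect is mentioned
net_sentiment = Counter()  # sum of sentiment values per aspect

for user, item, triples in sentiment_data:
    for aspect, opinion, sentiment in triples:
        aspect_counts[aspect] += 1
        net_sentiment[aspect] += sentiment

for aspect in aspect_counts:
    print(aspect, aspect_counts[aspect], net_sentiment[aspect])
```

Unpacking the inner list in a second loop is the main point: each (user, item) pair can carry any number of (aspect, opinion, sentiment) triples.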

cornac.datasets.goodreads.prepare_data(data_name='goodreads', test_size=0.2, dense=False, verbose=False, seed=42, item=True, user=False, sample_size=0.1)

Prepare data for the GoodReads dataset. Generate the data split for the dataset.

Parameters:

data_name (str, default: 'goodreads') –
Name of the dataset to be prepared.

Options: ‘goodreads’, ‘goodreads_uir’, ‘goodreads_uir_1000’, ‘goodreads_limers’
- ‘goodreads’: user-item-rating with sentiment data.
- ‘goodreads_uir’: user-item-rating data in the whole dataset.
- ‘goodreads_uir_1000’: user-item-rating data with 1000 samples.
- ‘goodreads_limers’: user-item-rating data with item genres and user aspects.
test_size (float, default: 0.2) – The proportion of the dataset to include in the test split.
dense (bool, default: False) – If True, use the dense version of the dataset.
verbose (bool, default: False) – If True, print out messages.
seed (int, default: 42) – Random seed.
item (bool, default: True) – If True, include item genres when preparing ‘goodreads_limers’.
user (bool, default: False) – If True, include user aspects when preparing ‘goodreads_limers’.
sample_size (float, default: 0.1) – The proportion of the dataset to include in the split.

Returns:

rs – The data split.

Return type:

obj:cornac.eval_methods.RatioSplit
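As a rough illustration of what test_size and seed control, the sketch below performs a seeded ratio split over a list of rating tuples using only the standard library. It is not how RatioSplit or prepare_data is implemented; the function name ratio_split and the generated ratings are hypothetical.

```python
import random


def ratio_split(data, test_size=0.2, seed=42):
    """Shuffle with a fixed seed and hold out a test_size fraction.

    Hypothetical helper that only illustrates the idea of a ratio split.
    """
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical ratings on the [1, 5] scale.
ratings = [(f"u{i % 5}", f"b{i}", float(1 + i % 5)) for i in range(20)]

train, test = ratio_split(ratings, test_size=0.2, seed=42)
train_again, test_again = ratio_split(ratings, test_size=0.2, seed=42)

print(len(train), len(test))
print(test == test_again)  # the same seed reproduces the same split
```

With ROSE installed, the corresponding call per the signature above would be along the lines of `rs = cornac.datasets.goodreads.prepare_data(data_name='goodreads_uir', test_size=0.2, seed=42)`, which returns the split as a RatioSplit object rather than two lists.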