Datasets

Below is the list of built-in datasets added by ROSE. Datasets provided by Cornac itself are listed in the Cornac documentation.

Goodreads

This data is derived from the GoodReads dataset.

cornac.datasets.goodreads.load_feedback(fpath, fmt='UIR', sep=',', skip_lines=0, reader: Reader = None) → List

Load the user-item ratings. Ratings are on a scale of [1, 5].

Parameters:
  • fpath (str) – Path to the ratings file (xx-rating.txt).

  • reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, rating).

Return type:

array-like
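
The input and output shapes described above can be illustrated with a minimal, standalone sketch. It does not call `load_feedback`. It assumes a comma-separated file with one `user,item,rating` triple per line (matching `fmt='UIR'` and `sep=','` in the signature); the helper name `parse_uir` and the sample rows are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def parse_uir(text: str, sep: str = ",", skip_lines: int = 0) -> List[Tuple[str, str, float]]:
    """Parse 'user<sep>item<sep>rating' lines into (user, item, rating) tuples.

    Hypothetical helper for illustration only; not the ROSE/Cornac implementation.
    """
    rows = csv.reader(io.StringIO(text), delimiter=sep)
    triples = []
    for line_no, row in enumerate(rows):
        if line_no < skip_lines or not row:
            continue  # skip header lines and blank lines
        user, item, rating = row[0], row[1], float(row[2])
        triples.append((user, item, rating))
    return triples


# Made-up sample in the assumed format, with one header line to skip.
sample = "user,item,rating\nu1,b10,5\nu1,b11,3\nu2,b10,4\n"
data = parse_uir(sample, skip_lines=1)
print(data)
```

Each element of `data` is a `(user, item, rating)` tuple, which is the shape the Returns entry describes.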

cornac.datasets.goodreads.load_sentiment(reader: Reader = None) → List

Load the user-item-sentiment data. The dataset was constructed using the method described in the reference paper [1].

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, [(aspect, opinion, sentiment), (aspect, opinion, sentiment), …]).

Return type:

array-like
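
As a rough illustration of the returned structure, the sketch below walks a list of `(user, item, [(aspect, opinion, sentiment), …])` tuples and sums the sentiment per aspect. The records are made up, and encoding sentiment as `+1` / `-1` is an assumption; the source does not state how sentiment values are represented.

```python
from collections import defaultdict

# Made-up records in the documented shape:
# (user, item, [(aspect, opinion, sentiment), ...])
# Assumption: sentiment is a signed integer (+1 positive, -1 negative).
sentiment_data = [
    ("u1", "b10", [("plot", "gripping", 1), ("ending", "rushed", -1)]),
    ("u2", "b10", [("plot", "predictable", -1)]),
    ("u2", "b11", [("characters", "vivid", 1), ("plot", "clever", 1)]),
]

# Sum sentiment scores per aspect across all users and items.
aspect_totals = defaultdict(int)
for user, item, triples in sentiment_data:
    for aspect, opinion, sentiment in triples:
        aspect_totals[aspect] += sentiment

print(dict(aspect_totals))  # {'plot': 1, 'ending': -1, 'characters': 1}
```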

References

[1] Gao, J., Wang, X., Wang, Y., & Xie, X. (2019). Explainable Recommendation Through Attentive Multi-View Learning. AAAI.

cornac.datasets.goodreads.prepare_data(data_name='goodreads', test_size=0.2, dense=False, verbose=False, seed=42, item=True, user=False, sample_size=0.1)

Prepare the GoodReads dataset and generate its data split.

Parameters:
  • data_name (str, default: 'goodreads') –

    Name of the dataset to be prepared.

    Options: ‘goodreads’, ‘goodreads_uir’, ‘goodreads_uir_1000’, ‘goodreads_limers’

    • ‘goodreads’: user-item-rating with sentiment data.

    • ‘goodreads_uir’: user-item-rating data in the whole dataset.

    • ‘goodreads_uir_1000’: user-item-rating data with 1000 samples.

    • ‘goodreads_limers’: user-item-rating data with item genres and user aspects.

  • test_size (float, default: 0.2) – The proportion of the dataset to include in the test split.

  • dense (bool, default: False) – If True, use the dense version of the dataset.

  • verbose (bool, default: False) – If True, print out messages.

  • seed (int, default: 42) – Random seed.

  • item (bool, default: True) – If True, include item genres when preparing ‘goodreads_limers’.

  • user (bool, default: False) – If True, include user aspects when preparing ‘goodreads_limers’.

  • sample_size (float, default: 0.1) – The proportion of the dataset to include in the split.

Returns:

rs – The data split.

Return type:

cornac.eval_methods.RatioSplit
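
To make the roles of `test_size` and `seed` concrete, here is a minimal pure-Python sketch of a seeded ratio split. It is not the code behind `prepare_data` or `RatioSplit`; the function name `ratio_split` and the sample triples are hypothetical, and the other parameters (`dense`, `item`, `user`, `sample_size`) are deliberately left out because the source does not describe how they are applied.

```python
import random
from typing import List, Sequence, Tuple


def ratio_split(data: Sequence, test_size: float = 0.2, seed: int = 42) -> Tuple[List, List]:
    """Shuffle with a fixed seed, then hold out a test_size fraction.

    Illustrative only; prepare_data returns a cornac.eval_methods.RatioSplit.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up (user, item, rating) triples.
triples = [(f"u{i % 5}", f"b{i}", float(1 + i % 5)) for i in range(50)]
train, test = ratio_split(triples, test_size=0.2, seed=42)
train_again, test_again = ratio_split(triples, test_size=0.2, seed=42)
print(len(train), len(test))  # 40 10
```

Calling the function twice with the same `seed` yields identical splits, which is the reproducibility a fixed random seed is meant to provide.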