Models

This page documents the models that ROSE adds.

All other models are described in the original Cornac documentation.

Recommender (Generic Class)

ROSE adds one method to the generic Recommender class, recommend_to_multiple_users, which produces recommendations for several users at once. It is used by Explainer and Experiment_Explainers.

class cornac.models.Recommender(name, trainable=True, verbose=False)

Generic class for a recommender model. All recommendation models should inherit from this class.

Parameters:

name (str, required) – Name of the recommender model.
trainable (boolean, optional, default: True) – When False, the model is not trainable.
verbose (boolean, optional, default: False) – When True, running logs are displayed.

num_users

Number of users in training data.

Type: int

num_items

Number of items in training data.

Type: int

total_users

Number of users in training, validation, and test data. In other words, this includes unknown/unseen users.

Type: int

total_items

Number of items in training, validation, and test data. In other words, this includes unknown/unseen items.

Type: int

uid_map

Global mapping from original user IDs to user indices.

Type: dict

iid_map

Global mapping from original item IDs to item indices.

Type: dict

max_rating

Maximum value among the rating observations.

Type: float

min_rating

Minimum value among the rating observations.

Type: float

global_mean

Average value over the rating observations.

Type: float

recommend_to_multiple_users(user_ids, k=-1, remove_seen=False, train_set=None)

Generate top-K item recommendations for each user in the given list.

Parameters:

user_ids (array, required) – List of original user IDs.
k (int, optional, default: -1) – Cut-off length for recommendations; k=-1 returns the ranked list of all items.
remove_seen (bool, optional, default: False) – Remove items seen during training and validation from the output recommendations.
train_set (cornac.data.Dataset, optional, default: None) – Training dataset; it must be provided in order to remove seen items.

Returns:

recommendations – Recommended items in the form of their original IDs.

Return type:

pandas.DataFrame, columns as [user_id, item_id, prediction]
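The behaviour described above can be sketched without Cornac. This is not ROSE's implementation: score_fn, item_ids, and the seen mapping are hypothetical stand-ins for a trained model and its training set, and the function returns plain tuples where the real method returns a pandas.DataFrame with the same three columns.

```python
def recommend_to_multiple_users(user_ids, score_fn, item_ids, k=-1, seen=None):
    """Return (user_id, item_id, prediction) rows, best items first per user.

    score_fn(user_id, item_id) -> float is a hypothetical stand-in for a
    trained model; seen maps user_id -> set of item IDs to drop (optional).
    """
    rows = []
    for user_id in user_ids:
        excluded = seen.get(user_id, set()) if seen else set()
        scored = [(item_id, score_fn(user_id, item_id))
                  for item_id in item_ids if item_id not in excluded]
        # Sort by descending prediction; k=-1 keeps the full ranked list.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        if k != -1:
            scored = scored[:k]
        rows.extend((user_id, item_id, pred) for item_id, pred in scored)
    return rows


# Toy scores: made-up values used only for this example.
toy_scores = {("u1", "a"): 0.9, ("u1", "b"): 0.2, ("u1", "c"): 0.5,
              ("u2", "a"): 0.1, ("u2", "b"): 0.8, ("u2", "c"): 0.4}
toy_score_fn = lambda user_id, item_id: toy_scores[(user_id, item_id)]

top2 = recommend_to_multiple_users(["u1", "u2"], toy_score_fn, ["a", "b", "c"], k=2)
unseen = recommend_to_multiple_users(["u1"], toy_score_fn, ["a", "b", "c"],
                                     seen={"u1": {"a"}})
```

Ranking every user in one call is what lets an explainer work on a whole batch of (user, item) recommendations instead of looping over users itself.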

Alternating Least Squares for Implicit Datasets (ALS)

class cornac.models.als.recom_als.ALS(name='ALS', k=10, max_iter=20, lambda_reg=0.02, alpha=1.0, num_threads=0, trainable=True, verbose=False, init_params=None, seed=None)

Alternating Least Squares of Matrix Factorization.

Parameters:

k (int, optional, default: 10) – The dimension of the latent factors.
max_iter (int, optional, default: 20) – Maximum number of iterations.
lambda_reg (float, optional, default: 0.02) – The lambda value used for regularization.
alpha (float, optional, default: 1.0) – The rate at which confidence increases with the observed feedback.
num_threads (int, optional, default: 0) – Number of parallel threads for training. If num_threads=0, all CPU cores will be utilized. If seed is not None, num_threads is set to 1 to remove randomness from parallelization.
trainable (boolean, optional, default: True) – When False, the model will not be re-trained, and pre-trained parameters must be provided.
verbose (boolean, optional, default: False) – When True, running logs are displayed.
init_params (dictionary, optional, default: None) – Initial parameters, e.g., init_params = {'U': user_factors, 'V': item_factors}
seed (int, optional, default: None) – Random seed for weight initialization. If specified, training will take longer because it runs on a single thread (no parallelization).

References

[1] Y. Hu, Y. Koren, and C. Volinsky, “Collaborative Filtering for Implicit Feedback Datasets,” in 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy: IEEE, Dec. 2008, pp. 263-272. doi: 10.1109/ICDM.2008.22.

[2] implicit library: https://pypi.org/project/implicit/
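To make the roles of k, lambda_reg, and alpha concrete, here is a minimal dense NumPy sketch of the alternating updates from Hu et al. [1], assuming binary preferences (1 where feedback was observed) and confidence 1 + alpha * r. It is illustrative only and does not use the implicit library [2]; the function name implicit_als and the toy matrix are made up for this example, and the dense solve would not scale to real datasets.

```python
import numpy as np


def implicit_als(R, k=2, max_iter=20, lambda_reg=0.02, alpha=1.0, seed=0):
    """Didactic implicit-feedback ALS on a small dense matrix R (users x items)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)      # binary preference: was there any feedback?
    C = 1.0 + alpha * R            # confidence grows linearly with feedback
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    reg = lambda_reg * np.eye(k)
    for _ in range(max_iter):
        # Fix item factors, solve a weighted ridge regression per user.
        for u in range(n_users):
            Cu = np.diag(C[u])
            U[u] = np.linalg.solve(V.T @ Cu @ V + reg, V.T @ Cu @ P[u])
        # Fix user factors, solve the symmetric problem per item.
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            V[i] = np.linalg.solve(U.T @ Ci @ U + reg, U.T @ Ci @ P[:, i])
    return U, V


# Toy feedback counts: users 0-1 only touch items 0-1, users 2-3 only items 2-3.
R = np.array([[3, 1, 0, 0],
              [2, 4, 0, 0],
              [0, 0, 5, 1],
              [0, 0, 2, 3]], dtype=float)
U, V = implicit_als(R)
scores = U @ V.T
```

Each half-step has a closed-form solution, which is why ALS needs no learning rate; alpha only changes how strongly observed entries are weighted relative to unobserved ones.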

fit(train_set, val_set=None)

Fit the model to observations.

Parameters:

train_set (cornac.data.Dataset, required) – User-Item preference data as well as additional modalities.
val_set (cornac.data.Dataset, optional, default: None) – User-Item preference data for model selection purposes (e.g., early stopping).

Returns:

self

Return type:

object

score(user_idx, item_idx=None)

Predict the scores/ratings of a user for an item.

Parameters:

user_idx (int, required) – The index of the user for whom to perform score prediction.
item_idx (int, optional, default: None) – The index of the item for which to perform score prediction. If None, scores for all known items will be returned.

Returns:

res – Relative scores that the user gives to the item or to all known items

Return type:

A scalar or a Numpy array
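The two return shapes can be illustrated with a bias-free factor model. This is a sketch under that assumption, not the ALS class itself; U and V are hypothetical user and item factor matrices, and the standalone score function only mirrors the documented signature.

```python
import numpy as np


def score(U, V, user_idx, item_idx=None):
    """Dot-product scoring for a factor model (U: users x k, V: items x k)."""
    if item_idx is None:
        # No item given: one score per known item, as a NumPy array.
        return V @ U[user_idx]
    # Single item: a scalar.
    return float(V[item_idx] @ U[user_idx])


# Made-up factors for two users and three items.
U = np.array([[1.0, 0.0], [0.5, 0.5]])
V = np.array([[2.0, 0.0], [0.0, 4.0], [1.0, 1.0]])

all_scores = score(U, V, user_idx=1)
single = score(U, V, user_idx=1, item_idx=2)
```

Note that both arguments are internal indices, not the original IDs that recommend_to_multiple_users accepts.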

Explainable Matrix Factorization (EMF)

class cornac.models.emf.recom_emf.EMF(name='EMF', k=10, knn_num=10, knn_threshold=0.0, positive_rating_threshold=0.0, max_iter=20, learning_rate=0.001, lambda_reg=0.01, explain_reg=0.1, use_bias=True, early_stop=False, num_threads=0, trainable=True, verbose=False, init_params=None, seed=None, sim_filter_zeros=True)

Explainable Matrix Factorization.

Parameters:

k (int, optional, default: 10) – The dimension of the latent factors.
knn_num (int, optional, default: 10) – The number of nearest neighbors to be used for the explanation.
knn_threshold (float, optional, default: 0.0) – The threshold for the edge weight matrix between user-item pairs.
positive_rating_threshold (float, optional, default: 0.0) – The threshold for the positive ratings.
max_iter (int, optional, default: 20) – Maximum number of iterations or the number of epochs for SGD.
learning_rate (float, optional, default: 0.001) – The learning rate.
lambda_reg (float, optional, default: 0.01) – The lambda value used for regularization.
explain_reg (float, optional, default: 0.1) – The lambda value used for regularization of the explanation.
use_bias (boolean, optional, default: True) – When True, user, item, and global biases are used.
early_stop (boolean, optional, default: False) – When True, delta loss will be checked after each iteration to stop learning earlier.
num_threads (int, optional, default: 0) – Number of parallel threads for training. If num_threads=0, all CPU cores will be utilized. If seed is not None, num_threads is set to 1 to remove randomness from parallelization.
trainable (boolean, optional, default: True) – When False, the model will not be re-trained, and pre-trained parameters must be provided.
verbose (boolean, optional, default: False) – When True, running logs are displayed.
init_params (dictionary, optional, default: None) – Initial parameters, e.g., init_params = {'U': user_factors, 'V': item_factors, 'Bu': user_biases, 'Bi': item_biases}
seed (int, optional, default: None) – Random seed for weight initialization. If specified, training will take longer because of single-thread (no parallelization).
sim_filter_zeros (boolean, optional, default: True) – When True, the similarity matrix will be computed by filtering out the zero ratings. When False, the similarity matrix will be computed by considering the zero ratings.

References

B. Abdollahi and O. Nasraoui, “Explainable Matrix Factorization for Collaborative Filtering,” ACM Press, 2016, pp. 5-6. doi: 10.1145/2872518.2889405.

compute_edge_weight_matrix(): Compute the edge weight matrix between user-item pairs of the model.

fit(train_set, val_set=None)

Fit the model to observations.

Parameters:

train_set (cornac.data.Dataset, required) – User-Item preference data as well as additional modalities.
val_set (cornac.data.Dataset, optional, default: None) – User-Item preference data for model selection purposes (e.g., early stopping).

Returns:

self

Return type:

object

score(user_idx, item_idx=None)

Predict the scores/ratings of a user for an item.

Parameters:

user_idx (int, required) – The index of the user for whom to perform score prediction.
item_idx (int, optional, default: None) – The index of the item for which to perform score prediction. If None, scores for all known items will be returned.

Returns:

res – Relative scores that the user gives to the item or to all known items

Return type:

A scalar or a Numpy array

Novel and Explainable Matrix Factorisation (NEMF)

class cornac.models.nemf.recom_nemf.NEMF(name='NEMF', k=10, knn_num=10, knn_threshold=0.0, positive_rating_threshold=0.0, max_iter=20, learning_rate=0.001, lambda_reg=0.01, explain_reg=0.1, novel_reg=0.1, use_bias=True, early_stop=False, num_threads=0, trainable=True, verbose=False, init_params=None, seed=None, distance_metric='euclidean', sim_filter_zeros=True)

Novel and Explainable Matrix Factorisation.

Parameters:

k (int, optional, default: 10) – The dimension of the latent factors.
knn_num (int, optional, default: 10) – The number of nearest neighbors to be used for the explanation.
knn_threshold (float, optional, default: 0.0) – The threshold for the edge weight matrix between user-item pairs.
positive_rating_threshold (float, optional, default: 0.0) – The threshold for the positive ratings.
max_iter (int, optional, default: 20) – Maximum number of iterations or the number of epochs for SGD.
learning_rate (float, optional, default: 0.001) – The learning rate.
lambda_reg (float, optional, default: 0.01) – The lambda value used for regularization.
explain_reg (float, optional, default: 0.1) – The lambda value used for regularization of the explanation.
novel_reg (float, optional, default: 0.1) – The delta value used for regularization of the novel matrix.
use_bias (boolean, optional, default: True) – When True, user, item, and global biases are used.
early_stop (boolean, optional, default: False) – When True, delta loss will be checked after each iteration to stop learning earlier.
num_threads (int, optional, default: 0) – Number of parallel threads for training. If num_threads=0, all CPU cores will be utilized. If seed is not None, num_threads is set to 1 to remove randomness from parallelization.
trainable (boolean, optional, default: True) – When False, the model will not be re-trained, and pre-trained parameters must be provided.
verbose (boolean, optional, default: False) – When True, running logs are displayed.
init_params (dictionary, optional, default: None) – Initial parameters, e.g., init_params = {'U': user_factors, 'V': item_factors, 'Bu': user_biases, 'Bi': item_biases}
seed (int, optional, default: None) – Random seed for weight initialization. If specified, training will take longer because of single-thread (no parallelization).
distance_metric (string, optional, default: 'euclidean') – The distance metric used for computing the novel matrix.
sim_filter_zeros (boolean, optional, default: True) – When True, the similarity matrix will be computed by filtering out the zero values. When False, the similarity matrix will be computed by keeping the zero values.

References

[1] L. Coba, P. Symeonidis, and M. Zanker, “Personalised novel and explainable matrix factorisation,” Data & Knowledge Engineering, vol. 122, pp. 142-158, Jul. 2019, doi: 10.1016/j.datak.2019.06.003.

compute_edge_weight_matrix(): Compute the edge weight matrix between user-item pairs of the model.

compute_novel_matrix(): Compute the novel matrix of the model using distance-based model.

fit(train_set, val_set=None)

Fit the model to observations.

Parameters:

train_set (cornac.data.Dataset, required) – User-Item preference data as well as additional modalities.
val_set (cornac.data.Dataset, optional, default: None) – User-Item preference data for model selection purposes (e.g., early stopping).

Returns:

self

Return type:

object

score(user_idx, item_idx=None)

Predict the scores/ratings of a user for an item.

Parameters:

user_idx (int, required) – The index of the user for whom to perform score prediction.
item_idx (int, optional, default: None) – The index of the item for which to perform score prediction. If None, scores for all known items will be returned.

Returns:

res – Relative scores that the user gives to the item or to all known items

Return type:

A scalar or a Numpy array

Factorization Machine Recommender Algorithm (FMRec)

class cornac.models.fm_py.recom_fm_py.FMRec(name='FMRec', trainable=True, verbose=True, uses_features=True, num_factors=50, num_iter=10, k0=True, k1=True, init_stdev=0.1, validation_size=0.01, power_t=0.5, t0=0.001, task='regression', initial_learning_rate=0.001, learning_rate_schedule='optimal')

Factorization machine recommender algorithm.
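For orientation, a second-order factorization machine scores a feature vector as a bias, plus a weighted sum of the features, plus factorized pairwise interactions. The sketch below shows that prediction rule and how the k0 and k1 switches documented for this class would drop the first two terms. It is a generic illustration, not FMRec's code; fm_predict, w0, w, and V are hypothetical names.

```python
import numpy as np


def fm_predict(x, w0, w, V, k0=True, k1=True):
    """Second-order factorization machine score for one feature vector x.

    V has shape (n_features, num_factors); row i is the latent vector of feature i.
    """
    y = w0 if k0 else 0.0                 # k0: global bias
    if k1:
        y += float(w @ x)                 # k1: 1-way (linear) feature weights
    # Pairwise term sum_{i<j} <v_i, v_j> x_i x_j, computed with the standard
    # identity 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2).
    xv = V.T @ x
    y += 0.5 * float((xv ** 2).sum() - ((V ** 2).T @ (x ** 2)).sum())
    return y


# Made-up parameters for three features and two factors.
x = np.array([1.0, 0.0, 1.0])
w0 = 0.5
w = np.array([0.1, 0.2, 0.3])
V = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]])

full = fm_predict(x, w0, w, V)
pairwise_only = fm_predict(x, w0, w, V, k0=False, k1=False)
```

In a recommender setting x is typically a one-hot encoding of the user and item plus any side features, so the pairwise term includes the familiar user-item dot product.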

Parameters:

name (str, optional, default: 'FMRec') – Name of the recommender.
trainable (bool, optional, default: True) – Whether the model can be trained.
verbose (bool, optional, default: True) – Whether to print the current iteration and training error.
uses_features (bool, optional, default: True) – Whether features are used.
num_factors (int, optional, default: 50) – The dimensionality of the factorized 2-way interactions.
num_iter (int, optional, default: 10) – Number of iterations.
k0 (bool, optional, default: True) – Use bias.
k1 (bool, optional, default: True) – Use 1-way interactions (learn feature weights).
init_stdev (double, optional, default: 0.1) – Standard deviation for initialization of 2-way factors.
validation_size (double, optional, default: 0.01) – Proportion of the training set to use for validation.
power_t (double, optional, default: 0.5) – The exponent for the inverse scaling learning rate.
t0 (double, optional, default: 0.001) – Constant in the denominator for the optimal learning rate schedule.
task (str, optional, default: 'regression') – Either 'regression' (labels are real values) or 'classification' (labels are either positive or negative).
initial_learning_rate (double, optional, default: 0.001)
learning_rate_schedule (str, optional, default: 'optimal') – The learning rate schedule:

constant: eta = eta0

optimal: eta = 1.0/(t+t0)

invscaling: eta = eta0 / pow(t, power_t)
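The three schedules translate directly into code. In this sketch eta0 is assumed to correspond to initial_learning_rate and t to the update step counter; neither mapping is stated in the parameter list, and learning_rate is a hypothetical helper rather than part of FMRec.

```python
def learning_rate(schedule, t, eta0=0.001, t0=0.001, power_t=0.5):
    """Step size at update step t under the three documented schedules."""
    if schedule == "constant":
        return eta0
    if schedule == "optimal":
        return 1.0 / (t + t0)
    if schedule == "invscaling":
        return eta0 / pow(t, power_t)
    raise ValueError(f"unknown learning_rate_schedule: {schedule!r}")


constant_rate = learning_rate("constant", t=5)
optimal_rate = learning_rate("optimal", t=1, t0=1.0)
invscaling_rate = learning_rate("invscaling", t=4, eta0=0.1)
```

Note that the optimal schedule as written ignores eta0 entirely, so initial_learning_rate only matters for the constant and invscaling schedules.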

class LimeRSDataset(dataset)

Dataset object used for training the FM and LimeRS models.

convert_to_feature_long(): Convert features from row to column representation, for example [(feature 1, item_idx), (feature 2, item_idx)] -> [item_idx, feature1, feature2].

static convert_to_pyfm_format(df, columns=None): Convert a dataframe to sparse matrix format. Returns the sparse matrix and the one-hot encoded column names.

map_to_df(item='True'): Helper function to map item_idx to item_id or user_idx to user_id. The returned dataframe has xid.map.keys() as id and xid.map.items() as idx.

pick_top_items(count=1, train='True'): Pick the top count items based on train/test item frequency.

pick_top_users(count=1, train='True'): Pick the top count users based on train/test user frequency.

set_train_frequency(item='True'): Calculate how often each user or item appears in the training set. Frequencies are mapped against IDs.
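The row-to-column conversion described for convert_to_feature_long can be pictured as a pivot that groups features by item. The sketch below is one reading of the example in that description, using plain lists instead of whatever structure LimeRSDataset holds internally; to_feature_long and the genre values are made up.

```python
from collections import defaultdict


def to_feature_long(pairs):
    """Pivot (feature, item_idx) pairs into [item_idx, feature, ...] rows."""
    grouped = defaultdict(list)
    for feature, item_idx in pairs:
        grouped[item_idx].append(feature)
    # One output row per item, ordered by item index.
    return [[item_idx, *features] for item_idx, features in sorted(grouped.items())]


rows = to_feature_long([("Comedy", 0), ("Drama", 0), ("Action", 1)])
```

After this step each item occupies a single row, which is the shape a one-hot encoder such as convert_to_pyfm_format would expect as input.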

fit(train_set, val_set=None)

Fit the model to observations.

Parameters:

train_set (cornac.data.Dataset, required) – User-Item preference data as well as additional modalities.
val_set (cornac.data.Dataset, optional, default: None) – User-Item preference data for model selection purposes (e.g., early stopping).

Returns:

self

Return type:

object

score(user_id, item_id=None)

Predict the scores/ratings of a user for an item.

Parameters:

user_idx (int, required) – The index of the user for whom to perform score prediction.
item_idx (int, optional, default: None) – The index of the item for which to perform score prediction. If None, scores for all known items will be returned.

Returns:

res – Relative scores that the user gives to the item or to all known items

Return type:

A scalar or a Numpy array

score_neighbourhood(neighborhood_df)

Make predictions on a list of items for each user. A dataframe of [user_idx, item_idx] rows is passed and predictions are made in chunks. This avoids iterating over each row and speeds up the explain_instance function in limers.

Returns: NumPy array of predictions ordered by item_idx.
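The chunking idea can be shown in isolation: instead of calling the model once per row, the rows are split into blocks and each block is predicted with one vectorised call. This is a generic sketch, not the body of score_neighbourhood; predict_fn, chunk_size, and the toy model are assumptions, and the real method takes a dataframe rather than an array.

```python
import numpy as np


def predict_in_chunks(predict_fn, X, chunk_size=1000):
    """Apply predict_fn to row blocks of X and concatenate the results."""
    parts = [predict_fn(X[start:start + chunk_size])
             for start in range(0, len(X), chunk_size)]
    return np.concatenate(parts) if parts else np.empty(0)


# Toy "model": the prediction for a row is the sum of its two columns.
X = np.arange(10).reshape(5, 2)
row_sum = lambda block: block.sum(axis=1)

chunked = predict_in_chunks(row_sum, X, chunk_size=2)
```

Chunking keeps the per-call overhead low while bounding how much of the encoded neighbourhood has to be held in memory at once.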