Models
Below are the models that ROSE added.
Other models are described in the original Cornac documentation.
Recommender (Generic Class)
ROSE adds a new method, recommend_to_multiple_users, to the generic class Recommender so that recommendations can be generated for multiple users at once. This method is used by Explainer and Experiment_Explainers.
- class cornac.models.Recommender(name, trainable=True, verbose=False)[source]
Generic class for a recommender model. All recommendation models should inherit from this class.
- Parameters:
name (str, required) – Name of the recommender model.
trainable (boolean, optional, default: True) – When False, the model is not trainable.
verbose (boolean, optional, default: False) – When True, running logs are displayed.
- total_users
Number of users in training, validation, and test data. In other words, this includes unknown/unseen users.
- Type:
- total_items
Number of items in training, validation, and test data. In other words, this includes unknown/unseen items.
- Type:
- recommend_to_multiple_users(user_ids, k=-1, remove_seen=False, train_set=None)[source]
Generate top-K item recommendations for the given list of users.
- Parameters:
user_ids (array, required) – The original ID list of the users.
k (int, optional, default: -1) – Cut-off length for recommendations; k=-1 returns a ranked list of all items.
remove_seen (bool, optional, default: False) – Remove items seen during training and validation from the output recommendations.
train_set (cornac.data.Dataset, optional, default: None) – Training dataset; it must be provided in order to remove seen items.
- Returns:
recommendations – Recommended items in the form of their original IDs.
- Return type:
pandas.DataFrame, columns as [user_id, item_id, prediction]
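The ranking logic described above can be illustrated with a small, library-free sketch. It is not the ROSE implementation: the function name recommend_to_many and the scores and seen dictionaries are hypothetical stand-ins for a trained model and a training set, and it returns plain tuples in the same column order (user_id, item_id, prediction) rather than a pandas.DataFrame.

```python
def recommend_to_many(scores, user_ids, k=-1, seen=None):
    """Rank items for several users at once.

    scores: hypothetical dict mapping user_id -> {item_id: predicted score}.
    seen:   optional dict mapping user_id -> set of item_ids to exclude,
            playing the role of remove_seen=True with a train_set.
    Returns rows of (user_id, item_id, prediction), best item first per user.
    """
    rows = []
    for uid in user_ids:
        candidates = list(scores[uid].items())
        if seen is not None:
            known = seen.get(uid, set())
            candidates = [(i, s) for i, s in candidates if i not in known]
        # Highest predicted score first.
        ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
        if k != -1:  # k=-1 keeps the full ranked list
            ranked = ranked[:k]
        rows.extend((uid, item, score) for item, score in ranked)
    return rows


scores = {
    "u1": {"a": 0.9, "b": 0.2, "c": 0.5},
    "u2": {"a": 0.1, "b": 0.8, "c": 0.4},
}
top2 = recommend_to_many(scores, ["u1", "u2"], k=2, seen={"u1": {"a"}})
```

Here item "a" is dropped for user "u1" because it is marked as seen, so that user's list starts with "c".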
Alternating Least Squares for Implicit Datasets (ALS)
- class cornac.models.als.recom_als.ALS(name='ALS', k=10, max_iter=20, lambda_reg=0.02, alpha=1.0, num_threads=0, trainable=True, verbose=False, init_params=None, seed=None)[source]
Alternating Least Squares of Matrix Factorization.
- Parameters:
k (int, optional, default: 10) – The dimension of the latent factors.
max_iter (int, optional, default: 20) – Maximum number of iterations or the number of epochs for SGD.
lambda_reg (float, optional, default: 0.02) – The lambda value used for regularization.
alpha (float, optional, default: 1.0) – The rate of confidence increase.
num_threads (int, optional, default: 0) – Number of parallel threads for training. If num_threads=0, all CPU cores will be utilized. If seed is not None, num_threads is set to 1 to remove randomness from parallelization.
trainable (boolean, optional, default: True) – When False, the model will not be re-trained, and pre-trained parameters must be provided.
verbose (boolean, optional, default: False) – When True, running logs are displayed.
init_params (dictionary, optional, default: None) – Initial parameters, e.g., init_params = {'U': user_factors, 'V': item_factors}
seed (int, optional, default: None) – Random seed for weight initialization. If specified, training will take longer because it runs on a single thread (no parallelization).
References
[1] Y. Hu, Y. Koren, and C. Volinsky, “Collaborative Filtering for Implicit Feedback Datasets,” in 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy: IEEE, Dec. 2008, pp. 263-272. doi: 10.1109/ICDM.2008.22.
[2] implicit library: https://pypi.org/project/implicit/
- fit(train_set, val_set=None)[source]
Fit the model to observations.
- Parameters:
train_set (cornac.data.Dataset, required) – User-Item preference data as well as additional modalities.
val_set (cornac.data.Dataset, optional, default: None) – User-Item preference data for model selection purposes (e.g., early stopping).
- Returns:
self
- Return type:
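To show how k, max_iter, lambda_reg and alpha interact, here is a minimal NumPy sketch of implicit-feedback ALS following reference [1], where the confidence is c_ui = 1 + alpha * r_ui and the preference is 1 for any observed interaction. The function als_implicit is hypothetical and uses dense matrices; it does not reproduce the ROSE class or the implicit library.

```python
import numpy as np


def als_implicit(R, k=10, max_iter=20, lambda_reg=0.02, alpha=1.0, seed=None):
    """Dense, unoptimised ALS for implicit feedback (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)   # binary preference p_ui
    C = 1.0 + alpha * R         # confidence c_ui = 1 + alpha * r_ui
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    reg = lambda_reg * np.eye(k)

    def solve(fixed, conf, pref):
        # Closed-form update for each row: (Y^T C Y + lambda I)^-1 Y^T C p
        out = np.empty((conf.shape[0], k))
        for idx in range(conf.shape[0]):
            weighted = fixed * conf[idx][:, None]   # C Y
            A = fixed.T @ weighted + reg
            b = weighted.T @ pref[idx]
            out[idx] = np.linalg.solve(A, b)
        return out

    for _ in range(max_iter):
        U = solve(V, C, P)       # update users with items fixed
        V = solve(U, C.T, P.T)   # update items with users fixed
    return U, V


R = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 5.0],
    [0.0, 0.0, 5.0, 4.0],
])
U, V = als_implicit(R, k=2, max_iter=20, seed=0)
pred = U @ V.T   # predicted preference for every user-item pair
```

Larger alpha values put more weight on reproducing observed interactions relative to unobserved ones; lambda_reg keeps each per-row linear system well conditioned.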
Explainable Matrix Factorization (EMF)
- class cornac.models.emf.recom_emf.EMF(name='EMF', k=10, knn_num=10, knn_threshold=0.0, positive_rating_threshold=0.0, max_iter=20, learning_rate=0.001, lambda_reg=0.01, explain_reg=0.1, use_bias=True, early_stop=False, num_threads=0, trainable=True, verbose=False, init_params=None, seed=None, sim_filter_zeros=True)
Explainable Matrix Factorization.
- Parameters:
k (int, optional, default: 10) – The dimension of the latent factors.
knn_num (int, optional, default: 10) – The number of nearest neighbors to be used for the explanation.
knn_threshold (float, optional, default: 0.0) – The threshold for the edge weight matrix between user-item pairs.
positive_rating_threshold (float, optional, default: 0.0) – The threshold for the positive ratings.
max_iter (int, optional, default: 20) – Maximum number of iterations or the number of epochs for SGD.
learning_rate (float, optional, default: 0.001) – The learning rate.
lambda_reg (float, optional, default: 0.01) – The lambda value used for regularization.
explain_reg (float, optional, default: 0.1) – The lambda value used for regularization of the explanation.
use_bias (boolean, optional, default: True) – When True, user, item, and global biases are used.
early_stop (boolean, optional, default: False) – When True, delta loss will be checked after each iteration to stop learning earlier.
num_threads (int, optional, default: 0) – Number of parallel threads for training. If num_threads=0, all CPU cores will be utilized. If seed is not None, num_threads is set to 1 to remove randomness from parallelization.
trainable (boolean, optional, default: True) – When False, the model will not be re-trained, and pre-trained parameters must be provided.
verbose (boolean, optional, default: False) – When True, running logs are displayed.
init_params (dictionary, optional, default: None) – Initial parameters, e.g., init_params = {'U': user_factors, 'V': item_factors, 'Bu': user_biases, 'Bi': item_biases}
seed (int, optional, default: None) – Random seed for weight initialization. If specified, training will take longer because it runs on a single thread (no parallelization).
sim_filter_zeros (boolean, optional, default: True) – When True, the similarity matrix will be computed by filtering out the zero ratings. When False, the similarity matrix will be computed by considering the zero ratings.
References
B. Abdollahi and O. Nasraoui, “Explainable Matrix Factorization for Collaborative Filtering,” ACM Press, 2016, pp. 5-6. doi: 10.1145/2872518.2889405.
- compute_edge_weight_matrix()
Compute the edge weight matrix between user-item pairs of the model.
- fit(train_set, val_set=None)
Fit the model to observations.
- Parameters:
train_set (cornac.data.Dataset, required) – User-Item preference data as well as additional modalities.
val_set (cornac.data.Dataset, optional, default: None) – User-Item preference data for model selection purposes (e.g., early stopping).
- Returns:
self
- Return type:
- score(user_idx, item_idx=None)
Predict the scores/ratings of a user for an item.
- Parameters:
- Returns:
res – Relative scores that the user gives to the item or to all known items
- Return type:
A scalar or a Numpy array
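The edge weight matrix computed by compute_edge_weight_matrix() can be pictured with the following sketch. It assumes the neighbourhood-based definition from the EMF reference: the weight for a user-item pair is the fraction of the user's nearest neighbours who rated the item positively, set to zero when it does not exceed the threshold. The function edge_weight_matrix, the use of cosine similarity, and the dense input are assumptions for illustration and may differ from the ROSE implementation.

```python
import numpy as np


def edge_weight_matrix(R, knn_num=10, knn_threshold=0.0,
                       positive_rating_threshold=0.0):
    """Neighbourhood-based explainability weights (illustrative only)."""
    R = np.asarray(R, dtype=float)
    # Cosine similarity between user rating vectors (an assumed choice).
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = R / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)   # a user is not their own neighbour

    liked = R > positive_rating_threshold
    W = np.zeros(R.shape, dtype=float)
    for u in range(R.shape[0]):
        neighbours = np.argsort(-sim[u], kind="stable")[:knn_num]
        # Fraction of the neighbours who rated each item positively.
        W[u] = liked[neighbours].mean(axis=0)
    W[W <= knn_threshold] = 0.0      # keep only weights above the threshold
    return W


R = np.array([
    [5.0, 4.0, 0.0],
    [5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0],
])
W = edge_weight_matrix(R, knn_num=1)
```

In EMF, explain_reg scales a penalty that uses these weights to pull a user's latent factors toward those of the items their neighbours liked.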
Novel and Explainable Matrix Factorisation (NEMF)
- class cornac.models.nemf.recom_nemf.NEMF(name='NEMF', k=10, knn_num=10, knn_threshold=0.0, positive_rating_threshold=0.0, max_iter=20, learning_rate=0.001, lambda_reg=0.01, explain_reg=0.1, novel_reg=0.1, use_bias=True, early_stop=False, num_threads=0, trainable=True, verbose=False, init_params=None, seed=None, distance_metric='euclidean', sim_filter_zeros=True)
Novel and Explainable Matrix Factorisation.
- Parameters:
k (int, optional, default: 10) – The dimension of the latent factors.
knn_num (int, optional, default: 10) – The number of nearest neighbors to be used for the explanation.
knn_threshold (float, optional, default: 0.0) – The threshold for the edge weight matrix between user-item pairs.
positive_rating_threshold (float, optional, default: 0.0) – The threshold for the positive ratings.
max_iter (int, optional, default: 20) – Maximum number of iterations or the number of epochs for SGD.
learning_rate (float, optional, default: 0.001) – The learning rate.
lambda_reg (float, optional, default: 0.01) – The lambda value used for regularization.
explain_reg (float, optional, default: 0.1) – The lambda value used for regularization of the explanation.
novel_reg (float, optional, default: 0.1) – The delta value used for regularization of the novel matrix.
use_bias (boolean, optional, default: True) – When True, user, item, and global biases are used.
early_stop (boolean, optional, default: False) – When True, delta loss will be checked after each iteration to stop learning earlier.
num_threads (int, optional, default: 0) – Number of parallel threads for training. If num_threads=0, all CPU cores will be utilized. If seed is not None, num_threads is set to 1 to remove randomness from parallelization.
trainable (boolean, optional, default: True) – When False, the model will not be re-trained, and pre-trained parameters must be provided.
verbose (boolean, optional, default: False) – When True, running logs are displayed.
init_params (dictionary, optional, default: None) – Initial parameters, e.g., init_params = {'U': user_factors, 'V': item_factors, 'Bu': user_biases, 'Bi': item_biases}
seed (int, optional, default: None) – Random seed for weight initialization. If specified, training will take longer because it runs on a single thread (no parallelization).
distance_metric (string, optional, default: 'euclidean') – The distance metric used for computing the novel matrix.
sim_filter_zeros (boolean, optional, default: True) – When True, the similarity matrix will be computed by filtering out the zero values. When False, the similarity matrix will be computed by keeping the zero values.
References
[1] L. Coba, P. Symeonidis, and M. Zanker, “Personalised novel and explainable matrix factorisation,” Data & Knowledge Engineering, vol. 122, pp. 142-158, Jul. 2019, doi: 10.1016/j.datak.2019.06.003.
- compute_edge_weight_matrix()
Compute the edge weight matrix between user-item pairs of the model.
- compute_novel_matrix()
Compute the novel matrix of the model using a distance-based model.
- fit(train_set, val_set=None)
Fit the model to observations.
- Parameters:
train_set (cornac.data.Dataset, required) – User-Item preference data as well as additional modalities.
val_set (cornac.data.Dataset, optional, default: None) – User-Item preference data for model selection purposes (e.g., early stopping).
- Returns:
self
- Return type:
- score(user_idx, item_idx=None)
Predict the scores/ratings of a user for an item.
- Parameters:
- Returns:
res – Relative scores that the user gives to the item or to all known items
- Return type:
A scalar or a Numpy array
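As a rough illustration of what a distance-based novel matrix might look like, the sketch below scores each user-item pair by the mean Euclidean distance between that item and the items already in the user's profile. This formulation is an assumption (a common distance-based novelty definition), not something this page confirms; the function novel_matrix and the use of rating columns as item vectors are hypothetical.

```python
import numpy as np


def novel_matrix(R):
    """Mean distance from each item to a user's rated items (illustrative)."""
    R = np.asarray(R, dtype=float)
    n_users, n_items = R.shape
    item_vecs = R.T                                   # one rating column per item
    diff = item_vecs[:, None, :] - item_vecs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))           # pairwise Euclidean distances

    rated = R > 0
    N = np.zeros((n_users, n_items))
    for u in range(n_users):
        profile = np.flatnonzero(rated[u])            # items the user has rated
        if profile.size:
            N[u] = dist[:, profile].mean(axis=1)
    return N


R = np.array([
    [3.0, 0.0],
    [0.0, 4.0],
])
N = novel_matrix(R)
```

An item far from everything in a user's profile receives a high value, while an item already in a single-item profile receives zero. The distance_metric parameter suggests that metrics other than Euclidean can be swapped in.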
Factorization Machine Recommender Algorithm (FMRec)
- class cornac.models.fm_py.recom_fm_py.FMRec(name='FMRec', trainable=True, verbose=True, uses_features=True, num_factors=50, num_iter=10, k0=True, k1=True, init_stdev=0.1, validation_size=0.01, power_t=0.5, t0=0.001, task='regression', initial_learning_rate=0.001, learning_rate_schedule='optimal')[source]
Factorization machine recommender algorithm.
- Parameters:
name (str) – Name of the recommender.
trainable (bool) – Whether the model can be trained.
verbose (bool) – Whether to print the current iteration and training error.
uses_features (bool) – Whether features are used.
num_factors (int) – The dimensionality of the factorized 2-way interactions.
num_iter (int) – Number of iterations.
k0 (bool, optional, default True) – Use bias.
k1 (bool, optional, default True) – Use 1-way interactions (learn feature weights).
init_stdev (double, optional, default 0.1) – Standard deviation for initialization of 2-way factors.
validation_size (double, optional, default 0.01) – Proportion of the training set to use for validation.
power_t (double, optional, default 0.5) – The exponent for the inverse scaling learning rate.
t0 (double, optional, default 0.001) – Constant in the denominator for the optimal learning rate schedule.
task (str, optional, default 'regression') – regression: labels are real values. classification: labels are either positive or negative.
initial_learning_rate (double, optional, default 0.001) – Initial learning rate (eta0 in the schedules below).
learning_rate_schedule (str, optional, default 'optimal') – The learning rate schedule:
constant: eta = eta0
optimal: eta = 1.0/(t+t0) [default]
invscaling: eta = eta0 / pow(t, power_t)
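The three schedules listed above can be written as one small function. This is only a direct transcription of the formulas in the parameter list, assuming the step counter t starts at 1; the helper name learning_rate is hypothetical and is not part of FMRec.

```python
def learning_rate(schedule, t, eta0=0.001, t0=0.001, power_t=0.5):
    """Return the step size eta at step t (t >= 1) for a given schedule."""
    if schedule == "constant":
        return eta0
    if schedule == "optimal":
        return 1.0 / (t + t0)
    if schedule == "invscaling":
        return eta0 / pow(t, power_t)
    raise ValueError(f"unknown learning_rate_schedule: {schedule!r}")


# Compare the step size of each schedule over the first few steps.
for name in ("constant", "optimal", "invscaling"):
    print(name, [learning_rate(name, t) for t in (1, 2, 3)])
```

Note that only the constant and invscaling schedules depend on eta0 (initial_learning_rate), while the optimal schedule depends on t0 alone.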
- class LimeRSDataset(dataset)[source]
Dataset object used for training the FM and LimeRS models.
- convert_to_feature_long()[source]
Convert features from row to column representation. Example: [(feature 1, item_idx), (feature 2, item_idx)] -> [item_idx, feature1, feature2]
- static convert_to_pyfm_format(df, columns=None)[source]
Convert a dataframe to sparse matrix format. Returns: the sparse matrix and the one-hot encoded column names.
- map_to_df(item='True')[source]
Helper function to map item_idx to item_id or user_idx to user_id. The returned df has xid.map.keys() as id and xid.map.items() as idx.
- pick_top_items(count=1, train='True')[source]
Pick the top n items (n = count) based on item frequency in the train or test set.
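The row-to-column conversion described for convert_to_feature_long() can be sketched in plain Python. The function to_feature_long and the sample genre features are hypothetical; the real method works on the dataset's own feature storage, and its exact return type is not shown on this page.

```python
from collections import defaultdict


def to_feature_long(pairs):
    """Group (feature, item_idx) rows into one row per item.

    [(feature 1, item_idx), (feature 2, item_idx)]
        -> [item_idx, feature 1, feature 2]
    """
    grouped = defaultdict(list)
    for feature, item_idx in pairs:
        grouped[item_idx].append(feature)
    # One output row per item, in order of first appearance.
    return [[item_idx, *features] for item_idx, features in grouped.items()]


pairs = [("Action", 0), ("Comedy", 0), ("Drama", 1)]
rows = to_feature_long(pairs)
```

Each output row starts with the item index followed by every feature attached to that item, which is the shape needed before one-hot encoding the features for the factorization machine.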
- fit(train_set, val_set=None)[source]
Fit the model to observations.
- Parameters:
train_set (cornac.data.Dataset, required) – User-Item preference data as well as additional modalities.
val_set (cornac.data.Dataset, optional, default: None) – User-Item preference data for model selection purposes (e.g., early stopping).
- Returns:
self
- Return type:
- score(user_id, item_id=None)[source]
Predict the scores/ratings of a user for an item.
- Parameters:
- Returns:
res – Relative scores that the user gives to the item or to all known items
- Return type:
A scalar or a Numpy array
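For context on what a factorization machine score is made of, the sketch below evaluates the standard second-order FM equation for one feature vector, with the k0 and k1 flags switching the bias and the 1-way terms on or off as described in the parameter list. The function fm_predict and its argument names are hypothetical; this is not the code path used by FMRec.score.

```python
import numpy as np


def fm_predict(x, w0, w, V, k0=True, k1=True):
    """Second-order factorization machine prediction (illustrative only).

    x:  feature vector, shape (n_features,)
    w0: global bias, used when k0 is True
    w:  1-way weights, shape (n_features,), used when k1 is True
    V:  2-way factors, shape (n_features, num_factors)
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    V = np.asarray(V, dtype=float)
    y = 0.0
    if k0:
        y += w0
    if k1:
        y += float(w @ x)
    # Pairwise term in linear time:
    # 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)
    xv = V.T @ x
    x2v2 = (V.T ** 2) @ (x ** 2)
    y += 0.5 * float(np.sum(xv ** 2 - x2v2))
    return y


x = [1.0, 1.0, 0.0]                  # e.g. one-hot user and item features
w = [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [2.0, 1.0], [3.0, 3.0]]
y = fm_predict(x, w0=0.5, w=w, V=V)
```

With one-hot encoded user and item columns, the pairwise term reduces to the dot product of the active user and item factor rows, which is why num_factors plays the same role as the latent dimension in matrix factorization.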
- score_neighbourhood(neighborhood_df)[source]
Make predictions on a list of items for each user. A dataframe of [user_idx, item_idx] pairs is passed in and predictions are made in chunks. This avoids iterating over each row and speeds up the explain_instance function in LimeRS.
Returns: numpy array of predictions ordered by item_idx
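The chunked prediction idea can be sketched as follows. The helper predict_in_chunks, the chunk_size value, and the toy predict_fn are all hypothetical; the sketch only shows batching a [user_idx, item_idx] table through a vectorised predictor instead of looping over rows, and it keeps the input row order rather than re-sorting by item_idx.

```python
import numpy as np


def predict_in_chunks(pairs, predict_fn, chunk_size=1000):
    """Apply a vectorised predictor to (user_idx, item_idx) rows in chunks."""
    pairs = np.asarray(pairs)
    outputs = []
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        outputs.append(predict_fn(chunk))   # one call per chunk, not per row
    return np.concatenate(outputs) if outputs else np.empty(0)


def toy_predict(chunk):
    # Stand-in for a trained model: score = 10 * user_idx + item_idx.
    return chunk[:, 0] * 10 + chunk[:, 1]


pairs = [[0, 1], [0, 2], [1, 0], [1, 1], [2, 2]]
preds = predict_in_chunks(pairs, toy_predict, chunk_size=2)
```

Chunking bounds the size of each intermediate feature matrix while still letting the model score many rows per call.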