Abstract
We analyze market baskets of individual households in two consumer durables categories (music and computer-related products) using the multivariate logit (MVL) model, its finite mixture extension (FM-MVL), and the conditional restricted Boltzmann machine (CRBM). The CRBM attains vastly better out-of-sample performance than the MVL and FM-MVL models. Simulation-based likelihood ratio tests also lead us to prefer the CRBM to the FM-MVL model. To interpret the hidden variables of the CRBM, we look at the differences in their average probabilities between purchases and non-purchases of a sub-category across all baskets. To measure interdependences, we compute cross effects between sub-categories for the best-performing FM-MVL model and the best-performing CRBM. In both product categories the CRBM indicates more, or higher, positive cross effects than the FM-MVL model. Finally, we suggest future research based on larger and more detailed data sets.
A Conditional Probabilities of the Investigated Models
For the MVL model the conditional probability of a purchase of product j can be written as:
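As a sketch, the usual auto-logistic form of this conditional probability is shown below. The symbols $a_j$ (constant of product $j$), $V_{jl}$ (symmetric pairwise coefficient between products $j$ and $l$) and $y_{-j}$ (purchase indicators of all products other than $j$) are assumed notation and may differ from the parameterization used in the main text.

```latex
P(y_j = 1 \mid y_{-j})
  = \frac{1}{1 + \exp\!\left(-\left(a_j + \sum_{l \neq j} V_{jl}\, y_l\right)\right)}
```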
For the FM-MVL model with S segments we obtain the following expression for the conditional probability of a purchase of product j:
The parameters of this model are segment-specific; $\pi_s$ denotes the posterior probability of belonging to segment $s$.
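One way to write such a mixture, assuming the segment-specific constants $a_{js}$ and pairwise coefficients $V_{jls}$ as hypothetical notation, is to weight the segment-level MVL conditional probabilities by $\pi_s$:

```latex
P(y_j = 1 \mid y_{-j})
  = \sum_{s=1}^{S} \pi_s \,
    \frac{1}{1 + \exp\!\left(-\left(a_{js} + \sum_{l \neq j} V_{jls}\, y_l\right)\right)}
```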
For the CRBM we obtain the following expressions for the conditional probabilities of purchases given hidden variables and for hidden variables given purchases (Li et al. 2015):
$y_j$ denotes the binary purchase indicator for product $j$, and $h_k$ denotes the $k$th binary hidden variable.
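A sketch of the two conditional probabilities in the usual RBM form follows. The logistic function $\sigma$, the constants $b_j$ and $c_k$, the weights $W_{jk}$, and the covariate vector $x$ are assumed notation; $\beta_j$ and $\delta_k$ stand for the covariate coefficients of product $j$ and hidden variable $k$, and the way they enter each expression is likewise an assumption.

```latex
\sigma(z) = \frac{1}{1 + \exp(-z)}

P(y_j = 1 \mid h, x) = \sigma\!\left(b_j + \sum_{k} W_{jk}\, h_k + \beta_j^{\top} x\right)

P(h_k = 1 \mid y, x) = \sigma\!\left(c_k + \sum_{j} W_{jk}\, y_j + \delta_k^{\top} x\right)
```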
B Estimation of the MVL Model
Maximum likelihood estimation of the MVL model requires computing the so-called normalization constant in every iteration, which is obtained by summing over all $2^J$ possible market baskets. A proper probability results only when expression (1) is divided by the normalization constant. For 30 products we would have to deal with more than $1.0 \times 10^9$ possible market baskets. Because this approach is impractical, we resort to maximum pseudo-likelihood (MPL) estimation. In a simulation study, Bel et al. (2018) compare MPL to maximum likelihood estimation for up to 12 alternatives and conclude that MPL estimation leads to only negligible efficiency losses.
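To illustrate why the normalization constant becomes impractical, the following minimal sketch enumerates all $2^J$ baskets for a small $J$. The function name and the parameterization (constants `a`, symmetric pairwise coefficients `V`) are assumptions made for illustration only.

```python
import itertools
import math


def log_normalization_constant(a, V):
    """Brute-force log normalization constant of an MVL model.

    Assumed parameterization: the unnormalized log probability of basket y
    is sum_j a[j]*y[j] + sum_{j<l} V[j][l]*y[j]*y[l].
    """
    J = len(a)
    log_terms = []
    # Enumerate all 2**J baskets; feasible only for small J.
    for y in itertools.product((0, 1), repeat=J):
        main = sum(a[j] * y[j] for j in range(J))
        pairs = sum(V[j][l] * y[j] * y[l]
                    for j in range(J) for l in range(j + 1, J))
        log_terms.append(main + pairs)
    # Log-sum-exp for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


# Number of baskets that would have to be enumerated for 30 products.
n_baskets_30 = 2 ** 30
```

The loop runs once per basket, so its cost doubles with every additional product; with 30 products, `n_baskets_30` terms would be needed in every iteration of the optimizer.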
MPL estimation is feasible because the normalization constant drops out in expression (8). Moreover, it is straightforward, as the pseudo-likelihood function has only one local maximum. For the MVL model, the pseudo-probability $P_j$ for product $j$ in basket $y$ is given by (Besag 1972, 1974):
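In the usual auto-logistic notation, the pseudo-probability can be sketched as follows. The constant $a_j$, the pairwise coefficients $V_{jl}$ and the linear predictor $z_j$ are assumed symbols; only $P_j$, $y$ and $j$ are taken from the text.

```latex
P_j = \frac{\exp\left(y_j\, z_j\right)}{1 + \exp\left(z_j\right)},
\qquad
z_j = a_j + \sum_{l \neq j} V_{jl}\, y_l
```

The expression depends only on the purchase indicators of the other products in the same basket, which is why no sum over all possible baskets is needed.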
The log pseudo-likelihood $LPL$ of basket $y$ is obtained by summing the logs of the pseudo-probabilities across all $J$ products: $LPL(y) = \sum_{j=1}^{J} \log P_j$.
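The computation can be sketched in a few lines of Python. The function name and the parameters `a` (product constants) and `V` (pairwise coefficients) are hypothetical and follow the standard auto-logistic parameterization, which is an assumption here.

```python
import math


def log_pseudo_likelihood(y, a, V):
    """Log pseudo-likelihood of one basket y (list of 0/1 values).

    a[j] is the assumed constant of product j, V[j][l] the assumed
    pairwise coefficient between products j and l.
    """
    J = len(y)
    lpl = 0.0
    for j in range(J):
        # Linear predictor of product j given all other products.
        z = a[j] + sum(V[j][l] * y[l] for l in range(J) if l != j)
        # Numerically stable log(1 + exp(z)).
        log1pexp = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        # log P_j = y_j * z - log(1 + exp(z)).
        lpl += y[j] * z - log1pexp
    return lpl
```

Summing this quantity over all baskets of the estimation sample gives the objective that MPL estimation maximizes; its cost grows with $J^2$ per basket rather than with $2^J$.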
C Estimation of the FM-MVL Model
We assign households to mixture components (i.e., segments) using the Gibbs sampling approach of Shi et al. (2005). As part of the estimation process, we replace the intractable log likelihood value of a basket by its log pseudo-likelihood value, as in Dippold and Hruschka (2013a). In each iteration, one MVL model is estimated by MPL for each segment, using the households currently assigned to that segment. Because the FM-MVL model may be subject to local optima, we start from 100 random initial allocations of households to segments. We choose the solution with the best log likelihood value for the estimation sample, as determined by the Gibbs sampling procedure explained in Section 2.3.
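The allocation step can be illustrated by the following simplified sketch, which is not the exact sampler of Shi et al. (2005). It assumes that a household is drawn into segment $s$ with probability proportional to the current segment share times the exponentiated sum of the log pseudo-likelihood values of its baskets under the MVL model of segment $s$. All names are hypothetical.

```python
import math
import random


def sample_segment(log_pl_by_segment, segment_shares, rng):
    """Draw a segment index for one household.

    log_pl_by_segment[s]: summed log pseudo-likelihood of the household's
        baskets under the MVL model of segment s (assumed input).
    segment_shares[s]: current share of segment s (assumed input).
    Returns the sampled index and the allocation probabilities.
    """
    log_w = [math.log(p) + lpl
             for p, lpl in zip(segment_shares, log_pl_by_segment)]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    probs = [v / total for v in w]
    # Inverse-CDF draw from the discrete distribution.
    u = rng.random()
    cumulative = 0.0
    for s, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return s, probs
    return len(probs) - 1, probs
```

After all households have been reallocated in this way, the segment-specific MVL models would be re-estimated by MPL on the new allocation.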
D Estimation of the RBM and CRBM
We estimate the RBM and the CRBM by the contrastive divergence (CD) algorithm of Hinton (2002), which approximates the gradient of the log likelihood. For the CRBM we extend the CD algorithm by adding gradients for the coefficients in $\beta$ and $\delta_k$. Because of the existence of local optima, we start the CD algorithm 100 times with random initial coefficient values. Just as for the FM-MVL model, we choose the RBM or CRBM solution that attains the best log likelihood value for the estimation data, using the Gibbs sampling procedure explained in Section 2.3.
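A minimal sketch of one CD update with a single Gibbs step (CD-1) for one basket is given below. It assumes the usual RBM parameterization with constants `b` and `c` and weights `W`, and it assumes that the covariate coefficients `beta` and `delta` enter the visible and hidden activations linearly. These names, the learning rate `lr` and the single-basket update are illustrative choices, not the estimation code used for the reported results.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cd1_update(y, x, params, lr, rng):
    """One CD-1 update for basket y (0/1 list) with covariates x.

    params holds lists b[J], c[K], W[J][K], beta[J][D], delta[K][D],
    which are modified in place. Returns the reconstructed basket.
    """
    b, c, W = params["b"], params["c"], params["W"]
    beta, delta = params["beta"], params["delta"]
    J, K, D = len(b), len(c), len(x)

    def hidden_probs(v):
        return [sigmoid(c[k] + sum(W[j][k] * v[j] for j in range(J))
                        + sum(delta[k][d] * x[d] for d in range(D)))
                for k in range(K)]

    def visible_probs(h):
        return [sigmoid(b[j] + sum(W[j][k] * h[k] for k in range(K))
                        + sum(beta[j][d] * x[d] for d in range(D)))
                for j in range(J)]

    # Positive phase: hidden probabilities given the observed basket.
    ph0 = hidden_probs(y)
    h0 = [1 if rng.random() < p else 0 for p in ph0]
    # Negative phase: one Gibbs step to a reconstructed basket.
    y1 = [1 if rng.random() < p else 0 for p in visible_probs(h0)]
    ph1 = hidden_probs(y1)

    # Approximate gradient: data statistics minus reconstruction statistics.
    for j in range(J):
        dv = y[j] - y1[j]
        b[j] += lr * dv
        for d in range(D):
            beta[j][d] += lr * dv * x[d]
        for k in range(K):
            W[j][k] += lr * (y[j] * ph0[k] - y1[j] * ph1[k])
    for k in range(K):
        dh = ph0[k] - ph1[k]
        c[k] += lr * dh
        for d in range(D):
            delta[k][d] += lr * dh * x[d]
    return y1
```

In practice such updates would be averaged over mini-batches and repeated for many epochs; the covariate gradients have the same data-minus-reconstruction form as the gradients of the constants, multiplied by the covariate values.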
E Simulation-Based Computation of the Likelihood Ratio Test
The likelihood ratio statistic $LRT$, with $LL_1$ and $LL_0$ as the log likelihood values of the alternative and the null model, can be written as: $LRT = 2\,(LL_1 - LL_0)$.
The simulation-based approach for the LRT (Lewis et al. 2011) consists of three steps:
1. Generate $S$ bootstrap samples from the null model.
2. For each bootstrap sample, fit both the null model and the alternative model, determine the log likelihood values of these models by the Gibbs sampling procedure explained in Section 2.3, and compute the LRT statistic.
3. Reject the null model if the proportion of bootstrap test statistics that are greater than the test statistic for the estimation data is smaller than a prespecified significance level.
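The final step of this procedure can be sketched as follows, assuming that the log likelihood values of both models have already been computed for the estimation data and for every bootstrap sample. The function names are hypothetical.

```python
def lrt(ll_alt, ll_null):
    """Likelihood ratio statistic from two log likelihood values."""
    return 2.0 * (ll_alt - ll_null)


def bootstrap_proportion(lrt_observed, lrt_bootstrap):
    """Share of bootstrap LRT statistics greater than the observed one."""
    exceed = sum(1 for t in lrt_bootstrap if t > lrt_observed)
    return exceed / len(lrt_bootstrap)
```

For example, `bootstrap_proportion(lrt(-100.0, -110.0), [1.0, 5.0, 25.0, 3.0])` returns `0.25`, since one of the four bootstrap statistics is greater than the observed value of 20. This proportion is then compared with the chosen significance level.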
References
Beentjes, S. V., and A. Khamseh. 2020. Higher-Order Interactions in Statistical Physics and Machine Learning: A Non-parametric Solution to the Inverse Problem. Working Paper. arXiv:2006.06010. https://doi.org/10.1103/PhysRevE.102.053314.
Bel, K., D. Fok, and R. Paap. 2018. “Parameter Estimation in Multivariate Logit Models with Many Binary Choices.” Econometric Reviews 37: 534–50. https://doi.org/10.1080/07474938.2015.1093780.
Bengio, Y. 2009. “Learning Deep Architectures for AI.” Foundations and Trends in Machine Learning 2: 1–27. https://doi.org/10.1561/2200000006.
Besag, J. 1972. “Nearest-Neighbour Systems and the Auto-Logistic Model for Binary Data.” Journal of the Royal Statistical Society: Series B 34: 75–83. https://doi.org/10.1111/j.2517-6161.1972.tb00889.x.
Besag, J. 1974. “Spatial Interaction and the Statistical Analysis of Lattice Systems.” Journal of the Royal Statistical Society: Series B 36: 192–236. https://doi.org/10.1111/j.2517-6161.1974.tb00999.x.
Betancourt, R., and D. Gautschi. 1990. “Demand Complementarities, Household Production, and Retail Assortments.” Marketing Science 9: 146–61. https://doi.org/10.1287/mksc.9.2.146.
Boztuğ, Y., and L. Hildebrandt. 2008a. “Modeling Joint Purchases with a Multivariate MNL Approach.” Schmalenbach Business Review 60: 400–22. https://doi.org/10.1007/BF03396777.
Boztuğ, Y., and T. Reutterer. 2008b. “A Combined Approach for Segment-Specific Market Basket Analysis.” European Journal of Operational Research 187: 294–312. https://doi.org/10.1016/j.ejor.2007.03.001.
Boztuğ, Y., and N. Silberhorn. 2006. “Modellierungsansätze in der Warenkorbanalyse im Überblick.” Journal für Betriebswirtschaft 56: 105–28. https://doi.org/10.1007/s11301-006-0008-5.
Chib, S., P. B. Seetharaman, and A. Strijnev. 2002. “Analysis of Multi-Category Purchase Incidence Decisions Using IRI Market Basket Data.” In Econometric Models in Marketing, edited by P. H. Franses and A. L. Montgomery, pp. 57–92. Amsterdam: JAI. https://doi.org/10.1016/S0731-9053(02)16004-X.
Cox, D. 1972. “The Analysis of Multivariate Binary Data.” Journal of the Royal Statistical Society: Series C 21: 113–20. https://doi.org/10.2307/2346482.
Dippold, K., and H. Hruschka. 2013a. “A Model of Heterogeneous Multicategory Choice for Market Basket Analysis.” Review of Marketing Science 11: 1–31. https://doi.org/10.1515/roms-2012-0001.
Dippold, K., and H. Hruschka. 2013b. “Variable Selection for Market Basket Analysis.” Computational Statistics 28: 519–29. https://doi.org/10.1007/s00180-012-0315-3.
Duvvuri, S. D., A. Ansari, and S. Gupta. 2007. “Consumers’ Price Sensitivities across Complementary Categories.” Management Science 53: 1933–45. https://doi.org/10.1287/mnsc.1070.0744.
Elliot, G. C. 1988. “Interpreting Higher Order Interactions in Log-Linear Analysis.” Psychological Bulletin 103: 121–30. https://doi.org/10.1037/0033-2909.103.1.121.
Hinton, G. E. 2002. “Training Products of Experts by Minimizing Contrastive Divergence.” Neural Computation 14: 1771–800. https://doi.org/10.1162/089976602760128018.
Hinton, G. E., and R. R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313: 504–7. https://doi.org/10.1126/science.1127647.
Hruschka, H. 2014. “Analyzing Market Baskets by Restricted Boltzmann Machines.” OR Spectrum 36: 209–28. https://doi.org/10.1007/s00291-012-0303-6.
Hruschka, H. 2017a. “Analyzing the Dependences of Multicategory Purchases on Interactions of Marketing Variables.” Journal of Business Economics 87: 295–313. https://doi.org/10.1007/s11573-016-0820-x.
Hruschka, H. 2017b. “Multi-Category Purchase Incidences with Marketing Cross Effects.” Review of Managerial Science 11: 443–69. https://doi.org/10.1007/s11846-016-0193-0.
Hruschka, H. 2019. “Comparing Unsupervised Probabilistic Machine Learning Methods for Market Basket Analysis.” Review of Managerial Science. https://doi.org/10.1007/s11846-019-00349-0.
Hyvärinen, A. 2006. “Consistency of Pseudolikelihood Estimation of Fully Visible Boltzmann Machines.” Neural Computation 18: 2283–92. https://doi.org/10.1162/neco.2006.18.10.2283.
Lewis, F., A. Butler, and L. Gilbert. 2011. “A Unified Approach to Model Selection Using the Likelihood Ratio Test.” Methods in Ecology and Evolution 2: 155–62. https://doi.org/10.1111/j.2041-210x.2010.00063.x.
Le Roux, N., and Y. Bengio. 2007. Representational Power of Restricted Boltzmann Machines and Deep Belief Networks. Technical Report 1294, Département d’informatique et recherche opérationnelle, Université de Montréal.
Li, X., F. Zhao, and Y. Guo. 2015. “Conditional Restricted Boltzmann Machines for Multi-Label Learning with Incomplete Labels.” In Proceedings of the 18th AISTATS Conference. San Diego, CA.
Manchanda, P., A. Ansari, and S. Gupta. 1999. “The ‘Shopping Basket’: A Model for Multi-Category Purchase Incidence Decisions.” Marketing Science 18: 95–114. https://doi.org/10.1287/mksc.18.2.95.
Mnih, V., H. Larochelle, and G. E. Hinton. 2011. “Conditional Restricted Boltzmann Machines for Structured Output Prediction.” In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain.
Montúfar, G. 2018. “Restricted Boltzmann Machines: Introduction and Review.” In Information Geometry and Its Applications: On the Occasion of Shun-Ichi Amari’s 80th Birthday, edited by N. Ay, P. Gibilisco, and F. Matúš, pp. 75–115. Basel, Switzerland: Springer Nature. https://doi.org/10.1007/978-3-319-97798-0_4.
Ni, J., S. A. Neslin, and B. Sun. 2012. “Database Submission: The ISMS Durable Goods Data Sets.” Marketing Science 31: 1008–13. https://doi.org/10.1287/mksc.1120.0726.
Russell, G. J., and A. Petersen. 2000. “Analysis of Cross Category Dependence in Market Basket Selection.” Journal of Retailing 76: 369–92. https://doi.org/10.1016/s0022-4359(00)00030-0.
Seetharaman, P. B., S. Chib, A. Ainslie, P. Boatwright, T. Chan, S. Gupta, N. Mehta, V. Rao, and A. Strijnev. 2005. “Models of Multi-Category Choice Behavior.” Marketing Letters 16: 239–54. https://doi.org/10.1007/s11002-005-5888-y.
Shi, J. Q., R. Murray-Smith, and D. M. Titterington. 2005. “Hierarchical Gaussian Process Mixtures for Regression.” Statistics and Computing 15: 31–41. https://doi.org/10.1007/s11222-005-4787-7.
Smolensky, P. 1986. “Information Processing in Dynamical Systems: Foundations of Harmony Theory.” In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, edited by D. E. Rumelhart and J. L. McClelland, pp. 194–281. Cambridge, MA: MIT Press.
Williams, D. A. 1970. “Discrimination between Regression Models to Determine the Pattern of Enzyme Synthesis in Synchronous Cell Cultures.” Biometrics 26: 23–32. https://doi.org/10.2307/2529041.
Xia, F., R. Chatterjee, and J. H. May. 2019. “Using Conditional Restricted Boltzmann Machines to Model Complex Consumer Shopping Patterns.” Marketing Science 38: 711–27. https://doi.org/10.1287/mksc.2019.1162.
© 2020 Walter de Gruyter GmbH, Berlin/Boston