A Dynamic Graph Model of Strategy Learning for Predicting Human Behavior in Repeated Games

Afrooz Vazifedan; Mohammad Izadi

doi:10.1515/bejte-2021-0015

Published/Copyright: June 6, 2022

Published by De Gruyter Brill

From the journal The B.E. Journal of Theoretical Economics, Volume 23, Issue 1

Abstract

We present a model that explains how players learn strategies in repeated normal-form games. The proposed model is based on a directed weighted graph, which we define and call the game’s dynamic graph. A learning algorithm uses this graph as a framework to predict which actions the players will choose during the game and how they act on the basis of their accumulated experience and behavioral characteristics. We evaluate the model’s performance by applying it to several human-subject datasets and measuring the rate of correctly predicted actions (the hit rate). The results show that our model achieves a higher average hit rate than the compared models. We also measure the model’s descriptive power (its ability to describe human behavior in self-play mode) to show that our model, in contrast to the other behavioral models, can describe the alternation strategy in the Battle of the Sexes game and the cooperative strategy in the Prisoner’s Dilemma game.

Keywords: behavioral game theory; learning models; repeated games; graph-based algorithms; bounded rationality; strategy prediction

Corresponding author: Afrooz Vazifedan, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran, E-mail: af.vazifedan@gmail.com

Appendix

Table 6 lists the estimated value of each parameter of the baseline models for every game in the datasets of Section 4.1. As suggested by the models’ designers, the search interval was [0, 1] for φ (in EWA, weighted fictitious play, and reinforcement learning) and for δ (in EWA), and [0, 100] for λ (in EWA and QRE). Table 7 reports the in-sample hit rate percentages of the proposed model and the other models, each using its estimated parameters, on every experimental game. The best fit for each game among all models is printed in bold. The last column shows each model’s average accuracy across all games. On average, the DGB model achieves the highest accuracy of all the studied models in the training phase, as it also does in the test phase.
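As an illustration of the parameter search described above, the following Python sketch fits the decay parameter φ of weighted fictitious play by grid search over [0, 1], scoring each candidate by its in-sample hit rate (the percentage of rounds in which the predicted action matches the action actually chosen). The 0.1 grid step, the uniform prior weights, the best-response prediction rule, the use of hit rate as the fitting objective, and the toy game and action histories are all assumptions for illustration, not the authors' estimation procedure; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# payoff[own_action][opponent_action] for the player whose choices are predicted
Payoff = List[List[float]]


def wfp_predictions(payoff: Payoff, opp_actions: Sequence[int], phi: float) -> List[int]:
    """Predict one player's action in each round with weighted fictitious play.

    Belief weights over the opponent's actions decay by phi after every round:
    phi = 1 is classic fictitious play, phi = 0 best-responds to the last
    observed action only.
    """
    n_own, n_opp = len(payoff), len(payoff[0])
    weights = [1.0] * n_opp  # assumption: uniform prior weights before round 1
    predictions = []
    for opp in opp_actions:
        total = sum(weights)
        beliefs = [w / total for w in weights]
        expected = [sum(p * payoff[a][b] for b, p in enumerate(beliefs)) for a in range(n_own)]
        # Prediction = best response to current beliefs (first action wins ties).
        predictions.append(max(range(n_own), key=lambda a: expected[a]))
        # After predicting, decay old weights and count the observed action.
        weights = [phi * w for w in weights]
        weights[opp] += 1.0
    return predictions


def hit_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Percentage of rounds in which the predicted action equals the chosen one."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


def fit_phi(payoff: Payoff, own_actions: Sequence[int], opp_actions: Sequence[int],
            grid: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Grid-search phi for the highest in-sample hit rate (smallest phi wins ties)."""
    if grid is None:
        grid = [i / 10 for i in range(11)]  # assumption: step 0.1 over [0, 1]
    best = max(grid, key=lambda phi: hit_rate(wfp_predictions(payoff, opp_actions, phi), own_actions))
    return best, hit_rate(wfp_predictions(payoff, opp_actions, best), own_actions)


# Hypothetical 2x2 coordination game and a short, made-up action history.
payoff = [[3.0, 0.0], [0.0, 2.0]]
opp_actions = [1, 1, 1, 0, 1]
own_actions = [0, 1, 1, 1, 1]
print(fit_phi(payoff, own_actions, opp_actions))  # (0.8, 100.0)
```

Here every φ ≥ 0.8 predicts all five rounds correctly, and the grid search returns the smallest such value. A likelihood-based fit would be an equally plausible design choice; hit rate is used here only because it is the accuracy measure reported in Table 7.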

Table 6:

Fitted parameter values of the other learning models for each game dataset used in the experiments.

Parameter (model)	GRP1	GRP2	GRP3	GRP4	Patent	DSG3	nDSG3	DSG4	nDSG4	CPR1	CPR2	PD	Cont
φ (WFP)	1	0.9	1	1	0	0	0	0.9	0	0	0.4	0	0
φ (RL)	0.8	0.9	1	1	1	0	0.5	0.4	0.3	0.5	1	0.3	0
φ (EWA)	0.9	1	1	1	1	0.9	0.5	0.6	0.7	0	0	0	0.5
λ (EWA)	4	35	25	15	15	4	45	2	35	2	2	15	2
δ (EWA)	0.2	0.5	1	0.2	0	0	0	0.2	0	1	0.5	0.2	0.8
λ (QRE)	2	65	2	55	40	2	1	10	80	65	4	50	75
μ (RM)	510	750	200	800	240	9010	500	1110	1450	100	1120	1000	2040
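To show how the three fitted EWA parameters in Table 6 interact, the following Python sketch implements one attraction update in the form given by Camerer and Ho (1999), followed by logit choice probabilities with sensitivity λ. The table does not report EWA's remaining parameters, so the sketch assumes ρ = φ, an initial experience weight N(0) = 1, and zero initial attractions; the function names and the toy game and history are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Payoff = List[List[float]]  # payoff[own_action][opponent_action]


def ewa_update(attractions: Sequence[float], n_prev: float, payoff: Payoff,
               own_action: int, opp_action: int,
               phi: float, delta: float, rho: float) -> Tuple[List[float], float]:
    """One EWA step:

    N(t)   = rho * N(t-1) + 1
    A_j(t) = [phi * N(t-1) * A_j(t-1) + (delta + (1 - delta) * I_j) * pi(j, opp)] / N(t)
    where I_j = 1 if j is the action actually played, else 0.
    """
    n_new = rho * n_prev + 1.0
    new_attractions = []
    for j, a_prev in enumerate(attractions):
        # The chosen action gets weight 1; foregone payoffs are weighted by delta.
        weight = 1.0 if j == own_action else delta
        reinforcement = weight * payoff[j][opp_action]
        new_attractions.append((phi * n_prev * a_prev + reinforcement) / n_new)
    return new_attractions, n_new


def logit_choice(attractions: Sequence[float], lam: float) -> List[float]:
    """Logit choice probabilities; lam is the response sensitivity."""
    top = max(lam * a for a in attractions)  # shift exponents for numerical stability
    exps = [math.exp(lam * a - top) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]


def ewa_choice_probs(payoff: Payoff, own_actions: Sequence[int], opp_actions: Sequence[int],
                     phi: float, delta: float, lam: float) -> List[float]:
    """Next-round choice probabilities after replaying a history.

    Assumptions (not reported in the article): rho = phi, N(0) = 1 and all
    initial attractions A_j(0) = 0.
    """
    attractions, n = [0.0] * len(payoff), 1.0
    for own, opp in zip(own_actions, opp_actions):
        attractions, n = ewa_update(attractions, n, payoff, own, opp, phi, delta, rho=phi)
    return logit_choice(attractions, lam)


# Hypothetical game and history, using the GRP1 EWA values from Table 6.
probs = ewa_choice_probs([[3.0, 0.0], [0.0, 2.0]], [0, 1, 1], [1, 1, 1],
                         phi=0.9, delta=0.2, lam=4.0)
print([round(p, 3) for p in probs])
```

With δ = 0 the update reinforces only the chosen action, close in spirit to reinforcement learning; with δ = 1 foregone payoffs count fully, closer to belief-based learning such as weighted fictitious play.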

Table 7:

In-sample accuracy (hit rate percentage) of the proposed and other learning models on the experimental games.

Model	GRP1	GRP2	GRP3	GRP4	Patent	DSG3	nDSG3	DSG4	nDSG4	CPR1	CPR2	PD	Cont	Average
Weighted fictitious play	37	25	35	24	46	14	62	19	23	19	13	61	25	31.0
Reinforcement learning	37	36	34	29	55	63	62	62	63	77	85	84	50	56.7
EWA	38	33	35	29	55	74	72	63	64	82	88	86	51	59.2
QRE	25	21	27	22	52	68	22	40	35	70	62	70	9	40.2
Regret matching	36	31	35	25	54	65	57	69	63	66	77	80	53	47.5
Cognitive hierarchy	25	15	24	16	14	31	39	33	32	78	87	57	12	35.6
DCNN	43	36	36	26	49	72	65	63	64	95	95	85	-	58.8
DGB (proposed)	46	41	42	39	58	78	76	69	68	95	96	80	56	64.9
Average of above models	35.9	29.8	33.5	26.3	47.9	58.1	56.9	52.3	51.5	72.8	75.4	75.4	36.6	50.1
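Assuming the Average column and the bottom row of Table 7 are plain arithmetic means of the printed entries, they can be recomputed mechanically. The following Python sketch transcribes the table and averages each model over its games and each game over its models; it treats the dash in the DCNN row as "no result reported" and skips it, and it works from the rounded percentages as printed. The names `HIT_RATES`, `model_average`, and `game_average` are hypothetical.

```python
from typing import Dict, List, Optional

GAMES = ["GRP1", "GRP2", "GRP3", "GRP4", "Patent", "DSG3", "nDSG3",
         "DSG4", "nDSG4", "CPR1", "CPR2", "PD", "Cont"]

# In-sample hit rates (%) transcribed from Table 7; None stands for the "-" entry.
HIT_RATES: Dict[str, List[Optional[int]]] = {
    "Weighted fictitious play": [37, 25, 35, 24, 46, 14, 62, 19, 23, 19, 13, 61, 25],
    "Reinforcement learning": [37, 36, 34, 29, 55, 63, 62, 62, 63, 77, 85, 84, 50],
    "EWA": [38, 33, 35, 29, 55, 74, 72, 63, 64, 82, 88, 86, 51],
    "QRE": [25, 21, 27, 22, 52, 68, 22, 40, 35, 70, 62, 70, 9],
    "Regret matching": [36, 31, 35, 25, 54, 65, 57, 69, 63, 66, 77, 80, 53],
    "Cognitive hierarchy": [25, 15, 24, 16, 14, 31, 39, 33, 32, 78, 87, 57, 12],
    "DCNN": [43, 36, 36, 26, 49, 72, 65, 63, 64, 95, 95, 85, None],
    "DGB (proposed)": [46, 41, 42, 39, 58, 78, 76, 69, 68, 95, 96, 80, 56],
}


def model_average(rates: List[Optional[int]]) -> float:
    """Mean hit rate of one model over the games with a reported value."""
    values = [r for r in rates if r is not None]
    return sum(values) / len(values)


def game_average(table: Dict[str, List[Optional[int]]], game: str) -> float:
    """Mean hit rate on one game over the models with a reported value."""
    idx = GAMES.index(game)
    values = [rates[idx] for rates in table.values() if rates[idx] is not None]
    return sum(values) / len(values)


for model, rates in HIT_RATES.items():
    print(f"{model:<26}{model_average(rates):6.1f}")
for game in GAMES:
    print(f"{game:<8}{game_average(HIT_RATES, game):6.1f}")
```

Comparing the printed values with the Average column and the bottom row of Table 7 provides a quick consistency check of the transcribed results.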

References

Andreoni, J., and J. H. Miller. 1993. “Rational Cooperation in the Finitely Repeated Prisoner’s Dilemma: Experimental Evidence.” The Economic Journal 103 (418): 570–85. https://doi.org/10.2307/2234532.

Ansari, A., R. Montoya, and O. Netzer. 2012. “Dynamic Learning in Behavioral Games: A Hidden Markov Mixture-of-Experts Approach.” Quantitative Marketing and Economics 10 (4): 475–503. https://doi.org/10.1007/s11129-012-9125-8.

Biecek, P., and T. Burzykowski. 2021. Explanatory Model Analysis. New York: Chapman and Hall/CRC. https://doi.org/10.1201/9780429027192.

Brown, G. W. 1951. “Iterative Solution of Games by Fictitious Play.” Activity Analysis of Production and Allocation 13 (1): 374–6.

Camerer, C. F., and T. H. Ho. 2015. “Behavioral Game Theory Experiments and Modeling (Chapter 10).” In Handbook of Game Theory with Economic Applications, Vol. 4, 517–73. Elsevier. https://doi.org/10.1016/B978-0-444-53766-9.00010-0.

Camerer, C. F., T. H. Ho, and J. K. Chong. 2002. “Sophisticated Experience-Weighted Attraction Learning and Strategic Teaching in Repeated Games.” Journal of Economic Theory 104 (1): 137–88. https://doi.org/10.1006/jeth.2002.2927.

Camerer, C. F., T. H. Ho, and J. K. Chong. 2004. “A Cognitive Hierarchy Model of Games.” Quarterly Journal of Economics 119 (3): 861–98. https://doi.org/10.1162/0033553041502225.

Camerer, C., and T. H. Ho. 1999. “Experience‐Weighted Attraction Learning in Normal Form Games.” Econometrica 67 (4): 827–74. https://doi.org/10.1111/1468-0262.00054.

Camerer, C., T. Ho, and K. Chong. 2003. “Models of Thinking, Learning, and Teaching in Games.” The American Economic Review 93 (2): 192–5. https://doi.org/10.1257/000282803321947038.

Cason, T. N., S. H. P. Lau, and V. L. Mui. 2013. “Learning, Teaching, and Turn Taking in the Repeated Assignment Game.” Economic Theory 54 (2): 335–57. https://doi.org/10.1007/s00199-012-0718-y.

Chen, W., Y. Chen, and D. K. Levine. 2015. “A Unifying Learning Framework for Building Artificial Game-Playing Agents.” Annals of Mathematics and Artificial Intelligence 73 (3–4): 335–58. https://doi.org/10.1007/s10472-015-9450-1.

Corley, H. W., and P. Kwain. 2014. “A Cooperative Dual to the Nash Equilibrium for Two-Person Prescriptive Games.” Journal of Applied Mathematics. https://doi.org/10.1155/2014/806794.

Erev, I., and A. E. Roth. 1998. “Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria.” The American Economic Review: 848–81.

Foerster, J., R. Y. Chen, M. Al-Shedivat, S. Whiteson, P. Abbeel, and I. Mordatch. 2018. “Learning with Opponent-Learning Awareness.” In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 122–30. International Foundation for Autonomous Agents and Multiagent Systems.

Fudenberg, D., and D. K. Levine. 2016. “Whither Game Theory? Towards a Theory of Learning in Games.” Journal of Economic Perspectives 30 (4): 151–70. https://doi.org/10.1257/jep.30.4.151.

Georganas, S., P. J. Healy, and R. A. Weber. 2015. “On the Persistence of Strategic Sophistication.” Journal of Economic Theory 159: 369–400. https://doi.org/10.1016/j.jet.2015.07.012.

Goeree, J. K., C. A. Holt, and S. K. Laury. 2002. “Private Costs and Public Benefits: Unraveling the Effects of Altruism and Noisy Behavior.” Journal of Public Economics 83 (2): 255–76. https://doi.org/10.1016/s0047-2727(00)00160-2.

Hart, S., and A. Mas‐Colell. 2000. “A Simple Adaptive Procedure Leading to Correlated Equilibrium.” Econometrica 68 (5): 1127–50. https://doi.org/10.1111/1468-0262.00153.

Ho, T. H., C. F. Camerer, and J. K. Chong. 2007. “Self-Tuning Experience Weighted Attraction Learning in Games.” Journal of Economic Theory 133 (1): 177–98. https://doi.org/10.1016/j.jet.2005.12.008.

Ho, T. H., C. Camerer, and K. Weigelt. 1998. “Iterated Dominance and Iterated Best Response in Experimental P-Beauty Contests.” The American Economic Review 88 (4): 947–69.

Ho, T.-H., and X. Su. 2013. “A Dynamic Level-K Model in Sequential Games.” Management Science 59: 452–69. https://doi.org/10.1287/mnsc.1120.1645.

Hyndman, K., E. Y. Ozbay, A. Schotter, and W. Z. E. Ehrblatt. 2012. “Convergence: An Experimental Study of Teaching and Learning in Repeated Games.” Journal of the European Economic Association 10 (3): 573–604. https://doi.org/10.1111/j.1542-4774.2011.01063.x.

Ioannou, C. A., and J. Romero. 2014. “A Generalized Approach to Belief Learning in Repeated Games.” Games and Economic Behavior 87: 178–203. https://doi.org/10.1016/j.geb.2014.05.007.

Izquierdo, L. R., S. S. Izquierdo, and F. Vega-Redondo. 2012. “Learning and Evolutionary Game Theory.” In Encyclopedia of the Sciences of Learning, 1782–8. Boston: Springer. https://doi.org/10.1007/978-1-4419-1428-6_576.

Kolumbus, Y., and G. Noti. 2019. “Neural Networks for Predicting Human Interactions in Repeated Games.” arXiv preprint arXiv:1911.03233. https://doi.org/10.24963/ijcai.2019/56.

Kümmerli, R., C. Colliard, N. Fiechter, B. Petitpierre, F. Russier, and L. Keller. 2007. “Human Cooperation in Social Dilemmas: Comparing the Snowdrift Game with the Prisoner’s Dilemma.” Proceedings of the Royal Society B: Biological Sciences 274 (1628): 2965–70. https://doi.org/10.1098/rspb.2007.0793.

Mathevet, L., and J. Romero. 2012. Predictive Repeated Game Theory: Measures and Experiments. New York: Mimeo.

McKelvey, R. D., and T. R. Palfrey. 1995. “Quantal Response Equilibria for Normal Form Games.” Games and Economic Behavior 10 (1): 6–38. https://doi.org/10.1006/game.1995.1023.

Mengel, F. 2014. “Learning by (Limited) Forward Looking Players.” Journal of Economic Behavior & Organization 108: 59–77. https://doi.org/10.1016/j.jebo.2014.08.001.

Mohlin, E., R. Ostling, and J. T. Y. Wang. 2014. “Learning by Imitation in Games: Theory, Field, and Laboratory.” In Economics Series Working Papers, 734.

Mookherjee, D., and B. Sopher. 1997. “Learning and Decision Costs in Experimental Constant Sum Games.” Games and Economic Behavior 19 (1): 97–132. https://doi.org/10.1006/game.1997.0540.

Nax, H. 2015. Behavioral Game Theory. Zurich: ETH Editions.

Rapoport, A., and W. Amaldoss. 2000. “Mixed Strategies and Iterative Elimination of Strongly Dominated Strategies: An Experimental Investigation of States of Knowledge.” Journal of Economic Behavior & Organization 42 (4): 483–521. https://doi.org/10.1016/s0167-2681(00)00101-3.

Terracol, A., and J. Vaksmann. 2009. “Dumbing Down Rational Players: Learning and Teaching in an Experimental Game.” Journal of Economic Behavior & Organization 70 (1–2): 54–71. https://doi.org/10.1016/j.jebo.2009.02.003.

Van Huyck, J. B., J. P. Cook, and R. C. Battalio. 1997. “Adaptive Behavior and Coordination Failure.” Journal of Economic Behavior & Organization 32 (4): 483–503. https://doi.org/10.1016/s0167-2681(97)00007-3.

Vazifedan, A., and M. Izadi. 2021. “Predicting Human Behavior in Size-Variant Repeated Games through Deep Convolutional Neural Networks.” Progress in Artificial Intelligence 11: 1–14. https://doi.org/10.1007/s13748-021-00258-y.

Received: 2021-01-23

Revised: 2021-12-10

Accepted: 2022-03-26

Published Online: 2022-06-06
