Beyond dichotomies in reinforcement learning




