Philip S. Thomas

Assistant Professor and Co-Director of the Autonomous Learning Lab

College of Information and Computer Sciences, University of Massachusetts Amherst

pthomas [at] cs [dot] umass [dot] edu

I study a branch of artificial intelligence (AI) called reinforcement learning (RL). I currently co-direct the Autonomous Learning Lab (ALL) at UMass Amherst with Sridhar Mahadevan. Before that, I was a postdoctoral researcher working with Emma Brunskill at CMU. I completed my Ph.D. in computer science at UMass Amherst in 2015, where Andrew Barto was my adviser, and my B.S. and M.S. in computer science at CWRU in 2008 and 2009, where Michael Branicky was my adviser. Before that, in high school, I was introduced to computer science and mentored by David Kosbie.

Publications

(Bolded titles indicate papers that I find most interesting.)

2017

  • P. S. Thomas, B. Castro da Silva, A. G. Barto, and E. Brunskill. On Ensuring that Intelligent Machines are Well-Behaved. arXiv:1708.05448, 2017. pdf, arXiv
  • P. S. Thomas and E. Brunskill. Importance Sampling with Unequal Support. In Proceedings of the Thirty-First Conference on Artificial Intelligence, 2017. pdf, body only, supplemental only, arXiv preprint (pdf)
  • P. S. Thomas, G. Theocharous, M. Ghavamzadeh, I. Durugkar, and E. Brunskill. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing. In Conference on Innovative Applications of Artificial Intelligence, 2017. pdf
    • A related paper with the same authors was presented at the Workshop on Computational Frameworks for Personalization at ICML 2016.
  • S. Doroudi, P. S. Thomas, and E. Brunskill. Importance Sampling for Fair Policy Selection. In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, 2017. pdf
  • P. S. Thomas, C. Dann, and E. Brunskill. Decoupling Learning Rules from Representations. arXiv:1706.03100v1, 2017. pdf, arXiv
  • J. P. Hanna, P. S. Thomas, P. Stone, and S. Niekum. Data-Efficient Policy Evaluation Through Behavior Policy Search. In Proceedings of the Thirty-Fourth International Conference on Machine Learning, 2017. To appear.
  • Z. Guo, P. S. Thomas, and E. Brunskill. Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation. In Advances in Neural Information Processing Systems 30, 2017.
    • A related paper with the same authors, titled "Using Options for Long-Horizon Off-Policy Evaluation," was presented as an extended abstract at The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017.
  • A. G. Barto, P. S. Thomas, and R. S. Sutton. Some Recent Applications of Reinforcement Learning. In Proceedings of the Eighteenth Yale Workshop on Adaptive and Learning Systems, 2017. pdf
  • P. S. Thomas and E. Brunskill. Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines. arXiv:1706.06643v1, 2017. pdf, arXiv
  • S. Doroudi, P. S. Thomas, and E. Brunskill. Importance Sampling for Fair Policy Selection. In The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017. Extended abstract. pdf
  • Y. Liu, P. S. Thomas, and E. Brunskill. Model Selection for Off-Policy Policy Evaluation. In The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017. Extended abstract. pdf

2016

  • P. S. Thomas and E. Brunskill. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. In Proceedings of the Thirty-Third International Conference on Machine Learning, 2016. complete, body only, appendix, arXiv preprint (pdf)
    • A related extended abstract was presented at the Data-Efficient Machine Learning Workshop at ICML 2016. pdf
  • P. S. Thomas, B. C. da Silva, C. Dann, and E. Brunskill. Energetic Natural Gradient Descent. In Proceedings of the Thirty-Third International Conference on Machine Learning, 2016. complete, body only, appendix
  • M. G. Bellemare, G. Ostrovski, A. Guez, P. S. Thomas, and R. Munos. Increasing the Action Gap: New Operators for Reinforcement Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016. pdf, supplemental, video, code
  • K. M. Jagodnik, P. S. Thomas, A. J. van den Bogert, M. S. Branicky, and R. F. Kirsch. Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement. IEEE Transactions on Human-Machine Systems, 46(5):723–733, October 2016. pdf
  • P. S. Thomas and E. Brunskill. Magical Policy Search: Data Efficient Reinforcement Learning with Guarantees of Global Optimality. In the European Workshop on Reinforcement Learning, 2016. pdf

2015

  • P. S. Thomas. Safe Reinforcement Learning. Ph.D. thesis, School of Computer Science, University of Massachusetts Amherst, September 2015. pdf
  • P. S. Thomas, S. Niekum, G. Theocharous, and G. Konidaris. Policy Evaluation Using the Ω-Return. In Advances in Neural Information Processing Systems 28, 2015. pdf
  • P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High Confidence Off-Policy Evaluation. In Proceedings of the Twenty-Ninth Conference on Artificial Intelligence, 2015. pdf
  • P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High Confidence Policy Improvement. In Proceedings of the Thirty-Second International Conference on Machine Learning, 2015. pdf, errata
  • P. S. Thomas. A Notation for Markov Decision Processes. arXiv:1512.09075v1, 2015. pdf, arXiv
  • G. Theocharous, P. S. Thomas, and M. Ghavamzadeh. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees. In Proceedings of the International Joint Conference on Artificial Intelligence, 2015. pdf
  • G. Theocharous, P. S. Thomas, and M. Ghavamzadeh. Ad Recommendation Systems for Life-Time Value Optimization. In TargetAd 2015: Ad Targeting at Scale, at the World Wide Web Conference, 2015. pdf

2014

  • P. S. Thomas. GeNGA: A generalization of natural gradient ascent with positive and negative convergence results. In Proceedings of the Thirty-First International Conference on Machine Learning, 2014. pdf
  • P. S. Thomas. Bias in natural actor-critic algorithms. In Proceedings of the Thirty-First International Conference on Machine Learning, 2014. pdf
  • W. Dabney and P. S. Thomas. Natural temporal difference learning. In Proceedings of the Twenty-Eighth Conference on Artificial Intelligence, 2014. pdf
  • S. Mahadevan, B. Liu, P. S. Thomas, W. Dabney, S. Giguere, N. Jacek, I. Gemp, and J. Liu. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces. arXiv:1405.6757v1, 2014. pdf, arXiv

2013

  • P. S. Thomas, W. Dabney, S. Mahadevan, and S. Giguere. Projected natural actor-critic. In Advances in Neural Information Processing Systems 26, 2013. pdf
  • W. Dabney, P. S. Thomas, and A. G. Barto. Performance Metrics for Reinforcement Learning Algorithms. In The First Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2013. Extended abstract.

2012

  • P. S. Thomas. Bias in natural actor-critic algorithms. Technical Report UM-CS-2012-018, Department of Computer Science, University of Massachusetts Amherst, 2012. pdf
  • P. S. Thomas and A. G. Barto. Motor primitive discovery. In Proceedings of the IEEE Conference on Development and Learning and Epigenetic Robotics, 2012. pdf

2011

  • P. S. Thomas. Policy gradient coagent networks. In Advances in Neural Information Processing Systems 24, pages 1944–1952. 2011. pdf
  • G. D. Konidaris, S. Niekum, and P. S. Thomas. TDγ: Re-evaluating complex backups in temporal difference learning. In Advances in Neural Information Processing Systems 24, pages 2402–2410. 2011. pdf
    • Author names are listed alphabetically. The paper's footnote reads: "All three authors are primary authors on this occasion."
  • G. D. Konidaris, S. Osentoski, and P. S. Thomas. Value function approximation in reinforcement learning using the Fourier basis. In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence, pages 380–385, 2011. pdf
  • P. S. Thomas and A. G. Barto. Conjugate Markov decision processes. In Proceedings of the Twenty-Eighth International Conference on Machine Learning, pages 137–144, 2011. pdf

2009

  • P. S. Thomas. A reinforcement learning controller for functional electrical stimulation of a human arm. Master's thesis, Department of Electrical Engineering and Computer Science, Case Western Reserve University, August 2009. pdf
  • P. S. Thomas, M. S. Branicky, A. J. van den Bogert, and K. M. Jagodnik. Application of the actor-critic architecture to functional electrical stimulation control of a human arm. In Proceedings of the Twenty-First Innovative Applications of Artificial Intelligence Conference, pages 165–172, 2009. pdf

2008

  • P. S. Thomas, M. S. Branicky, A. J. van den Bogert, and K. M. Jagodnik. Creating a reinforcement learning controller for functional electrical stimulation of a human arm. In Proceedings of the Fourteenth Yale Workshop on Adaptive and Learning Systems, pages 15–20, 2008. pdf

1998

  • A. Kandabarow, M. Rafalko, and P. S. Thomas. Penguins with hats, penguins with pants. In 7th Grade English with Mrs. Haiges, Sewickley Academy, PA, c. 1998. pdf