Philip S. Thomas, PhD

Postdoctoral Fellow

School of Computer Science, Carnegie Mellon University

PThomasCS [at] gmail [dot] com

I study a branch of artificial intelligence (AI) called reinforcement learning (RL). I am currently a postdoctoral fellow working with Emma Brunskill at CMU. I completed my PhD in computer science at UMass Amherst in 2015, where Andrew Barto was my adviser. I completed my B.S. and M.S. in computer science at CWRU in 2008 and 2009, where Michael Branicky was my adviser. Before that, I was introduced to computer science in high school, where David Kosbie mentored me.

Publications

2016

  • P. S. Thomas and E. Brunskill. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. In Proceedings of the Thirty-Third International Conference on Machine Learning, 2016. complete, body only, appendix, arXiv preprint (pdf)
    • Related extended abstract for the Data-Efficient Machine Learning Workshop at ICML 2016. pdf
  • P. S. Thomas, B. C. da Silva, C. Dann, and E. Brunskill. Energetic Natural Gradient Descent. In Proceedings of the Thirty-Third International Conference on Machine Learning, 2016. complete, body only, appendix
  • M. G. Bellemare, G. Ostrovski, A. Guez, P. S. Thomas, and R. Munos. Increasing the Action Gap: New Operators for Reinforcement Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016. pdf, supplemental, video, code
  • K. M. Jagodnik, P. S. Thomas, A. J. van den Bogert, M. S. Branicky, and R. F. Kirsch. Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement. IEEE Transactions on Human-Machine Systems, 46(5):723–733, October 2016. pdf

2015

  • P. S. Thomas. Safe Reinforcement Learning. PhD thesis, School of Computer Science, University of Massachusetts Amherst, September 2015. pdf
  • P. S. Thomas, S. Niekum, G. Theocharous, and G. Konidaris. Policy Evaluation using the Ω-Return. In Advances in Neural Information Processing Systems 28, 2015. pdf
  • P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High Confidence Off-Policy Evaluation. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015. pdf
  • P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High Confidence Policy Improvement. In Proceedings of the Thirty-Second International Conference on Machine Learning, 2015. pdf
  • P. S. Thomas. A Notation for Markov Decision Processes. arXiv:1512.09075v1, 2015. pdf, arXiv
  • G. Theocharous, P. S. Thomas, and M. Ghavamzadeh. Personalized ad recommendation systems for life-time value optimization with guarantees. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015. pdf
  • G. Theocharous, P. S. Thomas, and M. Ghavamzadeh. Ad recommendation systems for life-time value optimization. In TargetAd 2015: Ad Targeting at Scale, at the World Wide Web Conference, 2015. pdf

2014

  • P. S. Thomas. GeNGA: A generalization of natural gradient ascent with positive and negative convergence results. In Proceedings of the Thirty-First International Conference on Machine Learning, 2014. pdf
  • P. S. Thomas. Bias in natural actor-critic algorithms. In Proceedings of the Thirty-First International Conference on Machine Learning, 2014. pdf
  • W. Dabney and P. S. Thomas. Natural temporal difference learning. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014. pdf
  • S. Mahadevan, B. Liu, P. S. Thomas, W. Dabney, S. Giguere, N. Jacek, I. Gemp, and J. Liu. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces. arXiv:1405.6757v1, 2014. pdf, arXiv

2013

  • P. S. Thomas, W. Dabney, S. Mahadevan, and S. Giguere. Projected natural actor-critic. In Advances in Neural Information Processing Systems 26, 2013. pdf

2012

  • P. S. Thomas. Bias in natural actor-critic algorithms. Technical Report UM-CS-2012-018, Department of Computer Science, University of Massachusetts Amherst, 2012. pdf
  • P. S. Thomas and A. G. Barto. Motor primitive discovery. In Proceedings of the IEEE Conference on Development and Learning and Epigenetic Robotics, 2012. pdf

2011

  • P. S. Thomas. Policy gradient coagent networks. In Advances in Neural Information Processing Systems 24, pages 1944–1952, 2011. pdf
  • G. D. Konidaris, S. Niekum, and P. S. Thomas. TDγ: Re-evaluating complex backups in temporal difference learning. In Advances in Neural Information Processing Systems 24, pages 2402–2410, 2011. pdf
    • Author names are listed alphabetically; the paper's footnote reads: "All three authors are primary authors on this occasion."
  • G. D. Konidaris, S. Osentoski, and P. S. Thomas. Value function approximation in reinforcement learning using the Fourier basis. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, pages 380–385, 2011. pdf
  • P. S. Thomas and A. G. Barto. Conjugate Markov decision processes. In Proceedings of the Twenty-Eighth International Conference on Machine Learning, pages 137–144, 2011. pdf

2009

  • P. S. Thomas. A reinforcement learning controller for functional electrical stimulation of a human arm. Master's thesis, Department of Electrical Engineering and Computer Science, Case Western Reserve University, August 2009. pdf
  • P. S. Thomas, M. S. Branicky, A. J. van den Bogert, and K. M. Jagodnik. Application of the actor-critic architecture to functional electrical stimulation control of a human arm. In Proceedings of the Twenty-First Innovative Applications of Artificial Intelligence Conference, pages 165–172, 2009. pdf

2008

  • P. S. Thomas, M. S. Branicky, A. J. van den Bogert, and K. M. Jagodnik. Creating a reinforcement learning controller for functional electrical stimulation of a human arm. In Proceedings of the Fourteenth Yale Workshop on Adaptive and Learning Systems, pages 15–20, 2008. pdf

1998

  • A. Kandabarow, M. Rafalko, and P. S. Thomas. Penguins with hats, penguins with pants. In 7th Grade English with Mrs. Haiges, Sewickley Academy, PA, c. 1998. pdf