The objective is for the instrument to choose actions that maximize the expected reward over a given amount of time. The agent will reach the goal much faster by following a good policy. So the goal in reinforcement learning is to learn the best policy. 따라서 택배 업체, 대중 교통 https://mondaydirectory.com/listings13160469/automatisation-avanc%C3%A9e-un-aper%C3%A7u