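Define the number of lookahead steps and the base policy, where the base policy is applied for a given number of steps and the remaining cost is approximated through Cost Function Approximation; the policy generated in this way is the Truncated Rollout policy. Then, for T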
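The mathematical symbols in the fragment above were lost in extraction, so the following is only a minimal sketch of one common form of truncated rollout. Each step runs a lookahead minimization over `ell` steps. At each leaf the base policy is simulated for `m` steps, and a terminal cost approximation is added at the end. The toy problem is entirely hypothetical and serves only to make the structure concrete: integer states, controls {-1, 0, +1}, the dynamics `f`, the stage cost `g`, the "do nothing" base policy and the terminal approximation `j_tilde` are all invented here.

```python
# Hypothetical deterministic problem (not from the source): state x is an int.
CONTROLS = (-1, 0, 1)

def f(x, u):
    """Hypothetical dynamics: move the state by u."""
    return x + u

def g(x, u):
    """Hypothetical stage cost: penalize distance from 0 and control effort."""
    return x * x + abs(u)

def base_policy(x):
    """Hypothetical base policy: always apply u = 0."""
    return 0

def j_tilde(x):
    """Hypothetical terminal cost approximation."""
    return x * x

def rollout_cost(x, m):
    """Run the base policy for m steps, then add the terminal approximation."""
    total = 0
    for _ in range(m):
        u = base_policy(x)
        total += g(x, u)
        x = f(x, u)
    return total + j_tilde(x)

def lookahead_value(x, ell, m):
    """Exact minimization over the next ell steps; truncated rollout at the leaves."""
    if ell == 0:
        return rollout_cost(x, m)
    return min(g(x, u) + lookahead_value(f(x, u), ell - 1, m) for u in CONTROLS)

def truncated_rollout_policy(x, ell, m):
    """Pick the first control of the best ell-step lookahead sequence (ell >= 1)."""
    return min(CONTROLS, key=lambda u: g(x, u) + lookahead_value(f(x, u), ell - 1, m))

if __name__ == "__main__":
    print(truncated_rollout_policy(3, ell=1, m=2))
```

Only the first control of the minimizing sequence is applied, and the lookahead is recomputed at the next state. Increasing `ell` makes the exact part of the search larger, while increasing `m` relies more on the base policy before falling back on `j_tilde`.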
Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions. The difference is that look
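A short sketch using Python's standard `re` module shows what "zero-length" means in practice. The lookaround checks its condition without consuming any characters, so the checked text is not part of the returned match. The sample string is made up for illustration.

```python
import re

text = "Price: $42, weight: 17 kg, tax: $5"

# Positive lookbehind: digits preceded by "$"; the "$" itself is not returned.
dollars = re.findall(r"(?<=\$)\d+", text)

# Positive lookahead: digits followed by " kg"; the " kg" is not returned.
kg = re.findall(r"\d+(?= kg)", text)

# A bare lookahead matches an empty string at the position where "kg" starts.
m = re.search(r"(?=kg)", text)

print(dollars, kg, m.span())
```

Python's `re` requires lookbehind patterns to have a fixed width. That is why `(?<=\$)` works here, while a variable-length lookbehind such as `(?<=\$+)` would raise an error.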