# Assignment 1: Fast Trajectory Replanning

## Description

## 1 Setup
Consider the following problem: an agent in a gridworld has to move from its current cell to the given cell of a non-moving
target, where the gridworld is not fully known. Gridworlds are discretizations of terrain into square cells that are either
blocked or unblocked.
Similar search challenges arise frequently in real-time computer games, such as StarCraft (shown in Figure 1), and in robotics.
To control characters in such games, the player can click on known or unknown terrain, and the game characters then move
autonomously to the location that the player clicked on. The characters observe the terrain within their limited field of
view and then remember it for future use but do not know the terrain initially (due to “fog of war”). The same situation
arises in robotics, where a mobile platform equipped with sensors builds a map of the world as it traverses an unknown
environment.
Figure 1: Fog of War in Starcraft
Assume that the initial cell of the agent is unblocked. The agent can move from its current cell in the four main compass
directions (east, south, west and north) to any adjacent cell, as long as that cell is unblocked and still part of the gridworld.
All moves take one time step for the agent and thus have cost one. The agent always knows which (unblocked) cell it is in
and which (unblocked) cell the target is in. The agent knows that blocked cells remain blocked and unblocked cells remain
unblocked but does not know initially which cells are blocked. However, it can always observe the blockage status of its
four adjacent cells, which corresponds to its field of view, and remember this information for future use. The objective of
the agent is to reach the target as effectively as possible.
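These movement and sensing rules are easy to prototype. The sketch below assumes a 5 × 5 grid stored as a set of blocked cells; all names (`GRID_SIZE`, `observe`, and so on) are illustrative, not part of the assignment's given code:

```python
# Sketch of the agent's movement and sensing rules in a gridworld.
# GRID_SIZE, observe, and the data layout are illustrative assumptions.

GRID_SIZE = 5

def neighbors(cell):
    """Cells reachable in one move: the four compass directions, on-grid."""
    r, c = cell
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):  # east, south, west, north
        nr, nc = r + dr, c + dc
        if 0 <= nr < GRID_SIZE and 0 <= nc < GRID_SIZE:
            yield (nr, nc)

def observe(cell, true_blocked, known_blocked):
    """The agent sees the blockage status of its four adjacent cells."""
    for n in neighbors(cell):
        if n in true_blocked:
            known_blocked.add(n)
```

Each move costs one, so path length equals path cost under these rules.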
A common-sense and tractable movement strategy for the agent is the following: The agent assumes that cells are unblocked
unless it has already observed them to be blocked and uses search with the “freespace assumption”. In other words, it moves
along a path that satisfies the following three properties:
1. It is a path from the current cell of the agent to the target.
2. It is a path that the agent does not know to be blocked and thus assumes to be unblocked, i.e., a presumed unblocked
path.
3. It is a shortest such path.
Whenever the agent observes additional blocked cells while it follows its current path, it remembers this information for
future use. If such cells block its current path, then its current path might no longer be a “shortest presumed-unblocked
path” from the current cell of the agent to the target. Then, the agent stops moving along its current path, searches for
another “shortest presumed-unblocked path” from its current cell to the target, taking into account the blocked cells that it
knows about, and then moves along this path. The cycle stops when the agent:
• either reaches the target or
• determines that it cannot reach the target because there is no presumed-unblocked path from its current cell to the
target and it is thus separated from the target by blocked cells.
In the former case, the agent reports that it reached the target. In the latter case, it reports that it cannot reach the target.
This movement strategy has three desirable properties:
1. The agent is guaranteed to reach the target if it is not separated from it by blocked cells.
2. The trajectory is provably short (but not necessarily optimal).
3. The trajectory is believable since the movement of the agent is directed toward the target and takes the blockage
status of all observed cells into account but not the blockage status of any unobserved cell.
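The cycle above can be summarized as a replanning loop. In the sketch below, `a_star` and `observe` are assumed helpers (a planner that returns a shortest presumed-unblocked path or `None`, and the sensing step); they are illustrative, not prescribed by the assignment:

```python
# High-level replanning loop under the freespace assumption.
# a_star(agent, target, known_blocked) and observe(cell, true_blocked,
# known_blocked) are assumed helpers, not part of the given pseudocode.

def replan_and_move(start, target, true_blocked, a_star, observe):
    known_blocked = set()        # cells the agent has observed to be blocked
    agent = start
    observe(agent, true_blocked, known_blocked)
    while agent != target:
        # Shortest presumed-unblocked path, given current knowledge.
        path = a_star(agent, target, known_blocked)
        if path is None:
            return "I cannot reach the target."
        for cell in path[1:]:
            if cell in known_blocked:  # current path turned out blocked: replan
                break
            agent = cell
            observe(agent, true_blocked, known_blocked)
    return "I reached the target."
```

Termination follows because `known_blocked` only grows: every replanning step is triggered by a newly observed blocked cell, and there are finitely many cells.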
Figure 2: First Example Search Problem (left) and Initial Knowledge of the Agent (right)
As an example, consider the gridworld of size 5 × 5 shown in Figure 2 (left). Black cells are blocked, and white cells are
unblocked. The initial cell of the agent is marked A, and the target is marked T. The initial knowledge of the agent about
blocked cells is shown in Figure 2 (right). The agent knows black cells to be blocked and white cells to be unblocked.
It does not know whether grey cells are blocked or unblocked. The trajectory of the agent is shown in Figure 3. The left
figures show the actual gridworld. The center figures show the knowledge of the agent about blocked cells. The right figures
again show the knowledge of the agent about blocked cells, except that all cells for which it does not know whether they
are blocked or unblocked are now shown in white since the agent assumes that they are unblocked. The arrows show the
“shortest presumed-unblocked paths” that the agent attempts to follow. The agent needs to find another “shortest presumed-unblocked path” from its current cell to the target whenever it observes its current path to be blocked. The agent finds such
a path by finding a shortest path from its current cell to the target in the right figure. The resulting paths are shown in bold
directly after they were computed. For example, at time step 1, the agent searches for a “shortest presumed-unblocked path”
and then moves along it for three moves (first search). At time step 4, the agent searches for another “shortest presumed-unblocked path” since it observed its current path to be blocked and then moves along it for one move (second search). At
time step 5, the agent searches for another “shortest presumed-unblocked path” (third search), and so on. When the agent
reaches the target, it has observed the blockage status of every cell, although this is not the case in general.
## 2 Modeling and Solving the Problem
The state space of the resulting search problem is simple: the states correspond to the cells, and the actions allow the
agent to move from cell to cell. Initially, all action costs are one. When the agent observes a blocked cell for the first
time, it increases the action costs of all actions that enter or leave the corresponding state from one to infinity or, alternatively, removes those actions. A shortest path in this state space is then a “shortest presumed-unblocked path” in the gridworld.
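Viewed as code, this cost change is a one-line rule; `known_blocked` below is an illustrative set of cells the agent has observed to be blocked:

```python
import math

# Action cost between adjacent cells under the freespace assumption:
# cost 1 unless either endpoint is known to be blocked, in which case the
# cost becomes infinite (equivalently, the action is removed).
# known_blocked is an illustrative set of observed blocked cells.

def cost(s, t, known_blocked):
    if s in known_blocked or t in known_blocked:
        return math.inf
    return 1
```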
Thus, the agent needs to search in state spaces in which action costs can increase or, alternatively, actions can be removed.
The agent searches for a shortest path in the state space whenever the length of its current path increases (to infinity). Thus,
the agent (of the typically many agents in real-time computer games) has to search repeatedly until it reaches the target. It is
therefore important for the searches to be as fast as possible to ensure that the agent responds quickly and moves smoothly
even on computers with slow processors in situations where the other components of real-time computer games (such
as the graphics and user interface) use most of the available processor cycles. Moore’s law does not solve this problem
since the number of game characters of real-time computer games will likely grow at least as quickly as the processor speed.

Figure 3: Trajectory of the Agent for the First Example Search Problem (thirteen rows of panels, one per time step; each row shows, from left to right, the actual gridworld, the agent’s knowledge of blocked cells, and that knowledge under the freespace assumption with the current “shortest presumed-unblocked path”)
In the following, we use A* to determine the shortest paths, resulting in Repeated A*. A* can search either from the current
cell of the agent toward the target (= forward), resulting in Repeated Forward A*, or from the target toward the current cell
of the agent (= backward), resulting in Repeated Backward A*.
## 3 Repeated Forward A*
```
 1  procedure ComputePath()
 2      while g(s_goal) > min_{s' ∈ OPEN} (g(s') + h(s'))
 3          remove a state s with the smallest f-value g(s) + h(s) from OPEN;
 4          CLOSED := CLOSED ∪ {s};
 5          for all actions a ∈ A(s)
 6              if search(succ(s, a)) < counter
 7                  g(succ(s, a)) := ∞;
 8                  search(succ(s, a)) := counter;
 9              if g(succ(s, a)) > g(s) + c(s, a)
10                  g(succ(s, a)) := g(s) + c(s, a);
11                  tree(succ(s, a)) := s;
12                  if succ(s, a) is in OPEN then remove it from OPEN;
13                  insert succ(s, a) into OPEN with f-value g(succ(s, a)) + h(succ(s, a));

14  procedure Main()
15      counter := 0;
16      for all states s ∈ S
17          search(s) := 0;
18      while s_start ≠ s_goal
19          counter := counter + 1;
20          g(s_start) := 0;
21          search(s_start) := counter;
22          g(s_goal) := ∞;
23          search(s_goal) := counter;
24          OPEN := CLOSED := ∅;
25          insert s_start into OPEN with f-value g(s_start) + h(s_start);
26          ComputePath();
27          if OPEN = ∅
28              print "I cannot reach the target.";
29              stop;
30          follow the tree-pointers from s_goal to s_start and then move the agent along the
            resulting path from s_start to s_goal until it reaches s_goal or one or more action
            costs on the path increase;
31          set s_start to the current state of the agent (if it moved);
32          update the increased action costs (if any);
33      print "I reached the target.";
34      stop;
```

Figure 4: Pseudocode of Repeated Forward A*
The pseudocode of Repeated Forward A* is shown in Figure 4. It performs the A* searches in ComputePath(). A* is
described in your textbook and therefore only briefly discussed in the following, using the following notation that can be
used to describe general search problems rather than only search problems in gridworlds: S denotes the finite set of states.
sstart ∈ S denotes the start state of the A* search (which is the current state of the agent), and sgoal ∈ S denotes the goal state
of the A* search (which is the state of the target). A(s) denotes the finite set of actions that can be executed in state s ∈ S.
c(s, a) > 0 denotes the action cost of executing action a ∈ A(s) in state s ∈ S, and succ(s, a) ∈ S denotes the resulting
successor state. A* maintains five values for all states s that it encounters:
1. a g-value g(s) (which is infinity initially), which is the length of the shortest path from the start state to state s found
by the A* search and thus an upper bound on the distance from the start state to state s;
2. an h-value (= heuristic) h(s) (which is user-supplied and does not change), which estimates the goal distance of state
s (= the distance from state s to the goal state);
3. an f-value f(s) := g(s) + h(s), which estimates the distance from the start state via state s to the goal state;
4. a tree-pointer tree(s) (which is undefined initially), which is necessary to identify a shortest path after the A* search;
5. and a search-value search(s), which is described below.
A* maintains an open list (a priority queue which contains only the start state initially). A* identifies a state s with the
smallest f-value in the open list [Line 2]. If the f-value of state s is no smaller than the g-value of the goal state, then
the A* search is over. Otherwise, A* removes state s from the open list [Line 3] and expands it. We say that it expands
state s when it inserts state s into the closed list (a set which is empty initially) [Line 4] and then performs the following
operations for all actions that can be executed in state s and result in a successor state whose g-value is larger than the
g-value of state s plus the action cost [Lines 5-13]: First, it sets the g-value of the successor state to the g-value of state s
plus the action cost [Line 10]. Second, it sets the tree-pointer of the successor state to (point to) state s [Line 11]. Finally,
it inserts the successor state into the open list or, if it was there already, changes its priority [Lines 12-13]. (We say that it
generates a state when it inserts the state for the first time into the open list.) It then repeats the procedure.
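Under these definitions, the expansion loop of ComputePath() can be sketched in Python with a binary heap as the open list. This sketch uses lazy deletion (skipping stale heap entries) in place of Line 12's explicit removal, and fresh dictionaries per search in place of the counter/search(s) bookkeeping; `succ` and `h` are assumed inputs:

```python
import heapq
from math import inf

# Sketch of one A* search in the spirit of ComputePath(), assuming:
# succ(s) yields (action_cost, successor) pairs, and h is a consistent
# heuristic. Stale heap entries are skipped instead of removed (Line 12),
# and fresh dictionaries replace the counter/search(s) lazy initialization.

def compute_path(s_start, s_goal, succ, h):
    g = {s_goal: inf}
    g[s_start] = 0
    tree = {}                              # tree-pointers for path extraction
    open_heap = [(g[s_start] + h(s_start), s_start)]
    closed = set()
    # Terminate once g(s_goal) is no larger than the smallest f-value in OPEN.
    while open_heap and g[s_goal] > open_heap[0][0]:
        f, s = heapq.heappop(open_heap)
        if s in closed:                    # stale entry: already expanded
            continue
        closed.add(s)
        for c, t in succ(s):
            if g.get(t, inf) > g[s] + c:   # found a cheaper path to t
                g[t] = g[s] + c
                tree[t] = s
                heapq.heappush(open_heap, (g[t] + h(t), t))
    if g[s_goal] == inf:                   # OPEN ran empty: goal unreachable
        return None
    path = [s_goal]                        # follow tree-pointers to s_start
    while path[-1] != s_start:
        path.append(tree[path[-1]])
    return path[::-1]
```

Because consistent h-values make the popped f-values nondecreasing, a stale top-of-heap entry never causes the loop to terminate early, so the lazy-deletion variant computes the same g-values as the pseudocode.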
Remember that h-values h(s) are consistent (= monotone) iff they satisfy the triangle inequalities h(sgoal) = 0 and
h(s) ≤ c(s, a) + h(succ(s, a)) for all states s with s ≠ s_goal and all actions a that can be executed in state s. Consistent
h-values are admissible (= do not overestimate the goal distances). A* search with consistent h-values has the following
properties. Let g(s) and f(s) denote the g-values and f-values, respectively, after the A* search: First, the A* search
expands all states at most once each. Second, the g-values of all expanded states and the goal state after the A* search are
equal to the distances from the start state to these states. Following the tree-pointers from these states to the start state identifies
shortest paths from the start state to these states in reverse. Third, the f-values of the series of expanded states over time
are monotonically nondecreasing. Thus, it holds that f(s) ≤ f(sgoal) = g(sgoal) for all states s that were expanded by the
A* search (that is, all states in the closed list) and g(sgoal) = f(sgoal) ≤ f(s) for all states s that were generated by the A*
search but remained unexpanded (that is, all states in the open list). Fourth, an A* search with consistent h-values h1(s)
expands no more states than an otherwise identical A* search with consistent h-values h2(s) for the same search problem
(except possibly for some states whose f-values are identical to the f-value of the goal state) if h1(s) ≥ h2(s) for all states s.
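For four-connected gridworlds with unit action costs, the Manhattan distance to the goal is the standard consistent heuristic, since each move changes it by at most one. A sketch with a spot-check of the triangle inequality (names illustrative):

```python
# Manhattan distance to the goal: consistent for 4-connected grids with
# unit action costs, because one move changes each coordinate by at most 1.

def manhattan(s, s_goal):
    return abs(s[0] - s_goal[0]) + abs(s[1] - s_goal[1])

def is_consistent_step(s, t, s_goal, c=1):
    """Spot-check h(s) <= c(s, a) + h(succ(s, a)) for one move s -> t."""
    return manhattan(s, s_goal) <= c + manhattan(t, s_goal)
```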
Repeated Forward A* itself executes ComputePath() to perform an A* search. Afterwards, it follows the tree-pointers
from the goal state to the start state to identify a shortest path from the start state to the goal state in reverse. Repeated
Forward A* then makes the agent move along this path until it reaches the target or action costs on the path increase [Line
30]. In the first case, the agent has reached the target. In the second case, the current path might no longer be a shortest
path from the current state of the agent to the state of the target. Repeated Forward A* then updates the current state of the
agent and repeats the procedure.
Repeated Forward A* does not initialize all g-values up front but uses the variables counter and search(s) to decide when to
initialize them. The value of counter is x during the x-th A* search, that is, the x-th execution of ComputePath(). The value
of search(s) is x if state s was generated last by the x-th A* search (or is the goal state). The g-value of the goal state is
initialized at the beginning of an A* search [Line 22] since it is needed to test whether the A* search should terminate [Line
2]. The g-values of all other states are initialized directly before they might be inserted into the open list [Lines 7 and 20].
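This lazy initialization can be sketched as follows; `g` and `search` are illustrative dictionaries standing in for per-state fields:

```python
from math import inf

# Lazy g-value initialization via counter/search(s) (Lines 6-8 of Figure 4),
# sketched with dictionaries; it avoids resetting every state's g-value
# before each of the (possibly many) A* searches.

def touch(s, counter, g, search):
    """Reset g(s) to infinity iff s was last generated by an earlier search."""
    if search.get(s, 0) < counter:
        g[s] = inf
        search[s] = counter
```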