COSC 320 – 001
Analysis of Algorithms
2020 Winter Term 2
Project Topic Number: 5
Analysis of the algorithm: An Informed Search Strategy
Before we talk about the pseudo-code, we form the question almost same as the we
did in the first milestone, specifically:
States: Each state is basically representing a city on the map.
Initial state: Whatever the user decides to be.
Actions: A set of flights originating from the current city. For each flight, we have
flight fee, flight time, and waiting time.
Transition Model: (Referring to the definition) The action of taking P from A
would result in B.
Goal Test: If we are at the final location (the destination).
Path Cost: The result would be a set of actions, namely p = {P1, P2, …} that would
transfer the initial state to the goal state. Hence, the Path Cost would be the total
financial cost, waiting time between flights, and flight time. Different from Last
time, the Path Cost (g(n)) would work with a consistent heuristic function,
namely h(n), to decide which node to expand.
We call the function??(??) = ??(??) + ??(??): the estimated cost of cheapest path
cost from initial state to the goal state at node n.
A Consistent Heuristic:
We would use the same an artificial dataset that contains random statistics as the
previous milestone is, but we would add another matrix, namely D, to provide the extra
information to derive the h(n). In particular, the matrix defines the real distances
between any two cities.
???, ?? ∈ ??, ????,?? ?????????????? ????? ???????? ???????????????? ?????????????? ???????? ??, ??
????????? ?? ???? ????? ?????? ???? ?????? ????????????.
Pseudo-code
This time, we approach the problem using A* Search. Similar to Uniform Cost
Search, we use a priority queue to help us decide the path. This time, we use the
estimation g(n) instead of path cost.
define node with the state as problem’s initial state, path cost zero //initialization
define PriorityQueue<Node> frontier to store all the nodes and sort by the estimated
cost
define Set explored //to be filled
while(true)
if frontier is empty then return failure //no solution
pop frontier and set node to it //now node is the lowest cost in frontier
if node is at goal state then return solution //there is optimal solution
add the state of node to the explored list
now iterate through all available actions at node
define child as the successor we are looking at in this iteration
if child’s state is not in frontier or explored then add it in to the frontier
else if child’s state is in frontier and child has a lower estimated cost// notice
that we use an “estimated cost” instead of “path cost” here
then update the frontier with the child //to get optimal solution
Proof of Correctness:
A* Search provides an optimal solution if such solution exists.
But, before we prove the correctness, let’s prove that the heuristic function h(n) is
consistent. In other words, the cost of any node n to its predecessor, namely n’, plus
its estimated cost is always greater or equal to n’s estimated cost.
?(??) ≤ ??(??, ????,??′
, ??
′
) + ?(??′)
Proof:
Recall that we define the path cost to be a sum of three terms:
??(??, ????,??′
, ??
′
) = ??1 ? ??1 + ??2 ? ??2 + ??3 ? ??3????????? ??1 ≥ 1 , ??2, ??3 ≥ 0 &
??1, ??2, ??3 ???????????????????? ????? ????????????????, ??????????? ???????? ?????? ?????????????? ????????
From this, we can derive that:
??1
(??, ??′) ≤ ??1 ? ??1 + ??2 ? ??2 + ??3 ? ??3 = ??(??, ????,??
′ , ??
′
) (??0)
Now, according to the triangular inequality:
?(??) ≤ ??1
(??, ??′) + ?(??
′
)
Take in Statement 0 yields this:
?(??) ≤ ??(??, ????,??
′ , ??
′
) + ?(??
′
)
Lastly, add the g(n) on both sides, we have completed the proof:
??(??
′
) = ??(??
′
) + ?(??) = ??(??) + ??(??, ????,??′
, ??
′
) + ?(??
′
) ≥ ??(??) + ?(??)
Now that we have proved the heuristic function is consistent, we have basically
derived that “the values of estimations are always increasing along the path). See this:
??? ∈ ?? ????????? ?? ????? ?? ?????????????????? ???????????? ??′, ??(??
′
) ≥ ??(??) + ?(??)
Now let’s prove with induction that whenever the A* search expanded a node, it
expands the node with an optimal path. (Notice that this is not exactly same as
proving A* search yields the optimal solution. What we are going to prove contains
the statement.)
Base Case: The first node is always expanded with an optimal cost of 0.
Inductive Step: We argue by contradiction.
We assume the IH holds that all the nodes up to ??0 are expanded with an
optimal cost. However, it would be absurd if we assume that ??0′ is not expanded
with an optimal cost. Since there would have been another ??′ sitting on the
presumably more optimal path that would have been selected by the priority
queue in the first place.
Hence, ?????????????? ?????????? ~ ??0 ?????????? ?????????????? ??????? ?????????????? ??????
′??
????????????
→ ??0
′ ?????????? ?????????????? ??????? ?????????????? ??????′??
Conclusion: By the principle of Strong Induction, we have derived that all nodes
in A* search are expanded with optimal solution.
Hence, A* search provide optimal solution to a graph search problem.
Running Time: A* search's time complexity is determined by the heuristic. The
number of nodes extended in the worst case of an unbounded search space is
exponential in the depth of the solution (the shortest path) d: ??(??
??
), where b is the
branching factor (the average number of successors per state). This assumes that a
goal state exists at all, and is reachable from the start state; if it is not, and the state
space is infinite, the algorithm will not terminate.
The time complexity is polynomial when the search space is a tree, there is a single
goal state, and the heuristic function h meets the following condition:
where h* is the optimal heuristic, the exact cost to get from x to the goal. In other
words, the error of h will not grow faster than the logarithm of the "perfect
heuristic" h* that returns the true distance from x to the goal. [2]
Data Structure.
A priority queue is commonly used in A* implementations to perform the repeated
collection of minimum (estimated) cost nodes to extend. The open set, also known as
the fringe, is a priority queue. The node with the lowest f(x) value is removed from
the queue at each step of the algorithm, and its neighbors’ f and g values are modified
accordingly, and these neighbors are returned to the queue. The algorithm continues
until a target node is found (the node with the lowest F value of all fringe nodes).
Since h at the target is zero in a consistent heuristic, the F value of that goal is also the
cost of the shortest path. [3]
Unexpected Cases/Difficulties.
One of the difficulties is the dataset: how are we defining the path cost and estimated
cost matrix exactly? Same as what we’ve mentioned in the last milestone, when we
define a path cost P, we define it as x1*f1+ x2*f2+ x3*f3 where f is in the unit of CAD
or hours and x is empirically determined.
However, when we look at the heuristic function, we are measuring the distance
directly. It guarantees the function to be consistent, while it does not incorporate time
elements. One reason is that flight time and wait time estimations are untrivial to
derive when we try to look at two cities that do not have any available flight in
between. In fact, we cannot provide any coefficient k based on existing factors such
that: ?? > 1, ?? ∈ ? & ??(??) = ??(??) + ?? ? ?(??), ????????? ?? ? ?(??) ???? ????????????????????
We will look into this problem again when we implement the function. If it turns out
such k can be derived, we will modify the pseudo-code and provide the proof of
correctness again.
Task Separation and Responsibilities.
Jack Wang did the first draft of the Second Milestone including Data Structure and
Running Time.
Larry Gu revised the first draft, specifically, he:
rewrote the Analysis of the algorithm
provided explanations for the heuristic function
provided the Pseudo-code
rewrote the Proof of Correctness
rewrote the Unexpected Cases/Difficulties.
版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。