notes: search algorithms

Search algorithms are methods of finding a certain node in a graph. They require each node data structure to keep track of the tree:

state
parent
action
path-cost

There also must be a data structure containing the frontier of nodes that are net to be explored, usually a queue.

Search algorithms need to account for redundant paths, including primarily cycles. Graph search algorithms account for this, tree like searches do not. Finding all redundant paths is generally expensive, so usually only cycles are worried.

Search algorithms can be evaluated by:

completeness
cost optimality
time complexity
space complexity

Problems have depth and branching factor which impact the importance of algorithmic time/space complexity.

Best-first search is a general search algorithm where the next node to be explored is that with the smallest value from an eval-function. New children are added to frontier when they are discovered, and are readded if a new lower cost path is them are discovered. Different evaluation functions can be used to create other search algorithms.

Uninformed Search Strategies

Breadth-first search
- best used when all actions have the same cost
- best used with FIFO queue unless there are additional heuristics in place
- always finds most efficient solution
Dijkstra’s algorithm, aka uniform-cost search
- best first search where eval is the cost path from root to current node
depth-first search
- goes deep first
- often poor for graph search since, work well in finite space tree environments
- is not cost optimal, may not find best solution
- has small memory requirements
- can be modified to be more memory efficient using backtracking search
- can also have a depth limit to cap complexity, or iteratively deepened
bidirectional search
- searches from the initial and the goal state simultaneously, hoping the tow of them will meet

Informed Search Strategies

Heuristic functions can give domain-specific hints to an algorithm.

greedy best first search
- expands the first node with the lowest h(n) value, so f(n) = h(n)
- complete for finite spaces, quality of heuristic function matters a lot
A* search
- f(n) = g(n) + h(n)
- g(n) is the path cost from the initial state to node n
- h(n) is the estimate cost of the shortest path from n to goal
- A* is search complete
- if the heuristic is admissible (never overestimates the cost to reach a goal), A* is cost optimal
- if a heuristic is consistent if each node n and successor n' generated by action a h(n) <= cost(n,a,n')+h(n')
  - all consistent heuristics are admissible
  - also when reaching a node it will always be optimal path
- without admissibility, A* may not be cost optimal
- it prunes nodes that won't be used in an optimal solution
weighted A* search
- f(n) = g(n) + W * h(n), heuristic is more important
- h(n) may overestimate but it may be more accurate
beam search limits the size of the frontier based on best f-score
iterative deepening A*
- the cutoff for this is f-cost
- if all nodes have different f-cost, then each deepening may only have 1 new node which is slow
- ends up exploring the same space repeatedly
recursive best-first search
- best-first search but with linear space, uses the f limit to keep track of the f-value of the best alternative path from any ancestor of current node
- allows it to backtrack without storing the entire alternate tree, only storing remembering the best f-score from that tree
- is optimal if h(n) is admissible
- time complexity is a hrd thing to explain
- ends up exploring the same space repeatedly
SMA*
- has limited memory for leaves, will drop worst f-value leaf as it expands
- only needs to regenerate tree when entire subtree when all other paths are look worse than the path that was forgotten
- when there is a f-value draw it it will drop the oldest
- is complete for all solutions within memory bounds, and for those it is optimal
- on hard problems it goes back and forth between small subset of candidates that can fit in memory
bidirectional heuristic search
- to prove optimal cost solutions, we must consider pairs of nodes, one from either frontier that can be proved to be surely expanded
- no guarantee of optimal efficiency, even after you have determined that a pair of nodes are both surely expanded, since you do not know which of their children to follow

all nodes that can be reached from the initial state on a path where every node on the path has f(n) < C*. We say these are surely expanded nodes

AIMA Chapter 3, pg. 90

Heuristics

An important characteristic is the effective branching factor (b) of a heuristic. Ideally b is close to 1.