Friday, December 2, 2016

Search in Graph

                            Search in Dictionary 

   To search a specific target in a graph, which in other words to traversal a graph, we mainly have two ways to do it.  Namely, DFS and BFS. Both of them are well-know algorithms, if you do not know the basic idea of them, you can easily find a tutorial online regarding this two basic algorithms.

   Here are something I want to point out before we dig into some specific examples about them.

1. Because both algorithm will visit every node in a graph exactly once. So to traversal a whole graph, BFS and DFS have same time complexity, which is O(N).

2. In terms of optimal solution, BFS is guarantee to give you the optimal solution( the closest path), while DFS may give you a suboptimal solution.

3. BFS, in general, will use more memory(space complexity) than BFS, especially when brach factor of a graph become big.


               Example 1: Word Ladder  

Given two words (beginWord and endWord), and a dictionary's word list, find the length of shortest transformation sequence from beginWord to endWord, such that:
  1. Only one letter can be changed at a time
  2. Each intermediate word must exist in the word list
For example,
Given:
beginWord = "hit"
endWord = "cog"
wordList = ["hot","dot","dog","lot","log"]
As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog",
return its length 5.
Note:
  • Return 0 if there is no such transformation sequence.
  • All words have the same length.
  • All words contain only lowercase alphabetic characters.
-----------------------------------------------------------------------------------------------------------------------

Analyze:  A easy approach will be using dictionary to build a graph, where  two adjacent words have one difference in letter. Then we using BFS to find the shortest path from start word to end word, which will then give us its length.   
      However, under this approach,if we have a big size dictionary, we will waste tons of time to build the graph before we even start to search.  Besides, note that if the question asking you to return the length of path, in order to come up with a relatively efficient algorithm to solve this problem, you are gonna to only care about the length. What I mean by saying that is, if you go find the optimal path, which, then, will cost you more than you should, because we only care about its length, not exactly the path. 
     Hence, we  1. Not gonna build  a graph before we search. 
                        2. Not gonna remember the exactly path, while only care the distance. 

   It seems easy that we only will care about the distance, but how about not build a graph before we start to search ?  The solution will be we finding all possible next steps( the word have one difference with current word) of current step in dictionary then see if our target is in our next reachable words. So we actually only care about the part of graph we gonna step into. 

 Here is a solution based on those thoughts:  
  

      
 How about the getNext( s, dic ) method in Line 12 ? 

It has a interesting approach that try to change every position of this word into character from 'a' to 'z', then see do we have this word in our dictionary. 

 Here is the implementation: 


Notice Line 15, we remove visited word from dictionary to avoid re-visiting. 




         Example 2: Word Search  



Given a 2D board and a word, find if the word exists in the grid.
The word can be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once.
For example,
Given board =
[
  ['A','B','C','E'],
  ['S','F','C','S'],
  ['A','D','E','E']
]
word = "ABCCED", -> returns true,
word = "SEE", -> returns true,
word = "ABCB", -> returns false.

--------------------------------------------------------------------------------------------------------------------------

Analyze : Notice that 1. We do have many duplicate characters in dictionary. For example, two 'A' , three 'E' are exist in example board.

                2.  Re-visiting is not allowed. That is why word" ABCB" should return false .

 If we set a data structure to remember every 'A' position, every 'B' position ... We will, again wast lots of time before start and also many extra space is needed.

 Besides, in order to avoid re-visiting, creating a another board to record the position we have visited is totally fine, but we do have a better approach, which will not cost extra memory !

 How we gonna do that ?
       1 . No position memory -- > We try every position as a possible start point.
       2 . No step to trying find next valid move --> try all possible 4 position.
       3.  No visited[][] record board --> We use bit mask modify this visited pos make it invalid.

After those step to simplify our code, here is a very elegant solution: 


Do you find the code style in Line 18, and 25 somewhat similar ?  

Yes, you may say it is very similar to backtracking algorithm, where you remove locally added element from the list after recursive call return. 



 Example 3: Add Search Word - Data structure design 

Design a data structure that supports the following two operations:
void addWord(word)
bool search(word)
search(word) can search a literal word or a regular expression string containing only letters a-z or .. A .means it can represent any one letter.
For example:
addWord("bad")
addWord("dad")
addWord("mad")
search("pad") -> false
search("bad") -> true
search(".ad") -> true
search("b..") -> true
Note:
You may assume that all words are consist of lowercase letters a-z.

-----------------------------------------------------------------------------------------------------------------------

Analyze: If you do not know Trie, which also known as Prefix Tree, you should go and find a tutorial to learn what is a prefix tree and how to implement it.  Trie is great data structure that allow us to store and search words in a convenient way. 

   After we know about the prefix tree, the only question on this problem is how to search a word in prefix tree when the word have '.' in it.  

   This only problem could then solved by using a approach very close to DFS. 

Here is the Node class of prefix tree: 
   
 How we add words into this data structure: 

which is really the same thing we will doing to implement an ordinary Trie

Finally, How to search with '.'


Note that from Line 39, we start to deal with the '.' case in our search. The way is we trying to search through all possible words in our Trie from that point and recursive call enable us to do it  with very concise code.


Hope it helps you deal with some search problems. 



Enjoy Coding ^^
Ye Jiang




No comments:

Post a Comment