Find Connected Components in Directed/Undirected Graph

Posted on August 16, 2015 by Ying

Undirected Graph

1. BFS

2. Union Find
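
Both approaches for the undirected case can be sketched as follows. This is a minimal sketch, not code from the original post: the class name `ConnectedComponents`, the method names, and the edge-list input format (vertices numbered 0 to n-1) are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical helper class; names and input format are illustrative assumptions.
class ConnectedComponents {

    // Approach 1: start a BFS from every vertex that has not been visited yet.
    static int countWithBfs(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]); // undirected: add both directions
            adj.get(e[1]).add(e[0]);
        }
        boolean[] visited = new boolean[n];
        int components = 0;
        for (int start = 0; start < n; start++) {
            if (visited[start]) {
                continue;
            }
            components++; // each unvisited start vertex opens a new component
            Queue<Integer> queue = new ArrayDeque<>();
            queue.offer(start);
            visited[start] = true;
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int v : adj.get(u)) {
                    if (!visited[v]) {
                        visited[v] = true;
                        queue.offer(v);
                    }
                }
            }
        }
        return components;
    }

    // Approach 2: Union Find. Every successful union merges two components.
    static int countWithUnionFind(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every vertex starts as its own component
        }
        int components = n;
        for (int[] e : edges) {
            int ra = find(parent, e[0]);
            int rb = find(parent, e[1]);
            if (ra != rb) {
                parent[ra] = rb;
                components--;
            }
        }
        return components;
    }

    // Find the representative of x, shortening the path along the way.
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }
}
```

For the edges {0,1}, {1,2}, {3,4} on 5 vertices, both methods report 2 components.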

Directed Graph

1. Tarjan

https://www.byvoid.com/blog/scc-tarjan
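
For the directed case, Tarjan's algorithm finds strongly connected components with a single DFS. The following is a minimal recursive sketch, not code from the original post or the linked article: the class name `TarjanScc`, the method `findSccs`, and the edge-list input are assumptions made for illustration, and the recursion may overflow the stack on very deep graphs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical helper class; names and input format are illustrative assumptions.
class TarjanScc {
    private final List<List<Integer>> adj;
    private final int[] index;        // DFS discovery order, -1 means unvisited
    private final int[] low;          // smallest index reachable from the vertex
    private final boolean[] onStack;
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final List<List<Integer>> components = new ArrayList<>();
    private int counter = 0;

    private TarjanScc(List<List<Integer>> adj) {
        this.adj = adj;
        int n = adj.size();
        index = new int[n];
        low = new int[n];
        onStack = new boolean[n];
        Arrays.fill(index, -1);
    }

    // Returns the strongly connected components of a directed graph on vertices 0..n-1.
    static List<List<Integer>> findSccs(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]); // directed edge e[0] -> e[1]
        }
        TarjanScc tarjan = new TarjanScc(adj);
        for (int v = 0; v < n; v++) {
            if (tarjan.index[v] == -1) {
                tarjan.dfs(v);
            }
        }
        return tarjan.components;
    }

    private void dfs(int v) {
        index[v] = low[v] = counter++;
        stack.push(v);
        onStack[v] = true;
        for (int w : adj.get(v)) {
            if (index[w] == -1) {
                dfs(w);
                low[v] = Math.min(low[v], low[w]);
            } else if (onStack[w]) {
                // w is still on the stack, so it belongs to the current component
                low[v] = Math.min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {
            // v is the root of a component: pop everything above it, including v
            List<Integer> component = new ArrayList<>();
            int w;
            do {
                w = stack.pop();
                onStack[w] = false;
                component.add(w);
            } while (w != v);
            components.add(component);
        }
    }
}
```

For the edges 0→1, 1→2, 2→0, 2→3 the sketch returns two components: one containing only vertex 3, and one containing 0, 1 and 2.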

Big Table

Posted on August 14, 2015 by Ying

Goal: efficient queries

B-Tree vs. B+ Tree

Posted on August 13, 2015 by Ying

http://stackoverflow.com/questions/870218/b-trees-b-trees-difference

Partition Array

Posted on August 11, 2015 by Ying

1. With the target outside the array

    public int partitionArray(int[] nums, int k) {
        if (nums == null || nums.length == 0) {
            return 0;
        }
        int left = 0;
        int right = nums.length - 1;
        while (left <= right) {
            while (left <= right && nums[left] < k) {
                left++;
            }
            while (left <= right && nums[right] >= k) {
                right--;
            }
            if (left < right) {
                swap(nums, left, right);
                left++;
                right--;
            }
        }
        return left;
    }

    public void swap(int[] nums, int a, int b) {
        int tmp = nums[a];
        nums[a] = nums[b];
        nums[b] = tmp;
    }

2. With the target within the array (Quick Select)
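
The original post gives no code for this case. One way to sketch it, assuming the goal is to find the k-th smallest element (0-based k) by repeatedly partitioning around a pivot taken from the array itself, is shown here. The class name `QuickSelect`, the method `kthSmallest`, and the choice of the last element as pivot (Lomuto partition) are assumptions made for illustration.

```java
// Hypothetical helper class; names and pivot choice are illustrative assumptions.
class QuickSelect {

    // Returns the k-th smallest element (k is 0-based). Reorders nums in place.
    static int kthSmallest(int[] nums, int k) {
        if (nums == null || k < 0 || k >= nums.length) {
            throw new IllegalArgumentException("k out of range");
        }
        int left = 0;
        int right = nums.length - 1;
        // Invariant: the answer always lies in nums[left..right].
        while (left < right) {
            int p = partition(nums, left, right);
            if (p == k) {
                return nums[p];
            } else if (p < k) {
                left = p + 1;
            } else {
                right = p - 1;
            }
        }
        return nums[left];
    }

    // Lomuto partition: uses nums[right] as the pivot and returns its final index.
    private static int partition(int[] nums, int left, int right) {
        int pivot = nums[right];
        int store = left;
        for (int i = left; i < right; i++) {
            if (nums[i] < pivot) {
                swap(nums, i, store);
                store++;
            }
        }
        swap(nums, store, right);
        return store;
    }

    private static void swap(int[] nums, int a, int b) {
        int tmp = nums[a];
        nums[a] = nums[b];
        nums[b] = tmp;
    }
}
```

With a fixed pivot position the worst case is O(n^2); picking the pivot at random is a common way to make that case unlikely.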

The Google File System (Paper Notes)

Posted on August 11, 2015 by Ying

The Google File System

Dynamo: Amazon’s Highly Available Key-value Store (Paper Notes)

Posted on August 10, 2015 by Ying

Dynamo: Amazon’s Highly Available Key-value Store

System Assumptions and Requirements in this case

1. High write availability. This is based on use cases such as shopping carts, where users should be able to update the cart at any time. So the design is always writable and resolves conflicts at read time.
2. The query model is simple read and write operations on a data item that is uniquely identified by a key. There is no need for a relational schema. (This is also based on observations of some of Amazon’s services.)
3. ACID (Atomicity, Consistency, Isolation, Durability) guarantees are not strictly followed, since Dynamo targets applications that tolerate weaker consistency, which is called eventual consistency.

Design Considerations

1. When to resolve update conflicts: at read or at write?
  1. Since Dynamo focuses on high write availability, it pushes conflict resolution to reads (unlike many traditional DBs, which resolve conflicts during writes and keep reads simple).
2. Who resolves the conflicts: the data store or the application?
  1. The application is responsible for resolving conflicting updates. The data store can only apply a simple policy such as “last write wins”, while the application knows more about each situation and can choose a different strategy for each.
3. Incremental scalability
  1. Add or remove one node at a time without a large impact on the reads and writes of the system.
4. Symmetry
  1. No distinguished nodes. Each node should have the same responsibilities as its peers.

Architecture

Problem	Technique	Advantage
Partitioning	Consistent Hashing	Incremental Scalability
High Availability for writes	Vector clocks with reconciliation during reads	Version size is decoupled from update rates.
Handling temporary failures	Sloppy Quorum and hinted handoff	Provides high availability and durability guarantee when some of the replicas are not available.
Recovering from permanent failures	Anti-entropy using Merkle trees	Synchronizes divergent replicas in the background.
Membership and failure detection	Gossip-based membership protocol and failure detection.	Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information.

Partitioning (Consistent Hashing)

1. Both nodes and keys are mapped into the same hash space (e.g. 00~FF).
2. Key K is stored on node B, the first node clockwise from K's position on the ring, which means B is responsible for K.
3. Pros:
  1. Load balancing (each node gets roughly the same number of keys)
  2. Scalability (when one node is added or removed, only its neighbors are affected)
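
The lookup rule can be sketched with a sorted map. This is a toy illustration under stated assumptions, not Dynamo's implementation: the class name `ConsistentHashRing` and its methods are hypothetical, positions in the 00~FF space are passed in directly instead of being computed by a hash function, and each node owns exactly one position.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy ring; names and the one-position-per-node layout are assumptions.
class ConsistentHashRing {
    // position on the ring -> node name, kept sorted by position
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String name, int position) {
        ring.put(position, name);
    }

    void removeNode(int position) {
        ring.remove(position);
    }

    // The responsible node is the first one at or after the key's position,
    // wrapping around to the smallest position at the end of the ring.
    String nodeFor(int keyPosition) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(keyPosition);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

With nodes A at 0x20, B at 0x80 and C at 0xD0, a key at 0x50 maps to B and a key at 0xE0 wraps around to A. After removing B, the key at 0x50 moves to C while keys owned by A stay where they are, which is the "only neighbors are affected" property.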

Replication

1. When Dynamo is set up, N is configured as a parameter indicating that each data item is replicated on N nodes.
2. Each key has a list of nodes responsible for its read/write operations, called the preference list. The preference list should contain more than N nodes to allow for node failures.
3. Using consistent hashing, each key finds its coordinator, which is responsible for replicating the data to its N-1 clockwise successor nodes.
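
Choosing the coordinator and its N-1 clockwise successors can be sketched as a walk around the ring. This is a simplified illustration, not Dynamo's code: the class `PreferenceListSketch`, the method `replicasFor`, and the ring layout (a sorted map from position to node name, one position per node) are assumptions, and node health is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper; assumes one ring position per physical node.
class PreferenceListSketch {

    // Returns the coordinator followed by its clockwise successors, n nodes in total
    // (or fewer if the ring has fewer than n nodes).
    static List<String> replicasFor(TreeMap<Integer, String> ring, int keyPosition, int n) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // The coordinator is the first node at or after the key, wrapping if needed.
        Integer pos = ring.ceilingKey(keyPosition);
        if (pos == null) {
            pos = ring.firstKey();
        }
        int wanted = Math.min(n, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(ring.get(pos));
            pos = ring.higherKey(pos); // next node clockwise
            if (pos == null) {
                pos = ring.firstKey();
            }
        }
        return replicas;
    }
}
```

With nodes A, B, C, D at positions 32, 128, 208, 240 and N = 3, a key at position 80 is replicated on B, C, D, and a key at position 220 on D, A, B.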

Versioning

1. A vector clock is used to detect update conflicts. It is mainly used in key-value stores that avoid locking on writes in pursuit of better performance.
2. D5([Sx, 3],[Sy, 1],[Sz, 1]) means that data item D5 was updated 3 times by Sx, once by Sy, and once by Sz. With these vectors it is easy to tell whether two versions are parallel (concurrent).
3. When data is read, the vector clock is returned along with the data item.
4. For a deeper explanation and examples, please check here
5. Cons: the vector clock can sometimes grow too long if many different servers are involved in writes. In practice this should rarely happen, since writes are generally handled by the top N nodes in the key's preference list. Even if it does, we can put an upper bound on the size of the vector clock and drop the oldest entries based on their timestamps, which can potentially cause problems when resolving conflicts.
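
How two versions are compared can be sketched as follows. This is an illustrative sketch rather than Dynamo's implementation: the class `VectorClockSketch`, the `Order` enum, and the representation of a clock as a map from node name to counter are assumptions, and the size bound with timestamps from point 5 is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; a clock is modelled as node name -> update counter.
class VectorClockSketch {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    // Compares clock a against clock b. A node missing from a clock counts as 0.
    static Order compare(Map<String, Integer> a, Map<String, Integer> b) {
        boolean aSmaller = false; // some counter in a is behind b
        boolean bSmaller = false; // some counter in b is behind a
        Set<String> nodes = new HashSet<>(a.keySet());
        nodes.addAll(b.keySet());
        for (String node : nodes) {
            int ca = a.getOrDefault(node, 0);
            int cb = b.getOrDefault(node, 0);
            if (ca < cb) {
                aSmaller = true;
            }
            if (ca > cb) {
                bSmaller = true;
            }
        }
        if (aSmaller && bSmaller) {
            return Order.CONCURRENT; // parallel versions: a conflict to reconcile
        }
        if (aSmaller) {
            return Order.BEFORE;
        }
        if (bSmaller) {
            return Order.AFTER;
        }
        return Order.EQUAL;
    }

    // Returns a copy of the clock with one more write recorded for the given node.
    static Map<String, Integer> increment(Map<String, Integer> clock, String node) {
        Map<String, Integer> next = new HashMap<>(clock);
        next.merge(node, 1, Integer::sum);
        return next;
    }
}
```

For example, if a version with clock ([Sx, 2]) is then written once through Sy and, independently, once through Sz, the two results ([Sx, 2],[Sy, 1]) and ([Sx, 2],[Sz, 1]) compare as CONCURRENT, while each of them compares as AFTER the common ancestor.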

Get() & Put() operation

1. Only the first N healthy nodes in the preference list are involved (nodes that are down or inaccessible are skipped).
2. W + R > N (W/R: the number of nodes that must succeed for a write/read).
3. On put(), the coordinator generates the vector clock for the new version and writes the new version locally. It then replicates the new version to the first N reachable nodes in the preference list. The write is considered successful as long as W-1 nodes respond.
4. Similarly, on get(), the coordinator requests the data from the first N reachable nodes in the preference list, and once R-1 of them respond it returns all versions of the data.
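
The condition W + R > N matters because it forces every read set to share at least one replica with every write set. The arithmetic can be sketched as below; the class `QuorumSketch` and its method names are hypothetical and only illustrate the counting argument.

```java
// Hypothetical helper illustrating the quorum arithmetic; not part of Dynamo.
class QuorumSketch {

    // Smallest possible number of replicas shared by a write set of size w
    // and a read set of size r, both chosen from the same n replicas.
    static int minOverlap(int n, int r, int w) {
        return Math.max(0, r + w - n);
    }

    // True when every read is guaranteed to reach at least one replica
    // that took part in the latest successful write.
    static boolean isOverlapping(int n, int r, int w) {
        return minOverlap(n, r, w) > 0;
    }
}
```

With N = 3, the setting R = 2, W = 2 always overlaps in at least one replica, while R = 1, W = 1 can miss the latest write entirely.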

Failure handling (Hinted Handoff)

1. In the Dynamo ring above, if node A is down, a data item that was supposed to be written to A is written to D instead (suppose N=3), together with metadata indicating which node it was intended for. This metadata is stored separately on D.
2. Once D discovers the hint and A has recovered, D sends the replica to A and then deletes its own copy.
3. Hinted handoff ensures that reads and writes are not rejected because a single node is down or because of a network failure.

Recovering from permanent failures and Membership and failure detection: to be organized further.

Reference: http://blog.ddup.us/2011/11/07/amazon-dynamo/

Design a Netflix

Posted on August 10, 2015 by Ying

Tiny Url

Posted on August 10, 2015 by Ying

Longest Palindromic Substring

Posted on August 9, 2015 by Ying

Longest Palindromic Substring

Given a string S, find the longest palindromic substring in S. You may assume that the maximum length of S is 1000, and there exists one unique longest palindromic substring.

Example

Given the string = "abcdzdcab", return "cdzdc".

Challenge

O(n²) time is acceptable. Can you do it in O(n) time?

Solution 1: DP, O(n^2)

Define the function
P[i,j] = whether the substring over the interval [i,j] is a palindrome.

First take an example, say S = "abccb":
S     = a b c c b
Index = 0 1 2 3 4

P[0,0] = 1 // each char is a palindrome
P[0,1] = S[0] == S[1], P[1,1] = 1
P[0,2] = S[0] == S[2] && P[1,1], P[1,2] = S[1] == S[2], P[2,2] = 1
P[0,3] = S[0] == S[3] && P[1,2], P[1,3] = S[1] == S[3] && P[2,2], P[2,3] = S[2] == S[3], P[3,3] = 1
...
From this we can derive the rule:

P[i,j] = 1 if i == j
       = S[i] == S[j] if j == i + 1
       = S[i] == S[j] && P[i+1,j-1] if j > i + 1

The implementation is as follows:

public String longestPalindrome(String s) {
    if (s == null) {
        return "";
    }
    String res = "";
    int n = s.length();
    boolean[][] dp = new boolean[n][n];
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++) {
            if (i - j <= 2) {
                dp[j][i] = s.charAt(i) == s.charAt(j);
            } else {
                dp[j][i] = s.charAt(i) == s.charAt(j) && dp[j + 1][i - 1];
            }
            if (dp[j][i]) {
                if (i - j + 1 > res.length()) {
                    res = s.substring(j, i + 1);
                }
            }
        }
    }
    return res;
}

Solution 2: Expand around the center, checking both the 'aba' and 'abba' cases. O(n^2)

public String longestPalindrome(String s) {
    if (s == null || s.length() == 0) {
        return "";
    }
    String result = "";

    //1. if it is like 'aba'
    for (int i = 0; i < s.length(); i++) {
        int count = 0;
        while (i - count >= 0 && i + count < s.length() && s.charAt(i - count) == s.charAt(i + count)) {
            count++;
        }
        String palindrome = s.substring(i - count + 1, i + count);
        if (palindrome.length() > result.length()) {
            result = palindrome;
        }
    }

    //2. if it is like 'abba', the pivot would be the interval between i and i+1
    for (int i = 0; i < s.length() - 1; i++) {
        int count = 1;
        while (i - count + 1 >= 0 && i + count < s.length() && s.charAt(i - count + 1) == s.charAt(i + count)) {
            count++;
        }
        if (count > 1) {
            String palindrome = s.substring(i - count + 2, i + count);
            if (palindrome.length() > result.length()) {
                result = palindrome;
            }
        }
    }

    return result;
}
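
Neither solution above reaches the O(n) bound from the challenge. One known way to get there is Manacher's algorithm, which reuses the palindrome radii already computed to the left of the current center. The following is a sketch of that technique and is not from the original post; the class name `ManacherSketch` and the choice of '#' as the separator character are assumptions.

```java
// Hypothetical class name; a sketch of Manacher's algorithm, O(n) time.
class ManacherSketch {

    static String longestPalindrome(String s) {
        if (s == null || s.isEmpty()) {
            return "";
        }
        // Interleave separators so odd and even palindromes are handled alike:
        // "abba" becomes "#a#b#b#a#". Separators only ever face other separators.
        int n = s.length();
        char[] t = new char[2 * n + 1];
        for (int i = 0; i < n; i++) {
            t[2 * i] = '#';
            t[2 * i + 1] = s.charAt(i);
        }
        t[2 * n] = '#';

        int[] radius = new int[t.length]; // palindrome radius around t[i], center excluded
        int center = 0;                   // center of the palindrome reaching furthest right
        int right = 0;                    // right == center + radius[center]
        int bestCenter = 0;
        int bestRadius = 0;
        for (int i = 0; i < t.length; i++) {
            if (i < right) {
                // Reuse the mirrored position's radius, clipped to the known boundary.
                radius[i] = Math.min(right - i, radius[2 * center - i]);
            }
            // Try to expand beyond what is already known.
            while (i - radius[i] - 1 >= 0 && i + radius[i] + 1 < t.length
                    && t[i - radius[i] - 1] == t[i + radius[i] + 1]) {
                radius[i]++;
            }
            if (i + radius[i] > right) {
                center = i;
                right = i + radius[i];
            }
            if (radius[i] > bestRadius) {
                bestRadius = radius[i];
                bestCenter = i;
            }
        }
        // A radius in t equals the palindrome length in s.
        int start = (bestCenter - bestRadius) / 2;
        return s.substring(start, start + bestRadius);
    }
}
```

The boundary `right` only ever moves forward, and every successful expansion step pushes it forward, so the total work across all centers is linear in the length of the string.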

Longest Common Prefix

Posted on August 9, 2015 by Ying

Longest Common Prefix

Given k strings, find the longest common prefix (LCP).

Example

For strings "ABCD", "ABEF" and "ACEF", the LCP is "A"

For strings "ABCDEFG", "ABCEFG" and "ABCEFA", the LCP is "ABC"

public String longestCommonPrefix(String[] strs) {
    if (strs == null || strs.length == 0) {
        return "";
    }
    String lcp = strs[0];
    for (int i = 1; i < strs.length; i++) {
        lcp = getLCP(lcp, strs[i]);
    }
    return lcp;
}

public String getLCP(String s1, String s2) {
    StringBuilder sb = new StringBuilder();
    int n = Math.min(s1.length(), s2.length());
    for (int i = 0; i < n; i++) {
        if (s1.charAt(i) == s2.charAt(i)) {
            sb.append(s1.charAt(i));
        } else {
            break;
        }
    }
    return sb.toString();
}

