I'm a bit late to the party this week due to midterms, but this problem is really interesting. As usual, it's an ACM practice problem which I failed to solve in time in the last practice stage, and I now provide the solution and an explanation of why it works.
This problem is from the 2013 Pacific Northwest Regionals. The problem statement is copy-pasted from the PDF on their website and is not my own. This is problem K, for reference.
War has broken out between two Klingon clans. But war between Klingons is not just any war; it must be both honorable and glorious. For honor, the two sides must be exactly matched. For glory, there must be as many participants as possible.
Each clan obeys a strict hierarchy: there is one leader of the entire clan. This leader may have zero or more direct subordinates who are ordered from eldest to youngest. Each subordinate in turn may have zero or more direct subordinates of his or her own, also ordered from eldest to youngest (and so on and so forth). By tradition, every clan warrior is younger than his or her superior. Furthermore, each individual clan warrior specializes in one of several distinct fighting styles.
The subclan commanded by a clan warrior consists of the warrior himself and all direct or indirect reports (i.e., subordinates, subordinates of subordinates, etc.). A pair of subclans are said to match exactly if two conditions are met. First, the leaders of each subclan must have the same fighting style and number of subordinates. Second, assuming that the direct subordinates of each leader are ordered by decreasing age, then the subclans commanded by the first direct subordinates must match exactly, the subclans commanded by the second direct subordinates must match exactly, and so on.
Each clan will select its participants in the war, consisting of a single warrior and his or her subclan. The two subclans chosen must match exactly, and must be as large as possible. How many warriors fight for each clan?
The size of each clan is between 1 and 10,000. There are 50 testcases. Total solution time must not exceed 10 seconds.
In the above example, we are looking for the largest matching subtree. Obviously this is ABC, so the solution should be 3 (the number of nodes in ABC).
The above controls allow generating arbitrary b-trees. Only 50 members of each clan are generated for esthetic reasons.
Each clan structure can be represented as a b-tree, and the problem reduces to finding the largest matching subtree, for their definition of 'matching'. Because the size of the clans is not very large, we can try a brute-force solution: match every subtree of clan M to every subtree of clan N (number of subtrees = number of nodes, because we must choose the full subtree).
This suggests pseudo-code something like the following:
// read m_list and n_list from input m_tree = make_btree(m_list) n_tree = make_btree(n_list) best = 0 for m_node in m_tree: for n_node in n_tree: if is_match(m_node, n_node): s = m_node.size() // size refers to size of full subtree if s > best: best = s print best
There are 2 issues with this approach:
- Match detection is computationally intensive, and the function hides a lot of complexity. Match detection is recursive, so the complexity of match detection is O(size of tree).
- Even with match detection O(1), 10,000 × 10,000 = 100,000,000. That's a big number, and depending on the constant factor, may not finish in half a second (total time / number of test cases = 0.5 seconds). We need some optimizations.
Because match detection is recursive, an immediate idea should be to memoize results as they go up the tree. Use a 2-dimensional array of bools (call it
matches) to keep track of whether subtree i (clan member i in tree M) is a match with subtree j (clan member j in tree N). Denote this result as
matches[i][j] = true.
As long as we fill in
matches from the bottom up, the matching code for any node will only iterate over its children in the matches array.
In addition to this, we can compute the number of total subordinates for each node in the same way, storing this result in the node struct itself.
Another optimization is the fact that we don't need to check every pair of nodes against each other, just those which appear to match. One way to do this is to use some crude matching criteria initially, and then do a full match if these criteria are satisfied.
My idea is to hash every node in the N-tree using a hash function which generates a unique value out of a combination of:
- Fighting style
- Total number of underlings
- Number of immediate children
Here is my hash function code:
typedef long long ll; const int MAX_M = 10000; return ((ll) children.size() * 26 * 2 * MAX_M) + ((ll) num_subordinates * 26) + (fightingStyle - 65);
Where 65 is the ASCII code for 'A'.
In this way, we simply hash each node in the M-tree, find the list of matches in the N-tree (which are pre-hashed), and iterate over these potential matches to do the recursive match computation. Combined with memoization, this should be much faster than MN.
The full solution can be found here. It builds a b-tree out of the input, pre-computes the size of each subtree, hashes the N-tree, and then does matching in the DP manner outlined above.