As I've written about before, I'm on this year's U of T ICPC team. Every week, we go through a past contest and try to get the highest score possible under the same time constraints as the original contest.

When we start working on a problem but fail to solve it in the confines of the time limit, I like to write up the solutions and explain the thought process behind them.

This week's problem deals with graphs, and clever ways to represent them. It's interesting because it employs a few different ideas from graph theory together, which is a bit unusual.

The problem itself comes from the 2014 Nordic contest, and is problem A.

## Problem Statement

This is copy-pasted from the NCPC 2014 problem description, and are not my own words.

AMANDA AIR has routes between many different airports, and has asked their most important frequent flyers, members of the AA Frequent Flyer program, which routes they most often fly. Based on this survey, Amanda, the CEO and owner, has concluded that AMANDA AIR will place lounges at some of the airports at which they operate.

However, since there are so many routes going between a wide variety of airports, she has hired you to determine how many lounges she needs to build, if at all possible, given the constraints set by her. This calculation is to be provided by you, before any lounges are built. Her requirements specifies that for some routes, there must be lounges at both airports, for other routes, there must be lounges at exactly one of the airports, and for some routes, there will be no lounges at the airports.

She is very economically minded and is demanding the absolute minimum number of lounges to be built.

There are 1-200,000 airports, and 1-200000 routes.

## Analysis

Represent the given information as a graph, where the nodes are airports and the routes are edges. The first important thing to observe is that there are 200,000 nodes in this graph. This means that my preferred method of representing graphs - an adjacency matrix - will simply not work, because you can't fit 4×10^{10} integers/bools into memory. Therefore, we represent the matrix as an adjecency list.

The problem itself is a simple CSP (constraint-satisfaction) problem, so it's definitely possible to solve with the generalized-arc-consistency class of algorithms + backtrack search. The main problem with that approach is it seems needlessly complicated, especially if you don't have those algorithms in your codebook. Let's do something else!

Notice that if the value of an edge is 0 or 2, this determines whether each of the endpoints have a lounge or not. A value of 1 represents uncertainty, and we'll deal with that later.

Let's first split the graph into connected components, and consider a single connected component. If this component has a single vertex for which it is known whether it has a lounge or not, we can easily DFS on the adjacent vertices and determine them. We continue until the entire component is either filled in, or we have reached a contradiction (recall that the requirements may be impossible to satisfy).

The trick here is that we have an adjacency list, so no information about the values of the edges is stored. I modify the adjacency list to store pairs of elements - the index of the adjacent element, as well as the value of the edge.

Now that we have determined all the connected components which contain a determined vertex, let's determine the remaining ones. These remaining components have the property that each edge has a value of 1. This means that each vertex either has an airport or it doesn't.

We could do a search by setting every vertex in a component to have the airport, get the value, and find the minimum, but that would be wasteful. Instead, notice that this is equivalent to the 2-coloring problem of a graph. Color each vertex in a DFS manner either 'red' or 'black'. Then for this connected component, if we can achieve a consistent 2-coloring, we can just count the number of black vertices and the number of red vertices and take the min.

## Solution

Here is my solution on github. I made the observation that rather than keeping track of the connected components explicitly, I can just reuse the visited bitset from the DFS. The time complexity is something like O(m + n) (where m is the number of routes and n is the number of airports), as for each edge we either check the entire connected component, or just the edge itself. If we check the connected component, then it is never checked again.