COIN-OR::LEMON - Graph Library

Opened 9 years ago

Last modified 4 years ago

#168 new task

Port bipartite matching algorithms

Reported by: alpar Owned by: alpar
Priority: critical Milestone: LEMON 1.4 release
Component: core Version: hg main
Keywords: Cc:
Revision id:

Description

This ticket is a follow-up of #48.

The following tools are affected:

  • lemon/bipartite_matching.h
  • lemon/pr_bipartite_matching.h (I really dislike this filename)
  • test/bipartite_matching_test.cc

It also depends on #69 (the port of the bipartite graphs).

Attachments (17)

bp_matching.h (34.1 KB) - added by elkebir 7 years ago.
bp_matching_b08198bdd613.patch (31.6 KB) - added by elkebir 7 years ago.
Maximum Weight Bipartite Matching (b08198bdd613)
bp_matching_test_751a90ada614.patch (4.0 KB) - added by elkebir 7 years ago.
Test cases for Maximum Weight Bipartite Matching (751a90ada614)
h_k_1a2daf1f6bf7.patch (17.5 KB) - added by poroszd 7 years ago.
Hopcroft-Karp bipartite matching
h_k_45d2e43f52ea.patch (17.3 KB) - added by poroszd 6 years ago.
In the previous patch the algorithm was erroneous, it is corrected.
hopcroft_karp-ec5a4ceb4481.patch (19.7 KB) - added by poroszd 6 years ago.
Correct Hopcroft-Karp
bp_benchmark_7a790fbf8e7d.patch (19.9 KB) - added by poroszd 6 years ago.
Bipartite matching benchmark tools
results.txt (15.5 KB) - added by poroszd 6 years ago.
Benchmark results
d6df00b92912.patch (19.4 KB) - added by poroszd 6 years ago.
Update weighted bp matching for typesafe node sets
hopcroft_karp_55711f25c446.patch (19.8 KB) - added by poroszd 6 years ago.
Hopcroft-Karp
bp_benchmark_ea93dc149092.patch (20.5 KB) - added by poroszd 6 years ago.
Benchmark tools
df198f12f7fa.patch (9.0 KB) - added by poroszd 6 years ago.
Min cost flow algorithms included in the benchmark
res2.txt (17.8 KB) - added by poroszd 6 years ago.
Benchmark result (extended with min cost flow)
bpm_upgrade.patch (81.2 KB) - added by poroszd 6 years ago.
Improved Hopcroft-Karp, an two additional
bpm_upgrade.2.patch (81.2 KB) - added by poroszd 6 years ago.
Improved Hopcroft-Karp; two preflow based matching
res3.txt (20.6 KB) - added by poroszd 6 years ago.
Benchmark results
bpm_pack.patch (124.6 KB) - added by poroszd 6 years ago.

Download all attachments as: .zip

Change History (52)

comment:1 Changed 9 years ago by alpar

  • Summary changed from Port bipartite matvhing algorithms to Port bipartite matching algorithms

comment:2 Changed 9 years ago by alpar

  • Milestone set to LEMON 1.2 release
  • Priority changed from major to critical
  • Type changed from defect to task

comment:3 in reply to: ↑ description ; follow-up: Changed 8 years ago by deba

Replying to alpar:

This ticket is a follow-up of #48.

The following tools are affected:

  • lemon/bipartite_matching.h
  • lemon/pr_bipartite_matching.h (I really dislike this filename)
  • test/bipartite_matching_test.cc

The current weighted bipartite algorithms should not be ported, because they use a very inefficient approach (successive shortest path algorithm without stopping when an unmatched node reached).

comment:4 in reply to: ↑ 3 ; follow-up: Changed 8 years ago by alpar

Replying to deba:

The current weighted bipartite algorithms should not be ported, because they use a very inefficient approach (successive shortest path algorithm without stopping when an unmatched node reached).

I don't mind. Having a slow implementation is better than having nothing. We can mention in the doc that the implementation is inefficient and may significantly change in the future.

On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 8 years ago by kpeter

Replying to alpar:

On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.

You mean "worth keeping", do you?

comment:6 in reply to: ↑ 5 ; follow-up: Changed 8 years ago by alpar

Replying to kpeter:

Replying to alpar:

On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.

You mean "worth keeping", do you?

Yes, of course.

comment:7 Changed 8 years ago by alpar

  • Milestone changed from LEMON 1.2 release to LEMON 1.3 release

comment:8 in reply to: ↑ 6 Changed 7 years ago by deba

Replying to alpar:

Replying to kpeter:

Replying to alpar:

On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.

You mean "worth keeping", do you?

Yes, of course.

No, if a SSP matching is necessary, then a new implementation should be written, because the current ones are extremely slow. I think, the current capacity scaling MCF is pretty fast, it would be necessary to add the same ideas to the matching algorithms (potential shift, stop on first found node...).

The max cardinality algorithms need really good names.

Changed 7 years ago by elkebir

comment:9 follow-up: Changed 7 years ago by elkebir

I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found.
The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?

comment:10 in reply to: ↑ 9 ; follow-up: Changed 7 years ago by deba

Replying to elkebir:

I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found.
The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?

I did not look at the code deeply, but could you make two changes:

  • The ticket:69 contains patch for bipartite graphs, can you port your code to use one of the graph (the class and function names can be changed, but the implementation details probably not).
  • You can also create a patch file

comment:11 in reply to: ↑ 10 Changed 7 years ago by elkebir

Replying to deba:

Replying to elkebir:

I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found.
The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?

I did not look at the code deeply, but could you make two changes:

  • The ticket:69 contains patch for bipartite graphs, can you port your code to use one of the graph (the class and function names can be changed, but the implementation details probably not).
  • You can also create a patch file

Thanks. The BpGraph? classes are exactly what I needed. I will port my code to use them.

Changed 7 years ago by elkebir

Maximum Weight Bipartite Matching (b08198bdd613)

Changed 7 years ago by elkebir

Test cases for Maximum Weight Bipartite Matching (751a90ada614)

comment:12 Changed 7 years ago by elkebir

I attached a patch containing two implementations of Maximum Weight Bipartite Matching using the latest BpGraph? patch by deba. Also a patch containing test cases is attached. I made some improvements in the SSP matching algorithms, now the performance difference in dense graphs between the two methods is negligible.

Changed 7 years ago by poroszd

Hopcroft-Karp bipartite matching

comment:13 Changed 7 years ago by poroszd

I atteched a patch implementing the Hopcroft-Karp matching algoritm.
The patch uses the bipartite graph concept and the lgf_reader for bipartite graphs from ticket #69

Changed 6 years ago by poroszd

In the previous patch the algorithm was erroneous, it is corrected.

comment:14 Changed 6 years ago by poroszd

This patch ( [45d2e43f52ea] ) should replace the previous, that was totally incorrect.

comment:15 follow-up: Changed 6 years ago by deba

I'm not exactly sure, but I think the Hopcroft-Karp matching is wrong in the h_k_1a2daf1f6bf7.patch file.

The problem is that the HK algorithm has to find maximal independent shortest augmenting path set between the free red and blue nodes, but the implementation does not guarantee that it finds the shortest paths. For that, in the forward search, levels should be computed on the nodes, and the backward search should be restricted to the leveled graph.

Changed 6 years ago by poroszd

Correct Hopcroft-Karp

comment:16 in reply to: ↑ 15 ; follow-up: Changed 6 years ago by poroszd

You are totally right, in the above attached patch it should be correct.

Replying to deba:

I'm not exactly sure, but I think the Hopcroft-Karp matching is wrong in the h_k_1a2daf1f6bf7.patch file.

The problem is that the HK algorithm has to find maximal independent shortest augmenting path set between the free red and blue nodes, but the implementation does not guarantee that it finds the shortest paths. For that, in the forward search, levels should be computed on the nodes, and the backward search should be restricted to the leveled graph.

comment:17 in reply to: ↑ 16 ; follow-up: Changed 6 years ago by alpar

Replying to poroszd:

You are totally right, in the above attached patch it should be correct.

  • Unfortunately, I cannot find the parent changeset of your patch in any of my repositories.
  • Have you done any running time tests on how does this version performs compared to the previous one?

comment:18 in reply to: ↑ 17 Changed 6 years ago by poroszd

Replying to alpar:

Replying to poroszd:

You are totally right, in the above attached patch it should be correct.

  • Unfortunately, I cannot find the parent changeset of your patch in any of my repositories.

I used the development repository, after importing the patches with the bpgraphs and the LGF reader.

  • Have you done any running time tests on how does this version performs compared to the previous one?

Well, this is roughly the version I used in the thesis (I forgot to commit here after noticing my error), so those results are valid.

comment:19 Changed 6 years ago by deba

Could you upload a simple benchmark for the matching algorithms?

  • Random bipartite graphs are fine with random costs
  • For unweighted case, I would like to see comparison with general matching and with the maxflow algorithm
  • For the weighted case, with the general weighted matching and the min cost flow algorithms
  • A text file, with a few test cases would be fine (including sparse and dense graphs)

comment:20 follow-up: Changed 6 years ago by kpeter

Notes for benchmarking weighted matchings:

  • You can also use the NETGEN generator to create problem instances easily. It is capable of generating problem instances for the so-called assignment problem, which is indeed a weighted bipartite matching. See e.g. http://en.wikipedia.org/wiki/Assignment_problem.
  • The NETGEN generator can be obtained from here: ftp://dimacs.rutgers.edu/pub/netflow/generators/network/netgen/netgen-bcjl/ See the readme files and netgen.c for more information.
  • I agree with Balazs that comparision with min cost flow algorithms would be important. I suggest you to try CapacityScaling, CostScaling and NetworkSimplex. I guess that CapacityScaling will be the fastest for this special case, but the matching algorithms should be even more efficient. :)

comment:21 in reply to: ↑ 20 Changed 6 years ago by alpar

Replying to kpeter:

The NETGEN generator is already integrated with benchmark tool. I wish all lemon related benchmark were done using that tool.

Changed 6 years ago by poroszd

Bipartite matching benchmark tools

Changed 6 years ago by poroszd

Benchmark results

comment:22 follow-up: Changed 6 years ago by poroszd

I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).
I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.

comment:23 follow-up: Changed 6 years ago by alpar

I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:

https://bitbucket.org/alpar/lemon-bipartite

Changed 6 years ago by poroszd

Update weighted bp matching for typesafe node sets

Changed 6 years ago by poroszd

Hopcroft-Karp

Changed 6 years ago by poroszd

Benchmark tools

comment:24 in reply to: ↑ 23 ; follow-up: Changed 6 years ago by poroszd

Replying to alpar:

I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:

https://bitbucket.org/alpar/lemon-bipartite

Sorry, I missed the changes regarding the typesafe red and blue nodes.
The above patches use them.

However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.

comment:25 in reply to: ↑ 22 ; follow-up: Changed 6 years ago by kpeter

Replying to poroszd:

I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).

I checked these results. As far as I see, the bipartite matchings are usually faster than general matching and preflow. That's good. However, I suggest you to test larger graphs for which min. a few seconds is taken by the algorithms. Have you made experiments with graphs having millions of nodes and arcs?

I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.

There are more ways to do that. E.g.:

  • create a directed graph that contains a copy of each red and blue node and those arcs whose source node is red
  • add another extra node s to this digraph
  • add an arc (r,s) for each node r that was red in the bipartite graph and an arc (s,b) for each other node b
  • create a node map supply as follows:
    • supply[r] = 1 for each "red" node r
    • supply[b] = -1 for each "blue" node b
    • supply[s] = number_of_blue_nodes - number_of_red_nodes (i.e. the sum of supply[v] fo all nodes v is equal to zero)
  • create an arc map cost as follows:
    • cost[a] = -original_weight[a] for each "old" arc a
    • cost[a] = 0 for each new arc a
  • call a min cost flow algorithm as follows
      NetworkSimplex<Digraph> mcf;
      mcf.supplyMap(supply).costMap(cost).upperMap(constMap<Digraph::Arc, int>(1));
      mcf.run();
    
  • obtain the matching by reading the flow values (mcf.flow(a)).

However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.

That's too bad. IMHO, the reasons must be investigated before relasing this concept of bipartite graphs.

Changed 6 years ago by poroszd

Min cost flow algorithms included in the benchmark

Changed 6 years ago by poroszd

Benchmark result (extended with min cost flow)

comment:26 in reply to: ↑ 25 Changed 6 years ago by poroszd

Replying to kpeter:

Replying to poroszd:

I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).

I checked these results. As far as I see, the bipartite matchings are usually faster than general matching and preflow. That's good. However, I suggest you to test larger graphs for which min. a few seconds is taken by the algorithms. Have you made experiments with graphs having millions of nodes and arcs?

I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.

There are more ways to do that. E.g.:

  • create a directed graph that contains a copy of each red and blue node and those arcs whose source node is red
  • add another extra node s to this digraph
  • add an arc (r,s) for each node r that was red in the bipartite graph and an arc (s,b) for each other node b
  • create a node map supply as follows:
    • supply[r] = 1 for each "red" node r
    • supply[b] = -1 for each "blue" node b
    • supply[s] = number_of_blue_nodes - number_of_red_nodes (i.e. the sum of supply[v] fo all nodes v is equal to zero)
  • create an arc map cost as follows:
    • cost[a] = -original_weight[a] for each "old" arc a
    • cost[a] = 0 for each new arc a
  • call a min cost flow algorithm as follows
      NetworkSimplex<Digraph> mcf;
      mcf.supplyMap(supply).costMap(cost).upperMap(constMap<Digraph::Arc, int>(1));
      mcf.run();
    
  • obtain the matching by reading the flow values (mcf.flow(a)).

However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.

That's too bad. IMHO, the reasons must be investigated before relasing this concept of bipartite graphs.

Thank you for the explanation, I added them.

I also uploaded some benchmark results again, but not with larger graphs, as they consume my RAM, and I think using swap will influence the result somehow oddly.
Regardless, I will upload them, but it takes some time.

comment:27 follow-up: Changed 6 years ago by kpeter

I agree that if you run out of physical RAM, your results won't be relevant.

According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!

On the other hand, the weighted bipartite algorithm is usually slower (by an order of magnitude) than the general matching algorithm as well as the network simplex and cost scaling min cost flow algorithms. (It's also interesting that in several cases, network simplex is the most efficient, especially for dense graphs.)

Do you have any insight about the modest performance of the weighted bp. matching? Do you have any idea to make it faster? Are you aware the bottleneck opperations of the algorithm?

comment:28 in reply to: ↑ 24 Changed 6 years ago by deba

Replying to poroszd:

Replying to alpar:

I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:

https://bitbucket.org/alpar/lemon-bipartite

Sorry, I missed the changes regarding the typesafe red and blue nodes.
The above patches use them.

However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.

Which compiler options did you use for benchmarking? Could you also add the used hardware (CPU + RAM)? It would be also good to add these parameters to the result files.

comment:29 in reply to: ↑ 27 ; follow-up: Changed 6 years ago by deba

Replying to kpeter:

According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!

I have few ideas to improve the HK algorithm:

  • We need a proof, that we found the optimal solution:
  • Using the std::list is usually very slow. I think, using std::vector and indices to the front and back of the list has significantly better performance.
  • I would not separate the red->blue and blue->red steps in the BFS.
  • I think, OutArcIt? and InArcIt? have usually better performance than IncEdgeIt?, you might try whether it improves the running time.
  • I don't think, that you need the current_edge structure. In the DFS part, if you visit a blue node, then you either use it the node a path or you know that there is no valid path backward to an uncovered red node. Therefore, a stack of IncEdgeIt/InArcIt? and a bool flag per node is sufficient. Note, that the bool flag is not really necessary, you can reset the level of the node. You can also avoid to use pointers.
  • I think, it is possible also to store levels only on the blue nodes, which decreases the space (but it might increase running time).

I think, it would be also good to compare this algorithm with the direct implementation of the preflow based matching. As I remember, that is extremly fast, and the augmenting path algorithms could beat it only for really sparse graphs.

Changed 6 years ago by poroszd

Improved Hopcroft-Karp, an two additional

Changed 6 years ago by poroszd

Improved Hopcroft-Karp; two preflow based matching

Changed 6 years ago by poroszd

Benchmark results

comment:30 in reply to: ↑ 29 ; follow-up: Changed 6 years ago by poroszd

Thank You, I updated the Hopcroft-Karp following your suggestions, and added two variants of preflow based matching.

I also found some bug when initializing an elevator with non-default size.
Could You check if my fix is correct?

Replying to deba:

Replying to kpeter:

According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!

I have few ideas to improve the HK algorithm:

  • We need a proof, that we found the optimal solution:
  • Using the std::list is usually very slow. I think, using std::vector and indices to the front and back of the list has significantly better performance.
  • I would not separate the red->blue and blue->red steps in the BFS.
  • I think, OutArcIt? and InArcIt? have usually better performance than IncEdgeIt?, you might try whether it improves the running time.
  • I don't think, that you need the current_edge structure. In the DFS part, if you visit a blue node, then you either use it the node a path or you know that there is no valid path backward to an uncovered red node. Therefore, a stack of IncEdgeIt/InArcIt? and a bool flag per node is sufficient. Note, that the bool flag is not really necessary, you can reset the level of the node. You can also avoid to use pointers.
  • I think, it is possible also to store levels only on the blue nodes, which decreases the space (but it might increase running time).

I think, it would be also good to compare this algorithm with the direct implementation of the preflow based matching. As I remember, that is extremly fast, and the augmenting path algorithms could beat it only for really sparse graphs.

comment:31 in reply to: ↑ 30 ; follow-up: Changed 6 years ago by poroszd

Uh, sorry, somehow I sent the patch twice...

comment:32 in reply to: ↑ 31 ; follow-up: Changed 6 years ago by alpar

Replying to poroszd:

Uh, sorry, somehow I sent the patch twice...

As usual, none of then I can apply to my repo:

$ hg -R /home/alpar/projects/LEMON/hg/lemon-bipartite import --exact /tmp/bpm_upgrade.2-1.patch
applying /tmp/bpm_upgrade.2-1.patch
abort: unknown revision '4928e7fa0a4062b96c9fada8abd40d3c7041bcae'!

Changed 6 years ago by poroszd

comment:33 in reply to: ↑ 32 ; follow-up: Changed 6 years ago by poroszd

Replying to alpar:

This patch can be applied to the repo You linked above. It contains:

comment:34 in reply to: ↑ 33 Changed 6 years ago by alpar


This patch can be applied to the repo You linked above. It contains:

It's much better now, thank you.

comment:35 Changed 4 years ago by deba

  • Milestone changed from LEMON 1.3 release to LEMON 1.4 release

I would move this to 1.4.

Note: See TracTickets for help on using tickets.