Opened 10 years ago
Last modified 5 years ago
#168 new task
Port bipartite matching algorithms
Reported by: | alpar | Owned by: | alpar |
---|---|---|---|
Priority: | critical | Milestone: | LEMON 1.4 release |
Component: | core | Version: | hg main |
Keywords: | Cc: | ||
Revision id: |
Attachments (17)
Change History (52)
comment:1 Changed 10 years ago by alpar
- Summary changed from Port bipartite matvhing algorithms to Port bipartite matching algorithms
comment:2 Changed 10 years ago by alpar
- Milestone set to LEMON 1.2 release
- Priority changed from major to critical
- Type changed from defect to task
comment:3 in reply to: ↑ description ; follow-up: ↓ 4 Changed 9 years ago by deba
comment:4 in reply to: ↑ 3 ; follow-up: ↓ 5 Changed 9 years ago by alpar
Replying to deba:
The current weighted bipartite algorithms should not be ported, because they use a very inefficient approach (successive shortest path algorithm without stopping when an unmatched node reached).
I don't mind. Having a slow implementation is better than having nothing. We can mention in the doc that the implementation is inefficient and may significantly change in the future.
On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.
comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 9 years ago by kpeter
Replying to alpar:
On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.
You mean "worth keeping", do you?
comment:6 in reply to: ↑ 5 ; follow-up: ↓ 8 Changed 9 years ago by alpar
comment:7 Changed 9 years ago by alpar
- Milestone changed from LEMON 1.2 release to LEMON 1.3 release
comment:8 in reply to: ↑ 6 Changed 8 years ago by deba
Replying to alpar:
Replying to kpeter:
Replying to alpar:
On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.
You mean "worth keeping", do you?
Yes, of course.
No, if a SSP matching is necessary, then a new implementation should be written, because the current ones are extremely slow. I think, the current capacity scaling MCF is pretty fast, it would be necessary to add the same ideas to the matching algorithms (potential shift, stop on first found node...).
The max cardinality algorithms need really good names.
Changed 8 years ago by elkebir
comment:9 follow-up: ↓ 10 Changed 8 years ago by elkebir
I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found.
The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?
comment:10 in reply to: ↑ 9 ; follow-up: ↓ 11 Changed 8 years ago by deba
Replying to elkebir:
I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found.
The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?
I did not look at the code deeply, but could you make two changes:
- The ticket:69 contains patch for bipartite graphs, can you port your code to use one of the graph (the class and function names can be changed, but the implementation details probably not).
- First you need a development version (hg clone http://lemon.cs.elte.hu/hg/lemon)
- Than you can patch in the bpgraphs.patch (hg import bpgraphs.patch)
- You can also create a patch file
- http://lemon.cs.elte.hu/trac/lemon/wiki/CommitGuides
- Add your new file (hg add lemon/bp_matching.h)
- Modify lemon/Makefile.am
- Commit your files (hg commit)
- Create a patch file (hg export -r tip >bp_matching.patch)
comment:11 in reply to: ↑ 10 Changed 8 years ago by elkebir
Replying to deba:
Replying to elkebir:
I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found.
The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?
I did not look at the code deeply, but could you make two changes:
- The ticket:69 contains patch for bipartite graphs, can you port your code to use one of the graph (the class and function names can be changed, but the implementation details probably not).
- First you need a development version (hg clone http://lemon.cs.elte.hu/hg/lemon)
- Than you can patch in the bpgraphs.patch (hg import bpgraphs.patch)
- You can also create a patch file
- http://lemon.cs.elte.hu/trac/lemon/wiki/CommitGuides
- Add your new file (hg add lemon/bp_matching.h)
- Modify lemon/Makefile.am
- Commit your files (hg commit)
- Create a patch file (hg export -r tip >bp_matching.patch)
Thanks. The BpGraph? classes are exactly what I needed. I will port my code to use them.
comment:12 Changed 8 years ago by elkebir
I attached a patch containing two implementations of Maximum Weight Bipartite Matching using the latest BpGraph? patch by deba. Also a patch containing test cases is attached. I made some improvements in the SSP matching algorithms, now the performance difference in dense graphs between the two methods is negligible.
comment:13 Changed 8 years ago by poroszd
I atteched a patch implementing the Hopcroft-Karp matching algoritm.
The patch uses the bipartite graph concept and the lgf_reader for bipartite graphs from ticket #69
comment:14 Changed 7 years ago by poroszd
This patch ( [45d2e43f52ea] ) should replace the previous, that was totally incorrect.
comment:15 follow-up: ↓ 16 Changed 7 years ago by deba
I'm not exactly sure, but I think the Hopcroft-Karp matching is wrong in the h_k_1a2daf1f6bf7.patch file.
The problem is that the HK algorithm has to find maximal independent shortest augmenting path set between the free red and blue nodes, but the implementation does not guarantee that it finds the shortest paths. For that, in the forward search, levels should be computed on the nodes, and the backward search should be restricted to the leveled graph.
comment:16 in reply to: ↑ 15 ; follow-up: ↓ 17 Changed 7 years ago by poroszd
You are totally right, in the above attached patch it should be correct.
Replying to deba:
I'm not exactly sure, but I think the Hopcroft-Karp matching is wrong in the h_k_1a2daf1f6bf7.patch file.
The problem is that the HK algorithm has to find maximal independent shortest augmenting path set between the free red and blue nodes, but the implementation does not guarantee that it finds the shortest paths. For that, in the forward search, levels should be computed on the nodes, and the backward search should be restricted to the leveled graph.
comment:17 in reply to: ↑ 16 ; follow-up: ↓ 18 Changed 7 years ago by alpar
Replying to poroszd:
You are totally right, in the above attached patch it should be correct.
- Unfortunately, I cannot find the parent changeset of your patch in any of my repositories.
- Have you done any running time tests on how does this version performs compared to the previous one?
comment:18 in reply to: ↑ 17 Changed 7 years ago by poroszd
Replying to alpar:
Replying to poroszd:
You are totally right, in the above attached patch it should be correct.
- Unfortunately, I cannot find the parent changeset of your patch in any of my repositories.
I used the development repository, after importing the patches with the bpgraphs and the LGF reader.
- Have you done any running time tests on how does this version performs compared to the previous one?
Well, this is roughly the version I used in the thesis (I forgot to commit here after noticing my error), so those results are valid.
comment:19 Changed 7 years ago by deba
Could you upload a simple benchmark for the matching algorithms?
- Random bipartite graphs are fine with random costs
- For unweighted case, I would like to see comparison with general matching and with the maxflow algorithm
- For the weighted case, with the general weighted matching and the min cost flow algorithms
- A text file, with a few test cases would be fine (including sparse and dense graphs)
comment:20 follow-up: ↓ 21 Changed 7 years ago by kpeter
Notes for benchmarking weighted matchings:
- You can also use the NETGEN generator to create problem instances easily. It is capable of generating problem instances for the so-called assignment problem, which is indeed a weighted bipartite matching. See e.g. http://en.wikipedia.org/wiki/Assignment_problem.
- The NETGEN generator can be obtained from here: ftp://dimacs.rutgers.edu/pub/netflow/generators/network/netgen/netgen-bcjl/ See the readme files and netgen.c for more information.
- I agree with Balazs that comparision with min cost flow algorithms would be important. I suggest you to try CapacityScaling, CostScaling and NetworkSimplex. I guess that CapacityScaling will be the fastest for this special case, but the matching algorithms should be even more efficient. :)
comment:21 in reply to: ↑ 20 Changed 7 years ago by alpar
Replying to kpeter:
- The NETGEN generator can be obtained from here: ftp://dimacs.rutgers.edu/pub/netflow/generators/network/netgen/netgen-bcjl/ See the readme files and netgen.c for more information.
The NETGEN generator is already integrated with benchmark tool. I wish all lemon related benchmark were done using that tool.
comment:22 follow-up: ↓ 25 Changed 7 years ago by poroszd
I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).
I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.
comment:23 follow-up: ↓ 24 Changed 7 years ago by alpar
I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:
comment:24 in reply to: ↑ 23 ; follow-up: ↓ 28 Changed 7 years ago by poroszd
Replying to alpar:
I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:
Sorry, I missed the changes regarding the typesafe red and blue nodes.
The above patches use them.
However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.
comment:25 in reply to: ↑ 22 ; follow-up: ↓ 26 Changed 7 years ago by kpeter
Replying to poroszd:
I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).
I checked these results. As far as I see, the bipartite matchings are usually faster than general matching and preflow. That's good. However, I suggest you to test larger graphs for which min. a few seconds is taken by the algorithms. Have you made experiments with graphs having millions of nodes and arcs?
I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.
There are more ways to do that. E.g.:
- create a directed graph that contains a copy of each red and blue node and those arcs whose source node is red
- add another extra node s to this digraph
- add an arc (r,s) for each node r that was red in the bipartite graph and an arc (s,b) for each other node b
- create a node map supply as follows:
- supply[r] = 1 for each "red" node r
- supply[b] = -1 for each "blue" node b
- supply[s] = number_of_blue_nodes - number_of_red_nodes (i.e. the sum of supply[v] fo all nodes v is equal to zero)
- create an arc map cost as follows:
- cost[a] = -original_weight[a] for each "old" arc a
- cost[a] = 0 for each new arc a
- call a min cost flow algorithm as follows
NetworkSimplex<Digraph> mcf; mcf.supplyMap(supply).costMap(cost).upperMap(constMap<Digraph::Arc, int>(1)); mcf.run();
- obtain the matching by reading the flow values (mcf.flow(a)).
However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.
That's too bad. IMHO, the reasons must be investigated before relasing this concept of bipartite graphs.
comment:26 in reply to: ↑ 25 Changed 7 years ago by poroszd
Replying to kpeter:
Replying to poroszd:
I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).
I checked these results. As far as I see, the bipartite matchings are usually faster than general matching and preflow. That's good. However, I suggest you to test larger graphs for which min. a few seconds is taken by the algorithms. Have you made experiments with graphs having millions of nodes and arcs?
I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.
There are more ways to do that. E.g.:
- create a directed graph that contains a copy of each red and blue node and those arcs whose source node is red
- add another extra node s to this digraph
- add an arc (r,s) for each node r that was red in the bipartite graph and an arc (s,b) for each other node b
- create a node map supply as follows:
- supply[r] = 1 for each "red" node r
- supply[b] = -1 for each "blue" node b
- supply[s] = number_of_blue_nodes - number_of_red_nodes (i.e. the sum of supply[v] fo all nodes v is equal to zero)
- create an arc map cost as follows:
- cost[a] = -original_weight[a] for each "old" arc a
- cost[a] = 0 for each new arc a
- call a min cost flow algorithm as follows
NetworkSimplex<Digraph> mcf; mcf.supplyMap(supply).costMap(cost).upperMap(constMap<Digraph::Arc, int>(1)); mcf.run();- obtain the matching by reading the flow values (mcf.flow(a)).
However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.
That's too bad. IMHO, the reasons must be investigated before relasing this concept of bipartite graphs.
Thank you for the explanation, I added them.
I also uploaded some benchmark results again, but not with larger graphs, as they consume my RAM, and I think using swap will influence the result somehow oddly.
Regardless, I will upload them, but it takes some time.
comment:27 follow-up: ↓ 29 Changed 7 years ago by kpeter
I agree that if you run out of physical RAM, your results won't be relevant.
According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!
On the other hand, the weighted bipartite algorithm is usually slower (by an order of magnitude) than the general matching algorithm as well as the network simplex and cost scaling min cost flow algorithms. (It's also interesting that in several cases, network simplex is the most efficient, especially for dense graphs.)
Do you have any insight about the modest performance of the weighted bp. matching? Do you have any idea to make it faster? Are you aware the bottleneck opperations of the algorithm?
comment:28 in reply to: ↑ 24 Changed 7 years ago by deba
Replying to poroszd:
Replying to alpar:
I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:
Sorry, I missed the changes regarding the typesafe red and blue nodes.
The above patches use them.
However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.
Which compiler options did you use for benchmarking? Could you also add the used hardware (CPU + RAM)? It would be also good to add these parameters to the result files.
comment:29 in reply to: ↑ 27 ; follow-up: ↓ 30 Changed 7 years ago by deba
Replying to kpeter:
According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!
I have few ideas to improve the HK algorithm:
- We need a proof, that we found the optimal solution:
- Please provide a minimum vertex cover (http://en.wikipedia.org/wiki/K%C3%B6nig's_theorem_(graph_theory))
- And also barriers for the red and blue nodes (http://en.wikipedia.org/wiki/Hall's_marriage_theorem)
- Please, also check the proof in the benchmark.
- Using the std::list is usually very slow. I think, using std::vector and indices to the front and back of the list has significantly better performance.
- I would not separate the red->blue and blue->red steps in the BFS.
- I think, OutArcIt? and InArcIt? have usually better performance than IncEdgeIt?, you might try whether it improves the running time.
- I don't think, that you need the current_edge structure. In the DFS part, if you visit a blue node, then you either use it the node a path or you know that there is no valid path backward to an uncovered red node. Therefore, a stack of IncEdgeIt/InArcIt? and a bool flag per node is sufficient. Note, that the bool flag is not really necessary, you can reset the level of the node. You can also avoid to use pointers.
- I think, it is possible also to store levels only on the blue nodes, which decreases the space (but it might increase running time).
I think, it would be also good to compare this algorithm with the direct implementation of the preflow based matching. As I remember, that is extremly fast, and the augmenting path algorithms could beat it only for really sparse graphs.
comment:30 in reply to: ↑ 29 ; follow-up: ↓ 31 Changed 6 years ago by poroszd
Thank You, I updated the Hopcroft-Karp following your suggestions, and added two variants of preflow based matching.
I also found some bug when initializing an elevator with non-default size.
Could You check if my fix is correct?
Replying to deba:
Replying to kpeter:
According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!
I have few ideas to improve the HK algorithm:
- We need a proof, that we found the optimal solution:
- Please provide a minimum vertex cover (http://en.wikipedia.org/wiki/K%C3%B6nig's_theorem_(graph_theory))
- And also barriers for the red and blue nodes (http://en.wikipedia.org/wiki/Hall's_marriage_theorem)
- Please, also check the proof in the benchmark.
- Using the std::list is usually very slow. I think, using std::vector and indices to the front and back of the list has significantly better performance.
- I would not separate the red->blue and blue->red steps in the BFS.
- I think, OutArcIt? and InArcIt? have usually better performance than IncEdgeIt?, you might try whether it improves the running time.
- I don't think, that you need the current_edge structure. In the DFS part, if you visit a blue node, then you either use it the node a path or you know that there is no valid path backward to an uncovered red node. Therefore, a stack of IncEdgeIt/InArcIt? and a bool flag per node is sufficient. Note, that the bool flag is not really necessary, you can reset the level of the node. You can also avoid to use pointers.
- I think, it is possible also to store levels only on the blue nodes, which decreases the space (but it might increase running time).
I think, it would be also good to compare this algorithm with the direct implementation of the preflow based matching. As I remember, that is extremly fast, and the augmenting path algorithms could beat it only for really sparse graphs.
comment:31 in reply to: ↑ 30 ; follow-up: ↓ 32 Changed 6 years ago by poroszd
Uh, sorry, somehow I sent the patch twice...
comment:32 in reply to: ↑ 31 ; follow-up: ↓ 33 Changed 6 years ago by alpar
Replying to poroszd:
Uh, sorry, somehow I sent the patch twice...
As usual, none of then I can apply to my repo:
$ hg -R /home/alpar/projects/LEMON/hg/lemon-bipartite import --exact /tmp/bpm_upgrade.2-1.patch applying /tmp/bpm_upgrade.2-1.patch abort: unknown revision '4928e7fa0a4062b96c9fada8abd40d3c7041bcae'!
Changed 6 years ago by poroszd
comment:33 in reply to: ↑ 32 ; follow-up: ↓ 34 Changed 6 years ago by poroszd
Replying to alpar:
This patch can be applied to the repo You linked above. It contains:
- [2cf5b6b4bdcb] : update weighted bp matching
- [ce9531988b72] : update lgf reader
- [5cc53dbced80] : minor fix in elevator
- [fd8741fc2143] : Hopcroft-Karp
- [07e1840cc912] : preflow based matching
- [7ae506e08839] : second variant of preflow based matching
- [06e679b37a90] : benchmark tools.
comment:34 in reply to: ↑ 33 Changed 6 years ago by alpar
This patch can be applied to the repo You linked above. It contains:
It's much better now, thank you.
comment:35 Changed 5 years ago by deba
- Milestone changed from LEMON 1.3 release to LEMON 1.4 release
I would move this to 1.4.
Replying to alpar:
The current weighted bipartite algorithms should not be ported, because they use a very inefficient approach (successive shortest path algorithm without stopping when an unmatched node reached).