Opened 10 years ago
Last modified 7 weeks ago
#168 new task
Port bipartite matching algorithms
Reported by: | Alpar Juttner | Owned by: | Alpar Juttner |
---|---|---|---|
Priority: | critical | Milestone: | LEMON 1.4 release |
Component: | core | Version: | hg main |
Keywords: | Cc: | ||
Revision id: |
Attachments (20)
Change History (56)
comment:1 Changed 10 years ago by
Summary: | Port bipartite matvhing algorithms → Port bipartite matching algorithms |
---|
comment:2 Changed 10 years ago by
Milestone: | → LEMON 1.2 release |
---|---|
Priority: | major → critical |
Type: | defect → task |
comment:3 follow-up: 4 Changed 9 years ago by
comment:4 follow-up: 5 Changed 9 years ago by
Replying to deba:
The current weighted bipartite algorithms should not be ported, because they use a very inefficient approach (successive shortest path algorithm without stopping when an unmatched node reached).
I don't mind. Having a slow implementation is better than having nothing. We can mention in the doc that the implementation is inefficient and may significantly change in the future.
On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.
comment:5 follow-up: 6 Changed 9 years ago by
Replying to alpar:
On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.
You mean "worth keeping", do you?
comment:6 follow-up: 8 Changed 9 years ago by
comment:7 Changed 9 years ago by
Milestone: | LEMON 1.2 release → LEMON 1.3 release |
---|
comment:8 Changed 8 years ago by
Replying to alpar:
Replying to kpeter:
Replying to alpar:
On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.
You mean "worth keeping", do you?
Yes, of course.
No, if a SSP matching is necessary, then a new implementation should be written, because the current ones are extremely slow. I think, the current capacity scaling MCF is pretty fast, it would be necessary to add the same ideas to the matching algorithms (potential shift, stop on first found node...).
The max cardinality algorithms need really good names.
Changed 8 years ago by
Attachment: | bp_matching.h added |
---|
comment:9 follow-up: 10 Changed 8 years ago by
I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found. The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?
comment:10 follow-up: 11 Changed 8 years ago by
Replying to elkebir:
I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found. The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?
I did not look at the code deeply, but could you make two changes:
- The ticket:69 contains patch for bipartite graphs, can you port your code to use one of the graph (the class and function names can be changed, but the implementation details probably not).
- First you need a development version (hg clone http://lemon.cs.elte.hu/hg/lemon)
- Than you can patch in the bpgraphs.patch (hg import bpgraphs.patch)
- You can also create a patch file
- http://lemon.cs.elte.hu/trac/lemon/wiki/CommitGuides
- Add your new file (hg add lemon/bp_matching.h)
- Modify lemon/Makefile.am
- Commit your files (hg commit)
- Create a patch file (hg export -r tip >bp_matching.patch)
comment:11 Changed 8 years ago by
Replying to deba:
Replying to elkebir:
I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found. The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?
I did not look at the code deeply, but could you make two changes:
- The ticket:69 contains patch for bipartite graphs, can you port your code to use one of the graph (the class and function names can be changed, but the implementation details probably not).
- First you need a development version (hg clone http://lemon.cs.elte.hu/hg/lemon)
- Than you can patch in the bpgraphs.patch (hg import bpgraphs.patch)
- You can also create a patch file
- http://lemon.cs.elte.hu/trac/lemon/wiki/CommitGuides
- Add your new file (hg add lemon/bp_matching.h)
- Modify lemon/Makefile.am
- Commit your files (hg commit)
- Create a patch file (hg export -r tip >bp_matching.patch)
Thanks. The BpGraph? classes are exactly what I needed. I will port my code to use them.
Changed 8 years ago by
Attachment: | bp_matching_b08198bdd613.patch added |
---|
Maximum Weight Bipartite Matching (b08198bdd613)
Changed 8 years ago by
Attachment: | bp_matching_test_751a90ada614.patch added |
---|
Test cases for Maximum Weight Bipartite Matching (751a90ada614)
comment:12 Changed 8 years ago by
I attached a patch containing two implementations of Maximum Weight Bipartite Matching using the latest BpGraph? patch by deba. Also a patch containing test cases is attached. I made some improvements in the SSP matching algorithms, now the performance difference in dense graphs between the two methods is negligible.
comment:13 Changed 8 years ago by
I atteched a patch implementing the Hopcroft-Karp matching algoritm. The patch uses the bipartite graph concept and the lgf_reader for bipartite graphs from ticket #69
Changed 8 years ago by
Attachment: | h_k_45d2e43f52ea.patch added |
---|
In the previous patch the algorithm was erroneous, it is corrected.
comment:14 Changed 8 years ago by
This patch ( [45d2e43f52ea] ) should replace the previous, that was totally incorrect.
comment:15 follow-up: 16 Changed 7 years ago by
I'm not exactly sure, but I think the Hopcroft-Karp matching is wrong in the h_k_1a2daf1f6bf7.patch file.
The problem is that the HK algorithm has to find maximal independent shortest augmenting path set between the free red and blue nodes, but the implementation does not guarantee that it finds the shortest paths. For that, in the forward search, levels should be computed on the nodes, and the backward search should be restricted to the leveled graph.
comment:16 follow-up: 17 Changed 7 years ago by
You are totally right, in the above attached patch it should be correct.
Replying to deba:
I'm not exactly sure, but I think the Hopcroft-Karp matching is wrong in the h_k_1a2daf1f6bf7.patch file.
The problem is that the HK algorithm has to find maximal independent shortest augmenting path set between the free red and blue nodes, but the implementation does not guarantee that it finds the shortest paths. For that, in the forward search, levels should be computed on the nodes, and the backward search should be restricted to the leveled graph.
comment:17 follow-up: 18 Changed 7 years ago by
Replying to poroszd:
You are totally right, in the above attached patch it should be correct.
- Unfortunately, I cannot find the parent changeset of your patch in any of my repositories.
- Have you done any running time tests on how does this version performs compared to the previous one?
comment:18 Changed 7 years ago by
Replying to alpar:
Replying to poroszd:
You are totally right, in the above attached patch it should be correct.
- Unfortunately, I cannot find the parent changeset of your patch in any of my repositories.
I used the development repository, after importing the patches with the bpgraphs and the LGF reader.
- Have you done any running time tests on how does this version performs compared to the previous one?
Well, this is roughly the version I used in the thesis (I forgot to commit here after noticing my error), so those results are valid.
comment:19 Changed 7 years ago by
Could you upload a simple benchmark for the matching algorithms?
- Random bipartite graphs are fine with random costs
- For unweighted case, I would like to see comparison with general matching and with the maxflow algorithm
- For the weighted case, with the general weighted matching and the min cost flow algorithms
- A text file, with a few test cases would be fine (including sparse and dense graphs)
comment:20 follow-up: 21 Changed 7 years ago by
Notes for benchmarking weighted matchings:
- You can also use the NETGEN generator to create problem instances easily. It is capable of generating problem instances for the so-called assignment problem, which is indeed a weighted bipartite matching. See e.g. http://en.wikipedia.org/wiki/Assignment_problem.
- The NETGEN generator can be obtained from here: ftp://dimacs.rutgers.edu/pub/netflow/generators/network/netgen/netgen-bcjl/ See the readme files and netgen.c for more information.
- I agree with Balazs that comparision with min cost flow algorithms would be important. I suggest you to try
CapacityScaling
,CostScaling
andNetworkSimplex
. I guess thatCapacityScaling
will be the fastest for this special case, but the matching algorithms should be even more efficient. :)
comment:21 Changed 7 years ago by
Replying to kpeter:
- The NETGEN generator can be obtained from here: ftp://dimacs.rutgers.edu/pub/netflow/generators/network/netgen/netgen-bcjl/ See the readme files and netgen.c for more information.
The NETGEN generator is already integrated with benchmark tool. I wish all lemon related benchmark were done using that tool.
Changed 7 years ago by
Attachment: | bp_benchmark_7a790fbf8e7d.patch added |
---|
Bipartite matching benchmark tools
comment:22 follow-up: 25 Changed 7 years ago by
I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket). I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.
comment:23 follow-up: 24 Changed 7 years ago by
I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:
Changed 7 years ago by
Attachment: | d6df00b92912.patch added |
---|
Update weighted bp matching for typesafe node sets
comment:24 follow-up: 28 Changed 7 years ago by
Replying to alpar:
I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:
Sorry, I missed the changes regarding the typesafe red and blue nodes. The above patches use them.
However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.
comment:25 follow-up: 26 Changed 7 years ago by
Replying to poroszd:
I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).
I checked these results. As far as I see, the bipartite matchings are usually faster than general matching and preflow. That's good. However, I suggest you to test larger graphs for which min. a few seconds is taken by the algorithms. Have you made experiments with graphs having millions of nodes and arcs?
I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.
There are more ways to do that. E.g.:
- create a directed graph that contains a copy of each red and blue node and those arcs whose source node is red
- add another extra node s to this digraph
- add an arc (r,s) for each node r that was red in the bipartite graph and an arc (s,b) for each other node b
- create a node map
supply
as follows:supply[r] = 1
for each "red" node rsupply[b] = -1
for each "blue" node bsupply[s] = number_of_blue_nodes - number_of_red_nodes
(i.e. the sum of supply[v] fo all nodes v is equal to zero)
- create an arc map
cost
as follows:cost[a] = -original_weight[a]
for each "old" arc acost[a] = 0
for each new arc a
- call a min cost flow algorithm as follows
NetworkSimplex<Digraph> mcf; mcf.supplyMap(supply).costMap(cost).upperMap(constMap<Digraph::Arc, int>(1)); mcf.run();
- obtain the matching by reading the flow values (
mcf.flow(a)
).
However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.
That's too bad. IMHO, the reasons must be investigated before relasing this concept of bipartite graphs.
Changed 7 years ago by
Attachment: | df198f12f7fa.patch added |
---|
Min cost flow algorithms included in the benchmark
comment:26 Changed 7 years ago by
Replying to kpeter:
Replying to poroszd:
I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).
I checked these results. As far as I see, the bipartite matchings are usually faster than general matching and preflow. That's good. However, I suggest you to test larger graphs for which min. a few seconds is taken by the algorithms. Have you made experiments with graphs having millions of nodes and arcs?
I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.
There are more ways to do that. E.g.:
- create a directed graph that contains a copy of each red and blue node and those arcs whose source node is red
- add another extra node s to this digraph
- add an arc (r,s) for each node r that was red in the bipartite graph and an arc (s,b) for each other node b
- create a node map
supply
as follows:
supply[r] = 1
for each "red" node rsupply[b] = -1
for each "blue" node bsupply[s] = number_of_blue_nodes - number_of_red_nodes
(i.e. the sum of supply[v] fo all nodes v is equal to zero)- create an arc map
cost
as follows:
cost[a] = -original_weight[a]
for each "old" arc acost[a] = 0
for each new arc a- call a min cost flow algorithm as follows
NetworkSimplex<Digraph> mcf; mcf.supplyMap(supply).costMap(cost).upperMap(constMap<Digraph::Arc, int>(1)); mcf.run();- obtain the matching by reading the flow values (
mcf.flow(a)
).However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.
That's too bad. IMHO, the reasons must be investigated before relasing this concept of bipartite graphs.
Thank you for the explanation, I added them.
I also uploaded some benchmark results again, but not with larger graphs, as they consume my RAM, and I think using swap will influence the result somehow oddly. Regardless, I will upload them, but it takes some time.
comment:27 follow-up: 29 Changed 7 years ago by
I agree that if you run out of physical RAM, your results won't be relevant.
According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!
On the other hand, the weighted bipartite algorithm is usually slower (by an order of magnitude) than the general matching algorithm as well as the network simplex and cost scaling min cost flow algorithms. (It's also interesting that in several cases, network simplex is the most efficient, especially for dense graphs.)
Do you have any insight about the modest performance of the weighted bp. matching? Do you have any idea to make it faster? Are you aware the bottleneck opperations of the algorithm?
comment:28 Changed 7 years ago by
Replying to poroszd:
Replying to alpar:
I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:
Sorry, I missed the changes regarding the typesafe red and blue nodes. The above patches use them.
However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.
Which compiler options did you use for benchmarking? Could you also add the used hardware (CPU + RAM)? It would be also good to add these parameters to the result files.
comment:29 follow-up: 30 Changed 7 years ago by
Replying to kpeter:
According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!
I have few ideas to improve the HK algorithm:
- We need a proof, that we found the optimal solution:
- Please provide a minimum vertex cover (http://en.wikipedia.org/wiki/K%C3%B6nig's_theorem_(graph_theory))
- And also barriers for the red and blue nodes (http://en.wikipedia.org/wiki/Hall's_marriage_theorem)
- Please, also check the proof in the benchmark.
- Using the std::list is usually very slow. I think, using std::vector and indices to the front and back of the list has significantly better performance.
- I would not separate the red->blue and blue->red steps in the BFS.
- I think, OutArcIt? and InArcIt? have usually better performance than IncEdgeIt?, you might try whether it improves the running time.
- I don't think, that you need the current_edge structure. In the DFS part, if you visit a blue node, then you either use it the node a path or you know that there is no valid path backward to an uncovered red node. Therefore, a stack of IncEdgeIt/InArcIt? and a bool flag per node is sufficient. Note, that the bool flag is not really necessary, you can reset the level of the node. You can also avoid to use pointers.
- I think, it is possible also to store levels only on the blue nodes, which decreases the space (but it might increase running time).
I think, it would be also good to compare this algorithm with the direct implementation of the preflow based matching. As I remember, that is extremly fast, and the augmenting path algorithms could beat it only for really sparse graphs.
Changed 7 years ago by
Attachment: | bpm_upgrade.patch added |
---|
Improved Hopcroft-Karp, an two additional
Changed 7 years ago by
Attachment: | bpm_upgrade.2.patch added |
---|
Improved Hopcroft-Karp; two preflow based matching
comment:30 follow-up: 31 Changed 7 years ago by
Thank You, I updated the Hopcroft-Karp following your suggestions, and added two variants of preflow based matching.
I also found some bug when initializing an elevator with non-default size. Could You check if my fix is correct?
Replying to deba:
Replying to kpeter:
According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!
I have few ideas to improve the HK algorithm:
- We need a proof, that we found the optimal solution:
- Please provide a minimum vertex cover (http://en.wikipedia.org/wiki/K%C3%B6nig's_theorem_(graph_theory))
- And also barriers for the red and blue nodes (http://en.wikipedia.org/wiki/Hall's_marriage_theorem)
- Please, also check the proof in the benchmark.
- Using the std::list is usually very slow. I think, using std::vector and indices to the front and back of the list has significantly better performance.
- I would not separate the red->blue and blue->red steps in the BFS.
- I think, OutArcIt? and InArcIt? have usually better performance than IncEdgeIt?, you might try whether it improves the running time.
- I don't think, that you need the current_edge structure. In the DFS part, if you visit a blue node, then you either use it the node a path or you know that there is no valid path backward to an uncovered red node. Therefore, a stack of IncEdgeIt/InArcIt? and a bool flag per node is sufficient. Note, that the bool flag is not really necessary, you can reset the level of the node. You can also avoid to use pointers.
- I think, it is possible also to store levels only on the blue nodes, which decreases the space (but it might increase running time).
I think, it would be also good to compare this algorithm with the direct implementation of the preflow based matching. As I remember, that is extremly fast, and the augmenting path algorithms could beat it only for really sparse graphs.
comment:32 follow-up: 33 Changed 7 years ago by
Replying to poroszd:
Uh, sorry, somehow I sent the patch twice...
As usual, none of then I can apply to my repo:
$ hg -R /home/alpar/projects/LEMON/hg/lemon-bipartite import --exact /tmp/bpm_upgrade.2-1.patch applying /tmp/bpm_upgrade.2-1.patch abort: unknown revision '4928e7fa0a4062b96c9fada8abd40d3c7041bcae'!
Changed 7 years ago by
Attachment: | bpm_pack.patch added |
---|
comment:33 follow-up: 34 Changed 7 years ago by
Replying to alpar:
This patch can be applied to the repo You linked above. It contains:
- [2cf5b6b4bdcb] : update weighted bp matching
- [ce9531988b72] : update lgf reader
- [5cc53dbced80] : minor fix in elevator
- [fd8741fc2143] : Hopcroft-Karp
- [07e1840cc912] : preflow based matching
- [7ae506e08839] : second variant of preflow based matching
- [06e679b37a90] : benchmark tools.
comment:34 Changed 7 years ago by
This patch can be applied to the repo You linked above. It contains:
It's much better now, thank you.
comment:35 Changed 6 years ago by
Milestone: | LEMON 1.3 release → LEMON 1.4 release |
---|
I would move this to 1.4.
Changed 7 weeks ago by
Attachment: | 3180fd18651b.patch added |
---|
Changed 7 weeks ago by
Attachment: | results.2.txt added |
---|
Changed 7 weeks ago by
Attachment: | 5b4317f3ce36.patch added |
---|
comment:36 Changed 7 weeks ago by
I experimented with the simplified implementation of the current weighted general matching algorithms. The test implementations are in [3180fd18651b] and the running times are in results.2.txt.
Between the test implementations, I have chosen MaxWeightedBpMatching1, and I created a patch from it: [5b4317f3ce36]. This patch contains also perfect matching implementation.
Replying to alpar:
The current weighted bipartite algorithms should not be ported, because they use a very inefficient approach (successive shortest path algorithm without stopping when an unmatched node reached).