Referee reports # 3, CMAME Sep. 2016 (decision: reject) #21

labarba · 2016-10-19T17:26:36Z

Revised paper submitted to CMAME on 15 March 2016 — Decision: Reject

Editor comments

I regret to inform you that the reviewers of your manuscript have advised against publication, and I must therefore decline it. For your guidance, the reviewers' comments are included below.

Referee # 3 comments

SUMMARY

The paper concerns the solution of linear systems related to boundary element (BE) discretizations of single and double layer operators for the Laplace and Stokes equations in three dimensions. The contribution of the paper is to apply the ideas in [13] to these BE linear systems and to empirically study their performance. The authors use a collocation method with adaptive Gaussian quadrature defined on triangulation of smooth surfaces. The discretized systems are accelerated with a fast multipole method in which the far field is approximated using analytic expansions. The resulting linear system is solved with an unpreconditioned Krylov method. Numerical results are presented for exterior problems for simply and multiply-connected smooth domains, for single and double-layer operators, and for the Laplace and Stokes problems with up to 400K unknowns.

The key idea (section 2.6, page 8) is to use variable fidelity matrix-vector products in the Krylov method to accelerate the overall scheme (by using inexact-- and thus, cheaper-- matrix-vector products). The authors claim 3X speedups.

SUMMARY REVIEW
The paper is incremental and has technical shortcomings that make me question the validity of the conclusions from the numerical results. For these two reasons I don't think that this paper is at a level appropriate for CMAME.

--- DETAILED REVIEW ----

STRENGTHS

The paper is for the most part well written. Most of the details required to understand the method are in the paper and, most important, the code along with scripts that can reproduce the results are available on github.
The problem of accelerating Krylov solvers for boundary integral equations is important, especially in 3D. For single-layer potentials the problem is ill-conditioned. For double-layer potentials problems arise in regions of high-curvature or worst in the presence of edges and corners. So the problem in which the authors are trying to tackle is important.
The authors have made an effort to examine many different aspects of the problem in various settings. I can really appreciate the time and effort the took to conduct the numerical experiments and make the whole setup (code and paper sources) freely available. This paper is an exemplar of what reproducible research should look like.

WEAKNESSES

The theoretical/algorithmic contribution is quite incremental. The main idea comes from 2005 in references [12] and [13]. In the paper the authors apply these ideas by introducing inexact matvecs by adjusting the accuracy of the far field ( p-expansions) in FMM, which a rather obvious thing to do. So the contribution is really just equation (19) in 2.6.
There is no literature review in this paper.
There is a significant amount of work in accelerating Krylov methods for integral equations. Many alternative methods come with theoretical results for optimality.
The only reason one would consider the scheme proposed by the authors is that it is trivial to implement. Papers of Greengard, Rokhlin, Ho, Ying, Zorin, Martinsson, Gillman, Darve, etc on fast direct solvers (which are typically used as preconditioners) are not even mentioned. arXiv:1511.06029v1 (optimal preconditioners in 3D and many references of prior work). SISC 98 Grama, Kumar, Sameh (inexact FMM as a preconditioner)
More important I have significant questions regarding the experiments:
I don't believe it is possible that changing the far-field approximation fidelity can have such dramatic effects. In well tuned FMM implementations the near and far field should take roughly the same time. The authors dispute that but I don't see strong evidence. I agree that Ncrit should be chosen to optimize wall clock time and as one changes $p$ one should change Ncrit; however for each $p$ the two parts (near and far) should be balanced. I'd like to see breakdown times of near and far, these are not presented in the paper. Also, I'd like to see evidence that these are good implementations that achieve maximum performance. Otherwise the results presented are only valid for the particular implementation.

Given that when solving the boundary integrals one has to include the quadrature costs, the far field ends up taking less than one third of the overall computation. So I don't see how you can get more than 1.6X speedup per iteration. Also, the number of Krylov iterations should increase as we vary the accuracy right? Also, where are the matrix setup times for the matrix? I'd like to see end-to-end timings and see if the benefits persist.

DETAILED comments/questions

are you using uniform or adaptive fmm?
the claim that the optimal far field for fmm is p^4 is wrong. It is p^3.
see: jcph.1999.6355 (Cheng, Greengard, Rokhlin).
in Eq17, is r_k-1 absolute or relative (i believe it is the absolute but you need
to state it)
Figure 3: your Ncrit is too small. I wonder why. You also need to report accuracy of the FMM (estimated), GFLOPS rates to show that your implementation is optimal (since you report timings), and you need to repeat it for p=8, p=12.
Figure 5: this is the gist of the paper, it is used to assert that reducing p deteriorates the accuracy. But I don't see any significant drop in accuracy. You should tabulate the results for the larger N for p=5, p=8, p=12 and you should report both FMM accuracy and BIE accuracy. At any rate, in my opinion the results do not support your arguments. It is very likely that p=5 or p=8 are sufficient to give you the accuracy you want in BIE.
Figure 6: here we see that when we change p the number of Krylov iterations increases. Why is it that this effect disappears in Tables 1 and 2? Although the time per iteration is smaller the number of iterations should increase.

labarba · 2017-01-27T20:44:43Z

The theoretical/algorithmic contribution is quite incremental. The main idea comes from 2005 in references [12] and [13]. In the paper the authors apply these ideas by introducing inexact matvecs by adjusting the accuracy of the far field ( p-expansions) in FMM, which a rather obvious thing to do. So the contribution is really just equation (19) in 2.6.

We don't claim to make a theoretical contribution. But we do seem to be the first ones to use the existing theory of inexact GMRES to speeding-up the FMM algorithm. Even though it may seem obvious to do, after the fact, it hasn't been done—despite the possibility of good speed-ups. The contribution is to implement the idea in an FMM code, test it thoroughly, and show the actual speed-ups that it can offer! Upon showing this result, the community has a good reason to adopt the method.

labarba · 2017-01-27T20:50:59Z

There is no literature review in this paper.
There is a significant amount of work in accelerating Krylov methods for integral equations. Many alternative methods come with theoretical results for optimality.
The only reason one would consider the scheme proposed by the authors is that it is trivial to implement. Papers of Greengard, Rokhlin, Ho, Ying, Zorin, Martinsson, Gillman, Darve, etc on fast direct solvers (which are typically used as preconditioners) are not even mentioned.

We cite the relevant literature for inexact GMRES. The referee says that there is significant amount of work in accelerating Krylov methods, but mentions various people working on direct solvers, not iterative solvers. We could not find others that improve over standard FMM to accelerate Krylov methods: there are of course many authors applying FMM in this scenario, but without modification. Grama et al. '98 use the idea of a sparse approximate pre-conditioner for the linear system, while Corona's pre-print presents a pre-conditioning method based on tensor decomposition. Preconditioning attacks the time-to-solution by reducing the number of iterations, while we attack the effort to complete each iteration by reducing p. They are orthogonal approaches.

We have added a reference to a review paper that covers the topic of FMM acceleration of Krylov methods: Nishimura, 2002. (commit 1554d6c)

labarba · 2017-01-27T22:36:44Z

I don't believe it is possible that changing the far-field approximation fidelity can have such dramatic effects. In well tuned FMM implementations the near and far field should take roughly the same time. The authors dispute that but I don't see strong evidence. I agree that Ncrit should be chosen to optimize wall clock time and as one changes $p$ one should change Ncrit; however for each $p$ the two parts (near and far) should be balanced. I'd like to see breakdown times of near and far, these are not presented in the paper. Also, I'd like to see evidence that these are good implementations that achieve maximum performance.

First, regarding the performance characteristics of the FMM code used in this study, we see in Figure 3 that it takes < 10 s to compute 1 million points. This timing falls in the middle of the range of timings shown by Yokota (2013) for five different publicly available codes. Both his benchmark and our result are multi-core, on six CPU cores, although the CPUs are not exactly the same. This is indication that our code offers a representative result in regards to performance. (It may not be the fastest of them all, but it is competitive.)

Next, regarding the balance between near and far field: we show the time breakdown for each iteration in Figure 9. We have now added the numbers for total time spent in P2P and M2L: 63 vs. 47 seconds. This is evidence that the overall computation is balanced. We also have collected detailed evidence of the exercise to find the best Ncrit in the supplementary materials. All the runs are reproducible: detailed information is provided to repeat each case (and of course the code is open and available). There is no better evidence than open code and data: if anyone were to doubt the result, they can run themselves.

tingyu66 · 2017-01-27T22:42:46Z

are you using uniform or adaptive fmm?

We used an adaptive octree in our FMM, added by commit d90a0c8.

labarba · 2017-01-27T23:13:06Z

the claim that the optimal far field for fmm is p^4 is wrong. It is p^3.

The text in the paper made no such claim. It says that Cartesian expansions scale as O(p^6) and using spherical harmonics "reduces this to O(p^4)" — yes, this can be reduced further to p^3 with rotations. But the operator with rotations gives a real benefit only at very high p, say p>20.

We have modified the text to make this more clear, and added a reference to a paper showing that the p^4 operator is faster at common values of p (lower than ~20), see commit a7e9937.

tingyu66 · 2017-01-27T23:34:04Z

in Eq17, is r_k-1 absolute or relative (i believe it is the absolute but you need
to state it)

It is the absolute residual at step k-1. We've added the missing "absolute", see commit 1595beb.

labarba · 2017-01-27T23:54:37Z

Figure 3: your Ncrit is too small. I wonder why. You also need to report accuracy of the FMM (estimated), GFLOPS rates to show that your implementation is optimal (since you report timings), and you need to repeat it for p=8, p=12.

The value Ncrit used in the results of Figure 3 was chosen after systematically running with 11 values between 100 and 200, plus additional cases between 50 and 400, for the case with largest N, i.e. one million points. Ncrit=125 gives the shortest time to solution, with P2P taking 2.3s and M2L taking 2.7s—a well-balanced FMM calculation. The fact that the scaling displays the O(N) complexity is evidence that the FMM is balanced throughout. (We added more details about this have been added to the supplementary materials.)

We added the accuracy information, as requested.

GFLOP/s are not informative, in this case: consider that pure P2P would offer the most GLFOP/s! The referee's concern is that the code used for this work is a good implementation, and for this we can use the benchmark provided by Yokota (2013) comparing five publicly available codes. For one million points, we time at 10s, which is in the middle of the range of timings given by Yokota (all multi-core tests on six cores). This is evidence that our code performs competitively.

Figure 3 has the sole purpose of showing that our implementation scales as expected: at O(N). There is no reason to repeat this with different values of p. Of course, if we run it with a higher p, the optimal Ncrit will be larger (more expensive far-field calculation).

labarba · 2017-01-28T00:18:20Z

Figure 5: this is the gist of the paper, it is used to assert that reducing p deteriorates the accuracy. But I don't see any significant drop in accuracy. You should tabulate the results for the larger N for p=5, p=8, p=12 and you should report both FMM accuracy and BIE accuracy. At any rate, in my opinion the results do not support your arguments. It is very likely that p=5 or p=8 are sufficient to give you the accuracy you want in BIE.

Figure 5 has the purpose of showing the spatial convergence of the full boundary-integral solution, using an analytical solution as reference. It is certainly not the gist of the paper: it is due diligence. The reason to show both p=10 and 12 was to provide evidence that 12 gives a tiny improvement of accuracy at the largest value of N, and p=10 was accurate enough for the first-kind equation. To avoid confusion, we have changed the presentation of the convergence, now in a separate figure for first- and second-kind equations. We used two sets of the accuracy parameters: p, number of Gauss points, and solver tolerance. The "tight" set of parameters was used to compute the observed order of convergence. The "loose" parameters are for the future speed-up tests, which would be biased otherwise. The plots show that accuracy is minimally affected by the "loose" set of parameters (but this was not known ahead of time, hence the need for the "tight" set). See Table 1 for the parameter values.

The referee's point about observing the actual drop in accuracy with p (not really addressed in Figure 5) has been documented in the supplementary materials. There, we show the error of the potential at the collocation points, and at an exterior point, with varying p. For example, with N=8192, the error falls one order of magnitude as p is increased from 3 to 6; then it falls again 4x increasing p to 9.

labarba · 2017-01-28T00:21:42Z

Figure 6: here we see that when we change p the number of Krylov iterations increases. Why is it that this effect disappears in Tables 1 and 2? Although the time per iteration is smaller the number of iterations should increase.

Figure 6 shows results with p=8 (left) and p=12 (right). More iterations are needed to converge with the lower starting p. In Tables 1 and 2, we do not see this effect because there is only one value of p used in all cases.

labarba added referee comments reject labels Oct 19, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Referee reports # 3, CMAME Sep. 2016 (decision: reject) #21

Referee reports # 3, CMAME Sep. 2016 (decision: reject) #21

labarba commented Oct 19, 2016 •

edited

Loading

labarba commented Jan 27, 2017

labarba commented Jan 27, 2017 •

edited

Loading

labarba commented Jan 27, 2017

tingyu66 commented Jan 27, 2017 •

edited

Loading

labarba commented Jan 27, 2017 •

edited by tingyu66

Loading

tingyu66 commented Jan 27, 2017

labarba commented Jan 27, 2017

labarba commented Jan 28, 2017

labarba commented Jan 28, 2017

Referee reports # 3, CMAME Sep. 2016 (decision: reject) #21

Referee reports # 3, CMAME Sep. 2016 (decision: reject) #21

Comments

labarba commented Oct 19, 2016 • edited Loading

labarba commented Jan 27, 2017

labarba commented Jan 27, 2017 • edited Loading

labarba commented Jan 27, 2017

tingyu66 commented Jan 27, 2017 • edited Loading

labarba commented Jan 27, 2017 • edited by tingyu66 Loading

tingyu66 commented Jan 27, 2017

labarba commented Jan 27, 2017

labarba commented Jan 28, 2017

labarba commented Jan 28, 2017

labarba commented Oct 19, 2016 •

edited

Loading

labarba commented Jan 27, 2017 •

edited

Loading

tingyu66 commented Jan 27, 2017 •

edited

Loading

labarba commented Jan 27, 2017 •

edited by tingyu66

Loading