Fortran, Julia, and MATLAB FEM Benchmark Comparison

Fortran, Julia, and MATLAB FEM Benchmark Comparison

Continuing the previous Finite Element Method (FEM) solver and assembly benchmark comparison, this follow up compares the entire solution process for an identical simulation problem, in this case a two-dimensional (2D) Poisson problem solved on a unit square. The model problem is stationary, and discretized with quadrilateral Q1 bilinear Lagrange finite element shape functions on a unit square grid. Finite element subroutines such as matrix assembly have been implemented in MATLAB, Julia, as well as Fortran. For MATLAB corresponding code from the FEATool Multiphysics toolbox were used. The Fortran code, FEM2D, is a reference finite element implementation. Lastly, Julia was a straight port of the Fortran code.

The following tables and curves show the results and timings computed for several grids (the solver is excluded in the plot since comparing a direct solver with multigrid is not very meaningful).

|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|  MATLAB |           |           |           |           |           |           |           |           |           |
|     1/h |    t_grid |     t_ptr |   t_asm_A |   t_asm_f |     t_bdr |  t_sparse |     t_tot |    t_spmv |   t_solve |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|      32 | 1.74e-003 | 1.26e-003 | 2.50e-003 | 1.88e-003 | 2.22e-003 | 2.05e-003 | 1.16e-002 | 1.18e-005 | 4.05e-003 |
|      64 | 7.09e-004 | 9.97e-004 | 6.51e-003 | 3.00e-003 | 6.13e-003 | 9.71e-003 | 2.71e-002 | 4.62e-005 | 2.17e-002 |
|     128 | 1.58e-003 | 6.25e-003 | 1.58e-002 | 8.54e-003 | 2.28e-002 | 4.07e-002 | 9.57e-002 | 1.84e-004 | 9.43e-002 |
|     256 | 5.06e-003 | 2.44e-002 | 5.49e-002 | 4.11e-002 | 9.02e-002 | 1.47e-001 | 3.63e-001 | 8.68e-004 | 4.48e-001 |
|     512 | 3.06e-002 | 9.75e-002 | 2.05e-001 | 1.57e-001 | 3.74e-001 | 6.11e-001 | 1.47e+000 | 4.90e-003 | 2.54e+000 |
|    1024 | 1.23e-001 | 3.86e-001 | 8.13e-001 | 6.32e-001 | 1.51e+000 | 2.51e+000 | 5.97e+000 | 1.99e-002 | 1.48e+001 |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|

|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|   Julia |           |           |           |           |           |           |           |           |           |
|     1/h |    t_grid |     t_ptr |   t_asm_A |   t_asm_f |     t_bdr |  t_sparse |     t_tot |    t_spmv |   t_solve |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|      32 | 1.11e-004 | 3.77e-004 | 8.97e-004 | 2.61e-004 | 6.76e-005 | 2.30e-004 | 1.94e-003 | 2.15e-005 | 6.00e-003 |
|      64 | 2.22e-004 | 1.67e-003 | 3.58e-003 | 1.00e-003 | 9.88e-005 | 1.20e-003 | 7.78e-003 | 8.86e-005 | 2.80e-002 |
|     128 | 5.66e-004 | 6.76e-003 | 1.45e-002 | 3.92e-003 | 1.41e-004 | 5.96e-003 | 3.19e-002 | 3.65e-004 | 1.20e-001 |
|     256 | 3.20e-003 | 2.71e-002 | 5.79e-002 | 1.58e-002 | 2.48e-004 | 2.39e-002 | 1.28e-001 | 1.55e-003 | 5.12e-001 |
|     512 | 1.30e-002 | 1.10e-001 | 2.35e-001 | 6.66e-002 | 4.11e-004 | 1.77e-001 | 6.02e-001 | 7.77e-003 | 2.58e+000 |
|    1024 | 5.35e-002 | 4.60e-001 | 9.57e-001 | 2.76e-001 | 2.22e-003 | 6.20e-001 | 2.37e+000 | 3.12e-002 | 1.34e+001 |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|

|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
| Fortran |           |           |           |           |           |           |           |           |           |
|     1/h |    t_grid |     t_ptr |   t_asm_A |   t_asm_f |     t_bdr |  t_sparse |     t_tot |    t_spmv |   t_solve |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|      32 | 0.00e+000 | 8.68e-004 | 8.68e-004 | 0.00e+000 | 0.00e+000 | 0.00e+000 | 1.74e-003 | 2.60e-005 | 4.34e-003 |
|      64 | 0.00e+000 | 8.68e-004 | 7.81e-003 | 2.60e-003 | 8.68e-004 | 0.00e+000 | 1.22e-002 | 9.55e-005 | 8.68e-004 |
|     128 | 4.34e-003 | 4.34e-003 | 1.82e-002 | 1.74e-003 | 0.00e+000 | 0.00e+000 | 2.86e-002 | 2.95e-004 | 1.13e-002 |
|     256 | 3.47e-003 | 1.91e-002 | 6.94e-002 | 1.13e-002 | 1.74e-003 | 0.00e+000 | 1.05e-001 | 1.12e-003 | 4.08e-002 |
|     512 | 3.30e-002 | 9.20e-002 | 2.26e-001 | 4.17e-002 | 1.74e-003 | 0.00e+000 | 3.94e-001 | 4.93e-003 | 1.77e-001 |
|    1024 | 1.35e-001 | 3.86e-001 | 9.05e-001 | 1.67e-001 | 8.68e-004 | 0.00e+000 | 1.59e+000 | 2.35e-002 | 7.65e-001 |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|

t_grid   - grid generation
t_ptr    - matrix pointers
t_asm_A  - FE matrix assembly
t_asm_f  - right hand side assembly
t_bdr    - Dirichlet boundary conditions
t_sparse - conversion to sparse matrix format
t_tot    - total time (excluding t_solve)
t_spmv   - sparse matrix vector multiplication
t_solve  - linear solver

What one can see in the data is that the MATLAB, Julia, and Fortran codes all feature almost identical matrix assembly times. MATLAB has a hard to shake reputation as downright slow and one might indeed be surprised and skeptical to see that it performed just as fast as Fortran. In this case this is due to a heavily vectorized and optimized FEM assembly routines.

Linear solver time is not directly compared since MATLAB, and Julia all use a direct solver per default (currently Umfpack/SuiteSparse) while the FEM2D Fortran code uses a significantly more efficient and optimal geometric multigrid iterative solver.

Summary

The entire solution process for an identical Poisson problem has been studied for comparable implementations of the Finite Element Method (FEM). Although the MATLAB (FEATool Multiphysics) implementation were indeed overall slower than the Fortran code, due to the significant vectorization and optimizations they are in fact not several magnitudes slower as might otherwise be expected of MATLAB code.

Altogether the FEATool Multiphysics timing runs compare quite well to the Fortran reference code up to about a million unknowns. Thus we can conclude that it is indeed with some effort possible to write both complex and high performance MATLAB code for partial differential equation (PDE) and continuum mechanics simulations.

var tr1={x:[32,64,128,256,512,1024],y:[0.011647,0.027056,0.095714,0.362877,1.474662,5.971518],type:'scatter',name:'MATLAB',line:{color:'rgb(255,0,0)',width:2}}; var tr2={x:[32,64,128,256,512,1024],y:[0.001943,0.007779,0.031885,0.128082,0.601970,2.369513],type:'scatter',name:'Julia',line:{color:'rgb(0,255,0)',width:2}}; var tr3={x:[32,64,128,256,512,1024],y:[0.001736,0.012153,0.028646,0.105035,0.394097,1.592882],type:'scatter',name:'Fortran',line:{color:'rgb(0,0,255)',width:2}}; var layout={xaxis:{title:'Grid density, 1/h',tickvals:[32,64,128,256,512,1024]},yaxis:{title:'CPU Time [s]'}}; data=[tr1,tr2,tr3]; Plotly.newPlot('featool-multiphysics-matlab-fem-benchmark-div', data, layout);