Makefile: add link-time optimization support #420
pjonsson wants to merge 2 commits into davidfrantz:develop
Add flags for enabling link-time optimization during compilation by calling "make LTO=yes". I cannot find a definition of LDFLAGS, so just chuck it into GDAL_FLAGS, which should be used for the binaries that do heavy computation.

I don't have any performance comparisons, but compiling with LTO makes the binary noticeably smaller. Before:

    $ ls -l `which force-l2ps`
    -rwxr-xr-x 1 root root 2668408 Mar 16 22:02 /usr/local/bin/force/force-l2ps

and after:

    $ ls -l `which force-l2ps`
    -rwxr-xr-x 1 root root 1986848 Mar 16 22:14 /usr/local/bin/force/force-l2ps

Reducing the binary size is likely to improve the instruction cache hit rate even if we pessimistically assume that LTO didn't manage to make any other optimization.
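As a rough illustration of the `make LTO=yes` switch described above, the following sketch writes a throwaway Makefile and prints the resulting flags with and without the option. It is not the actual diff from this PR: the base value `-O3`, the `show` target, and the choice of plain `-flto` are assumptions made for the demo, and the `ifeq` conditional requires GNU make.

```shell
#!/bin/sh
# Hypothetical demo, not the FORCE Makefile: shows how LTO=yes could
# append -flto to GDAL_FLAGS. Requires GNU make (ifeq is a GNU extension).
tmp=$(mktemp -d)

cat > "$tmp/Makefile" <<'EOF'
# Assumed base flags; the real GDAL_FLAGS in FORCE will differ.
GDAL_FLAGS = -O3

# Only add the LTO flag when the user asks for it.
ifeq ($(LTO),yes)
GDAL_FLAGS += -flto
endif

# Helper target (hypothetical) that just prints the effective flags.
show: ; @echo $(GDAL_FLAGS)
EOF

without=$(make -s -f "$tmp/Makefile" show)
with_lto=$(make -s -f "$tmp/Makefile" show LTO=yes)

echo "default: $without"
echo "LTO=yes: $with_lto"

rm -rf "$tmp"
```

Because the same variable is used for both compiling and linking in this sketch, the flag reaches the link step as well, which LTO needs in order to optimize across object files.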
The sample size is 1, but here are some rough measurements of CPU-minutes required for creating an L2A product from a Sentinel-2 L1C product with
It's my desktop machine, so there are web browsers and other things running in the background at the same time, but the window that had focus was the FORCE container.
This seems interesting. I will run some tests before merging, though.
I found some time to make a quick test: I tested using a Landsat image, converted to L2 ARD, using 32 threads on a testing machine. I ran the 1st row twice to get rid of caching-related differences.
The results are interesting. There might be a small performance gain with the last combo. I'd like to make some more tests as n is still quite low.
GDAL 3.13 is just around the corner, and that Docker image will be based on Ubuntu 26.04, which contains GCC 15 instead of the current GCC 13 in Ubuntu 24.04. I should probably warn you that there's a risk that your careful measurements on GCC 13 might not be that relevant in a couple of weeks.
Since we're on the subject of performance: when I run
Is the initial cap of only 2 working threads an old decision based on measurements, or is it because the initial part of the processing only gets a speedup from at most 2 threads?
Add flags for enabling link-time
optimization during compilation
by calling "make LTO=yes" and use
this in the Dockerfile.
I cannot find a definition of
LDFLAGS, so just chuck it into
GDAL_FLAGS which should be used
for the binaries that do heavy
computation.
I don't have any performance
comparisons, but compiling with LTO
makes the binary noticeably smaller,
before:
$ ls -l `which force-l2ps`
-rwxr-xr-x 1 root root 2668408 Mar 16 22:02 /usr/local/bin/force/force-l2ps
and after:
$ ls -l `which force-l2ps`
-rwxr-xr-x 1 root root 1986848 Mar 16 22:14 /usr/local/bin/force/force-l2ps
Reducing the binary size is likely
to improve the instruction cache
hit rate even if we pessimistically
assume that LTO didn't manage to make
any other optimization.
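
To put a number on "noticeably smaller", the two file sizes listed above can be compared with plain shell arithmetic. This is only a sketch using the byte counts from the `ls -l` output; the variable names are made up for the example, and integer per-mille math is used so that no floating-point tool is needed.

```shell
#!/bin/sh
# Sizes in bytes, copied from the ls -l output in the commit message.
before=2668408
after=1986848

# Absolute reduction in bytes.
saved=$((before - after))

# Relative reduction in per mille (integer division, rounds down).
permille=$((saved * 1000 / before))

printf 'saved %d bytes (%d.%d%%)\n' "$saved" $((permille / 10)) $((permille % 10))
```

This covers on-disk size only; it says nothing about how much of the binary is hot code, so any effect on the instruction cache still has to be measured.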