This repository has been archived by the owner on May 16, 2023. It is now read-only.

How to optimize Docker image size? #45

Open
JanMikes opened this issue Sep 22, 2020 · 5 comments

@JanMikes

JanMikes commented Sep 22, 2020

Hi, I was finally able to build a Docker image with Bistro, but I am a bit worried about its enormous size: roughly 5.2 GB.

Do you have any tips on how to reduce its size?

The Dockerfile is generated automatically by fbcode_builder.

Basically, it consists of repeated download + build + install blocks:

### Check out fmtlib/fmt, workdir build ###

USER root
RUN mkdir -p '/home' && chown 'nobody' '/home'
USER 'nobody'
WORKDIR '/home'
RUN git clone  https://github.com/'fmtlib/fmt'
USER root
RUN mkdir -p '/home'/'fmt'/'build' && chown 'nobody' '/home'/'fmt'/'build'
USER 'nobody'
WORKDIR '/home'/'fmt'/'build'
RUN git checkout '6.2.1'

### Build and install fmtlib/fmt ###

RUN CXXFLAGS="$CXXFLAGS -fPIC -isystem "'/home/install'"/include" CFLAGS="$CFLAGS -fPIC -isystem "'/home/install'"/include" cmake -D'CMAKE_INSTALL_PREFIX'='/home/install' -D'BUILD_SHARED_LIBS'='ON' '..'
RUN make -j '4' VERBOSE=1 
RUN make install VERBOSE=1 

I was wondering whether I can somehow remove the cache. Maybe just running rm -rf /home/fmt (and the same for every other cloned repository) after the package is installed would help reduce the size.
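One caveat for this idea: every Dockerfile RUN instruction produces its own image layer, so an rm -rf in a later RUN only hides the files, and the earlier layer still ships them. The cleanup has to happen in the same RUN as the build. A minimal sketch, assuming each generated clone/build/install block is replaced by one call to a hypothetical helper (build_install_clean is not part of fbcode_builder); the cmake flags mirror the generated block above:

```sh
#!/bin/sh
# Hypothetical helper (not part of fbcode_builder): clone, build, install and
# then delete the source/build tree, all inside ONE Dockerfile RUN step, so the
# build tree is never committed to an image layer.
build_install_clean() {
  local repo="$1" ref="$2" prefix="$3"   # e.g. fmtlib/fmt 6.2.1 /home/install
  local name="${repo##*/}"               # fmtlib/fmt -> fmt
  git clone "https://github.com/${repo}" "$name" || return 1
  (
    cd "$name" && git checkout "$ref" || exit 1
    mkdir -p build && cd build || exit 1
    # Same compiler flags and CMake options as the generated Dockerfile block
    CXXFLAGS="${CXXFLAGS:-} -fPIC -isystem ${prefix}/include" \
    CFLAGS="${CFLAGS:-} -fPIC -isystem ${prefix}/include" \
      cmake -DCMAKE_INSTALL_PREFIX="$prefix" -DBUILD_SHARED_LIBS=ON .. || exit 1
    make -j 4 && make install
  ) || return 1
  rm -rf "$name"   # only the files installed under $prefix remain
}
```

In the Dockerfile, what matters is that the clone, build, install and rm -rf all run inside a single RUN instruction, for example by copying a script that defines this function into the image and calling it from that one RUN.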

Also, I do not usually use C++, so I do not know how it really works internally. If I am mistaken and my idea is stupid, just correct me 😄 Could we take only the final binaries and extract them into a different, clean Docker image?

Another idea was to use an Alpine-based Linux, or some base image other than Ubuntu (quick googling brought me to https://github.com/madduci/docker-cpp-env).

Could any of this work, or would you suggest something completely different?

I am thinking about an autoscaling mechanism for Bistro workers etc. on AWS spot instances (maybe even as Lambdas), and for these purposes I want the image to be as thin as possible.

@snarkmaster
Contributor

snarkmaster commented Sep 22, 2020

I've never tried optimizing for Docker image size, but I can give you some thoughts:

  • Yes, you can definitely remove the build trees, and the ccache build cache directory.
  • Installing to a smaller base chroot like Alpine could be doable, but it would take more work; it's probably not the lowest-hanging fruit.
  • You could apt-get remove packages you no longer need after the build.
  • FB C++ binaries tend to be large, mainly due to large debuginfo generated by templated code. You may be able to strip the debug info, at the cost of losing useful backtraces.
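On the last point, the backtrace cost can be softened with the separate-debug-file technique described in the GDB and GNU binutils documentation: copy the debug sections into a side file, strip them from the binary, and add a .gnu_debuglink so a debugger can find the side file later. A minimal sketch, assuming objcopy and strip from binutils are available in the builder image; the function name and the .debug naming are illustrative:

```sh
#!/bin/sh
# Illustrative helper: move debug info out of a binary into a side file.
# The stripped binary goes into the runtime image; the .debug file can be kept
# elsewhere (e.g. an artifact store) for symbolizing backtraces later.
split_debug() {
  local bin="$1" dbg="${2:-$1.debug}"
  objcopy --only-keep-debug "$bin" "$dbg" || return 1   # save debug sections
  strip --strip-debug "$bin" || return 1                # drop them from the binary
  objcopy --add-gnu-debuglink="$dbg" "$bin"             # record the side file in the binary
}
```

For example, split_debug /home/bistro/bistro/cmake/Debug/worker/bistro_worker, after which only the stripped binary needs to be copied into the runtime image.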

I plan to switch the OSS CI build to static linking at some point, which would help your use case. Unfortunately, it's hard for me to predict how long this will take, because I'm bad at CMake and there are some hurdles.

Internally, the statically linked binaries look like this:

  • bistro_scheduler: 360 MB before stripping, 17 MB after
  • bistro_worker: 335 MB before stripping, 16 MB after

The upshot here is that if you're willing to lose useful backtraces, you can probably get these to be pretty small.
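To try this on the current shared-library build, the whole install prefix can be stripped in one pass. A minimal sketch (strip_tree is a hypothetical name). It uses strip --strip-debug, which removes only the debug sections and keeps the symbol table, so backtraces should still show function names but not file and line information; strip --strip-all would shrink the files further at the cost of those names too.

```sh
#!/bin/sh
# Hypothetical helper: strip debug info from shared libraries and executables
# under a directory (e.g. /home/install). Executable files that are not ELF
# (shell scripts) make strip fail; those are reported and skipped.
# Assumes no newlines in file names.
strip_tree() {
  local dir="$1"
  find "$dir" -type f \( -name '*.so' -o -name '*.so.*' -o -perm -u+x \) |
    while IFS= read -r f; do
      strip --strip-debug "$f" || echo "could not strip: $f" >&2
    done
}
```

For example, strip_tree /home/install/lib, then compare du -hs /home/install before and after.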

@JanMikes
Author

You could apt-get remove packages you no longer need after the build.

Good start, but I'm not sure which packages those are 😄

Yes, you can definitely remove the build trees, and the ccache build cache directory.

My C++ knowledge is limited. I have not found a ccache directory by running find /home -type d -name "*cache*" -print. By build trees, do you mean the cloned repositories?

I just made a very quick test with a Dockerfile like this (it is the common multi-stage strategy for optimizing builds: a builder image, plus the real image carrying only the executables):

FROM ubuntu:18.04

COPY --from=docker.pkg.github.com/rectorphp/docker-base-bistro-image-builder/bistro:latest /home/bistro/bistro/cmake/Debug/server/bistro_scheduler /bistro/bistro_scheduler
COPY --from=docker.pkg.github.com/rectorphp/docker-base-bistro-image-builder/bistro:latest /home/bistro/bistro/cmake/Debug/worker/bistro_worker /bistro/bistro_worker

But, as expected, it fails on missing dependencies. The error I got was:

./bistro_scheduler: error while loading shared libraries: libfolly.so: cannot open shared object file: No such file or directory

It was just an idea: copy everything that is needed (the compiled artifacts) into a new image without any unnecessary stuff.
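About the libfolly.so error above: the runtime image lacks the libraries the builder installed under /home/install/lib. Rather than discovering missing libraries one error at a time, running ldd on the binary lists every shared library it needs and marks the unresolved ones as "not found". A small filter over that output, as a sketch (missing_libs is a hypothetical name):

```sh
#!/bin/sh
# Hypothetical filter: read `ldd <binary>` output on stdin and print the names
# of libraries the dynamic loader could not resolve, i.e. lines such as
#   libfolly.so => not found
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

For example, ldd /bistro/bistro_scheduler | missing_libs inside the runtime image; an empty result means the loader resolved everything.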

nobody@9e7a345ca022:/home$ ls
bistro	fbthrift  fizz	fmt  folly  googletest	install  libsodium  mvfst  proxygen  wangle  zstd

I even tried removing everything except the bistro directory, but I got the same error message (libfolly.so: cannot open shared object file: No such file or directory), which brings me to the idea of deleting everything except these:

nobody@f2a57d2acdd6:/home$ find /home -type f -name "*.so" -print
/home/install/lib/libconcurrency.so
/home/install/lib/libcompiler_lib.so
/home/install/lib/libcompiler_ast.so
/home/install/lib/libtransport.so
/home/install/lib/libthriftcpp2.so
/home/install/lib/libcompiler_generators.so
/home/install/lib/libthrift-core.so
/home/install/lib/libmustache_lib.so
/home/install/lib/libthriftprotocol.so
/home/install/lib/libprotocol.so
/home/install/lib/libthriftfrozen2.so
/home/install/lib/libcompiler_generate_templates.so
/home/install/lib/libasync.so
/home/install/lib/librpcmetadata.so
/home/install/lib/libcompiler_base.so
/home/install/lib/libthriftmetadata.so
/home/install/lib/libmvfst_state_qpr_functions.so
/home/install/lib/libmvfst_state_ack_handler.so
/home/install/lib/libmvfst_state_pacing_functions.so
/home/install/lib/libmvfst_state_simple_frame_functions.so
/home/install/lib/libmvfst_exception.so
/home/install/lib/libmvfst_state_stream_functions.so
/home/install/lib/libmvfst_state_functions.so
/home/install/lib/libmvfst_constants.so
/home/install/lib/libmvfst_state_machine.so
/home/install/lib/libfizz_test_support.so
/home/install/lib/libfolly.so
/home/install/lib/libfollybenchmark.so
/home/install/lib/libfolly_test_util.so
/home/fbthrift/thrift/lib/libconcurrency.so
/home/fbthrift/thrift/lib/libcompiler_lib.so
/home/fbthrift/thrift/lib/libcompiler_ast.so
/home/fbthrift/thrift/lib/libtransport.so
/home/fbthrift/thrift/lib/libthriftcpp2.so
/home/fbthrift/thrift/lib/libcompiler_generators.so
/home/fbthrift/thrift/lib/libthrift-core.so
/home/fbthrift/thrift/lib/libmustache_lib.so
/home/fbthrift/thrift/lib/libthriftprotocol.so
/home/fbthrift/thrift/lib/libprotocol.so
/home/fbthrift/thrift/lib/libthriftfrozen2.so
/home/fbthrift/thrift/lib/libcompiler_generate_templates.so
/home/fbthrift/thrift/lib/libasync.so
/home/fbthrift/thrift/lib/librpcmetadata.so
/home/fbthrift/thrift/lib/libcompiler_base.so
/home/fbthrift/thrift/lib/libthriftmetadata.so
/home/mvfst/build/quic/state/libmvfst_state_qpr_functions.so
/home/mvfst/build/quic/state/libmvfst_state_ack_handler.so
/home/mvfst/build/quic/state/libmvfst_state_pacing_functions.so
/home/mvfst/build/quic/state/libmvfst_state_simple_frame_functions.so
/home/mvfst/build/quic/state/libmvfst_state_stream_functions.so
/home/mvfst/build/quic/state/libmvfst_state_functions.so
/home/mvfst/build/quic/state/libmvfst_state_machine.so
/home/mvfst/build/quic/libmvfst_exception.so
/home/mvfst/build/quic/libmvfst_constants.so
/home/fizz/fizz/build/lib/libfizz_test_support.so
/home/folly/_build/libfolly.so
/home/folly/_build/folly/logging/example/liblogging_example_lib.so
/home/folly/_build/folly/libfollybenchmark.so
/home/folly/_build/libfolly_test_util.so
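Reading that list: almost every library appears twice, once in a build tree (/home/folly/_build, /home/fbthrift/thrift/lib, /home/mvfst/build, ...) and once in the install prefix /home/install/lib; only liblogging_example_lib.so exists solely in a build tree. That suggests the build-tree copies are redundant. A quick sketch to check this from the find output (dup_libs is a hypothetical name); it prints the library names present both under the prefix and elsewhere:

```sh
#!/bin/sh
# Hypothetical helper: read paths (one per line) on stdin and print the library
# file names that exist both under the install prefix and somewhere else.
dup_libs() {
  local prefix="${1:-/home/install}"
  awk -v p="$prefix/" '
    {
      n = $0; sub(/.*\//, "", n)          # basename of the path
      if (index($0, p) == 1) inst[n] = 1  # path is under the prefix
      else other[n] = 1
    }
    END { for (n in inst) if (n in other) print n }' | sort
}
```

For example, find /home -type f -name "*.so" | dup_libs.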

I am very experienced with Docker and capable of optimizing the build, but unfortunately my C++ knowledge is slowing me down 😄

@JanMikes
Author

After removing files with:

find ./fbthrift/ -type f ! -name '*.so*' -delete
find ./fizz/ -type f ! -name '*.so*' -delete
find ./fmt/ -type f ! -name '*.so*' -delete
find ./folly/ -type f ! -name '*.so*' -delete
find ./googletest/ -type f ! -name '*.so*' -delete
find ./install/ -type f ! -name '*.so*' -delete
find ./libsodium/ -type f ! -name '*.so*' -delete
find ./mvfst/ -type f ! -name '*.so*' -delete
find ./proxygen/ -type f ! -name '*.so*' -delete
find ./wangle/ -type f ! -name '*.so*' -delete
find ./zstd/ -type f ! -name '*.so*' -delete
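The same cleanup can be written as one loop, as a sketch (prune_to_shared_libs is a hypothetical name). Like the commands above, it keeps every file whose name matches *.so*, and because it only deletes regular files, the libfoo.so symlinks are left alone; it additionally removes the directories that end up empty.

```sh
#!/bin/sh
# Hypothetical helper: in each given directory, delete every regular file whose
# name does not match *.so* (e.g. libfolly.so, libfoo.so.1), then delete the
# directories that became empty.
prune_to_shared_libs() {
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] || { echo "skipping missing dir: $dir" >&2; continue; }
    find "$dir" -type f ! -name '*.so*' -delete
    find "$dir" -mindepth 1 -type d -empty -delete   # -delete implies -depth
  done
}
```

For example, cd /home && prune_to_shared_libs fbthrift fizz fmt folly googletest install libsodium mvfst proxygen wangle zstd.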

It still works. Now I need to know what can be deleted from the bistro directory (probably the source code as well, logs, etc.). Knowing what should be kept, rather than what should be deleted, would be the easier approach.

The next step will be other system things and dependencies; I'm not sure which ones those are yet.

What difference would it make to run ./cmake/run-cmake.sh with the Release parameter instead of Debug?

@JanMikes
Author

JanMikes commented Sep 23, 2020

FYI, here are the results so far after removing everything except .so files from the non-bistro directories:

Before:

nobody@66b52e09ab45:/home$ du -hs .
4.1G	.

After:

nobody@66b52e09ab45:/home$ du -hs .
2.6G	.

Not bad for just a start.

@snarkmaster
Contributor

no /ccache

If there's no /ccache, it sounds like you're not running with that enabled. You would see the output of this in the logs of the program that prepares your Dockerfile:

            logging.info('Docker ccache not enabled')

This is fine; /ccache is most helpful for incremental development (change and rebuild).

what can be deleted

All build artifacts get installed in /home/install by default:

https://github.com/facebook/bistro/blob/4add83f0004325f4d7092dbe3c25eb2acc559733/build/fbcode_builder/make_docker_context.py#L62

So you should not need any of the build trees at all, just /home/install and a barebones OS.
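Combined with the multi-stage approach tried earlier in the thread, that means a final stage which starts from a clean base and copies only /home/install/lib plus the two Bistro binaries out of the builder image; the build trees never become part of the final image. A sketch that prints such a Dockerfile; the function name, the ubuntu:18.04 default and the LD_LIBRARY_PATH line are assumptions, and the system packages the binaries need still have to be installed on top. Keeping the same /home/install path means that any RPATH the binaries may carry stays valid; LD_LIBRARY_PATH covers the case where they carry none.

```sh
#!/bin/sh
# Hypothetical helper: print a runtime-stage Dockerfile that copies only the
# shared libraries and the two Bistro binaries out of an existing builder image.
runtime_dockerfile() {
  local builder="$1"                 # builder image reference
  local base="${2:-ubuntu:18.04}"    # assumed to match the builder's Ubuntu release
  local bin=/home/bistro/bistro/cmake/Debug
  cat <<EOF
FROM ${base}
# Shared libraries built by fbcode_builder (libfolly.so, libthriftcpp2.so, ...)
COPY --from=${builder} /home/install/lib /home/install/lib
COPY --from=${builder} ${bin}/server/bistro_scheduler /bistro/bistro_scheduler
COPY --from=${builder} ${bin}/worker/bistro_worker /bistro/bistro_worker
# Let the dynamic loader find the libraries copied above
ENV LD_LIBRARY_PATH=/home/install/lib
EOF
}
```

For example, runtime_dockerfile docker.pkg.github.com/rectorphp/docker-base-bistro-image-builder/bistro:latest > Dockerfile.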

For the OS, you could do things like apt-get remove gcc. The full set of deps we install on top of the base Ubuntu image is here:

https://github.com/facebook/bistro/blob/4add83f0004325f4d7092dbe3c25eb2acc559733/build/fbcode_builder/fbcode_builder.py#L182

An alternative approach is to find the smallest base OS of the same Ubuntu release that you can get, copy over /home/install, and then install the missing dependencies (you'll get a missing .so error for each one).

Either way, it'd be a bit of trial and error; I've never had the time to separate the runtime dependencies from the build-time dependencies for the OSS build.
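Some of that trial and error can be skipped by asking the builder image which system libraries the binaries actually resolve to, then mapping those files to their Ubuntu packages with dpkg -S. A sketch (system_libs is a hypothetical name) that filters ldd output down to resolved paths outside /home/install:

```sh
#!/bin/sh
# Hypothetical filter: read `ldd <binary>` output on stdin and print resolved
# library paths that are NOT under the install prefix (i.e. system libraries),
# one per line, de-duplicated.
system_libs() {
  local prefix="${1:-/home/install}"
  awk -v p="$prefix/" '$2 == "=>" && substr($3, 1, 1) == "/" && index($3, p) != 1 { print $3 }' |
    sort -u
}

# Example (inside the builder image), mapping files to runtime package names:
#   ldd /home/bistro/bistro/cmake/Debug/server/bistro_scheduler | system_libs |
#     xargs -r dpkg -S | cut -d: -f1 | sort -u
```

The resulting package names are the candidates for an apt-get install line in the runtime stage; packages such as libc6 are already part of the base image.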

If you find time to upstream your work, that would be lovely. If not, maybe at least share a gist of your process on this issue?


2 participants