Reduce build duration #1149

Closed
daniel-beck opened this issue Jul 16, 2021 · 14 comments

@daniel-beck
Member

We now have a dozen different Docker images, despite us consistently being terrible at actually maintaining them.

I understand that my position of not catering to everyone who shows up and wants their favorite niche distro supported (to externalize the costs of maintaining the image to someone else) is not one that'll gain traction here…

But could we at least reduce the build duration? The build duration when there is nothing to do is now at 20 minutes. I expect we're probably at more than an hour now for security updates, and that's completely unreasonable.

@timja
Member

timja commented Jul 18, 2021

This seems to have absolutely nothing to do with the number of images that we publish.

We just added 2 new images and optimised the publishing time.

Publishing time is now at 7 minutes for Linux, as seen in https://trusted.ci.jenkins.io:1443/blue/organizations/jenkins/Containers%2FCore%20Release%20Containers/detail/master/5872/pipeline/22/, and 6 seconds when the image is already published.

Something is wrong in the Windows scripts: Windows builds create new images every time the script runs, so the build time you're looking at is all Windows build time. Is there much we can do about that?
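A minimal sketch of the "skip when already published" behaviour that makes the Linux no-op publish take seconds. It assumes a Docker CLI where `docker manifest inspect` is available and exits non-zero when the tag is not in the registry; the function names and the injected `build_and_push` callback are hypothetical, not the repository's actual scripts.

```python
import subprocess
from typing import Callable


def tag_exists(image_tag: str) -> bool:
    """Return True if the registry already has a manifest for image_tag.

    Assumes the docker CLI is installed; `docker manifest inspect` exits
    with a non-zero status when the manifest cannot be found.
    """
    result = subprocess.run(
        ["docker", "manifest", "inspect", image_tag],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def publish_if_missing(
    image_tag: str,
    build_and_push: Callable[[str], None],
    exists: Callable[[str], bool] = tag_exists,
) -> bool:
    """Build and push image_tag only if it is not already published.

    Returns True when a build was triggered, False when it was skipped.
    """
    if exists(image_tag):
        # Already published: the cheap "nothing to do" path.
        return False
    build_and_push(image_tag)
    return True
```

Injecting `exists` keeps the skip logic testable without a registry. A Windows script with no equivalent check, or with a check that never succeeds, would rebuild on every run, but that is a guess rather than the confirmed cause.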

Do you consider 7 minutes too long or should this issue be re-scoped to be looking at the windows images / scripts?

cc @slide

@slide
Member

slide commented Jul 18, 2021

Testing takes a long time on Windows, as does the cleanup step at the end. We could reduce the tests or look at why cleanup takes so long.

@timja
Member

timja commented Jul 18, 2021

The master build doesn’t run the tests, so that shouldn’t matter?

@timja
Member

timja commented Jul 18, 2021

I've enabled timestamper so we can see where it's taking so long.

Is it because we build two images one after another?

  1. Do we need 2?
  2. We can switch to building them in parallel by moving to the bake file
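To illustrate why parallel builds should help: builds run one after another take roughly the sum of their durations, while independent builds run concurrently take roughly as long as the longest one, provided the machine has spare CPU and memory. This Python sketch simulates that with a hypothetical `fake_build` that sleeps instead of calling Docker; it is an analogy only, not how `docker buildx bake` schedules work internally.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_build(duration_s: float) -> float:
    """Stand-in for one image build; sleeps instead of calling docker."""
    time.sleep(duration_s)
    return duration_s


def run_sequential(durations: list[float]) -> float:
    """Run builds one after another; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for d in durations:
        fake_build(d)
    return time.perf_counter() - start


def run_parallel(durations: list[float]) -> float:
    """Run all builds concurrently; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, len(durations))) as pool:
        # list() drains the iterator, so we wait for every build to finish.
        list(pool.map(fake_build, durations))
    return time.perf_counter() - start


if __name__ == "__main__":
    durations = [0.3, 0.3]  # two hypothetical image builds, in seconds
    print(f"sequential: {run_sequential(durations):.2f}s")
    print(f"parallel:   {run_parallel(durations):.2f}s")
```

The speed-up only holds while the builds don't compete for the same CPU, memory, or network; on an undersized machine, concurrent builds can end up little faster than sequential ones.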

@timja
Member

timja commented Jul 18, 2021

#1150 should make it go somewhat faster.

The script needs looking at to see why it rebuilds every time, unlike the others.

@daniel-beck
Member Author

daniel-beck commented Jul 18, 2021

> Do you consider 7 minutes too long or should this issue be re-scoped to be looking at the windows images / scripts?

7 minutes is great, assuming that's the time for all the things happening ❤️ (I usually care about the worst case, which is LTS + weekly at the same time).

(Honestly 20 minutes would be great already for all the things; but we've been at 50 minutes before. Unfortunately I didn't check the June 30 build before it got rotated due to other stuff going on at the same time.)

@timja
Member

timja commented Jul 19, 2021

Removing one Windows image took it down to 12 minutes.

Enabling timestamper has shown that it takes 10 minutes to pull the Windows upstream base image and 2 minutes to build our image and push it.

@slide, is this a known Windows Docker issue, or is the upstream image just huge (it's 2.57 GB)?

It may be this issue: moby/moby#39832

@daniel-beck
Member Author

> Removing one Windows image took it down to 12 minutes.

If that's the upper bound even for security releases, this issue can be closed.

timja closed this as completed Jul 19, 2021
@slide
Member

slide commented Jul 19, 2021

The 1809 images are definitely the biggest. MS has reduced the size of newer images, though the Server Core images are still usually over a gig.

daniel-beck reopened this Nov 4, 2021
@daniel-beck
Member Author

daniel-beck commented Nov 4, 2021

Took 33 minutes today 😭 Up from 20 minutes in Jan 2020.

@dduportal
Contributor

Some facts to help:

  • Both the "build" and "publish" steps are:
    • Using parallel Jenkins Pipeline stages for Windows and Linux (i.e. 2 parallel stages)
    • Using docker buildx bake for Linux, which means that all the Linux Docker images are built in parallel on the same machine
  • The (Linux) builds on ci.jenkins.io run on VMs with 4 vCPUs and 16 GB of memory, while the "publish" step on trusted.ci.jenkins ran on a 2 vCPU / 8 GB VM
    • There is resource contention during the "publish" step, which explains the 33 min of today's release.
    • I can reproduce this behavior on a Docker Desktop installation with roughly the same build times: 35 min when Docker Desktop has only 2 vCPU / 8 GB, and around 15 min with 4 vCPU / 16 GB.

Two actions for the "publish" build time:

  • Use the same VM size as ci.jenkins.io to ensure there is NO time difference (it's really annoying for the release/security team to suffer from slow builds while the "build often" part takes less than 12 min)
  • Consider parallelizing at the pipeline level (e.g. 1 image or a small group of images per agent)
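One way to form the "small group of images per agent" for the second action is a greedy longest-processing-time-first (LPT) assignment: sort images by expected build time and always hand the next one to the least-loaded agent. This Python sketch assumes per-image durations are known (for example from timestamped build logs); the image names and minutes in the example are made up for illustration.

```python
import heapq


def group_images(durations: dict[str, float], agents: int) -> list[list[str]]:
    """Assign images to `agents` groups using longest-processing-time-first.

    Returns one list of image names per agent. LPT is a heuristic: its
    longest group is guaranteed to be within 4/3 of the optimal makespan.
    """
    # Heap of (current load, agent index); the least-loaded agent pops first.
    heap = [(0.0, i) for i in range(agents)]
    groups: list[list[str]] = [[] for _ in range(agents)]
    # Longest builds first; the name breaks ties so the output is deterministic.
    for name, minutes in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (load + minutes, idx))
    return groups


if __name__ == "__main__":
    # Hypothetical images and build times in minutes.
    images = {"debian": 12, "alpine": 6, "centos": 5, "ubi": 4}
    for agent, group in enumerate(group_images(images, agents=2)):
        print(agent, group)
```

With these made-up numbers, one agent gets the 12-minute image alone and the other gets the remaining three (15 minutes total), which is also the best possible split. Windows and Linux images would still need separate agent pools, so grouping like this would apply within each platform.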

@timja
Member

timja commented Nov 4, 2021

The fix is really image staging; the security team shouldn't have to care that the build is slow.

Jenkins core takes ~2 to 2.5 hours to release. Yes, this was slow, and thanks for the good improvement ideas, Damien :), but it's not long unless you are watching it.

I've created a placeholder issue for it: #1228

Please feel free to add requirements, design or suggestions to it (or implementation :D )

@daniel-beck
Member Author

> Jenkins core takes ~2->2.5 hours to release

Yep, we do this a few days earlier and I don't care.

Packages are painful at 17-20 minutes each too, and I've requested staging there forever 😢

So, yes, staging here would be great. We have Docker repos on repo.jenkins-ci.org which can probably support that.

@dduportal
Contributor

The average build times, as of today, are:

  • ~11 minutes (reported on trusted.ci.jenkins.io) with huge machines, and ~27 min on ci.jenkins.io with light machines.

Closing this issue, as the next step is staging releases.
