Overhaul of SlaveLogs #518

Draft: wants to merge 1 commit into master

jglick (Member) commented Mar 1, 2024

The collection of Java logging from agents is rather antiquated, given the prevalence of Clouds, and it misses a lot of critical diagnostics. (CloudBees-internal reference: BEE-46766) Some draft observations:

SlaveLogs only lists information from currently online agents. It might be more useful to run something like JenkinsRule.RemoteLogDumper to stream agent JUL messages back to a file on the controller, so that we could collect them at any time for support bundles, including for agents that are currently defined but disconnected, or that have since been deleted. Most agent logs are brief (9aeca36 notwithstanding), so this does not seem unreasonable. We could still collect remoting.log from the agent when -workDir is defined, as it would capture messages about connection attempts made before the agent is fully online.
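A minimal sketch of the streaming idea, using only java.util.logging. It assumes some transport can carry records from the agent JVM to the controller (in practice a remoting channel; here just a `Consumer<LogRecord>`). `LogRecord` is `Serializable`, although its parameters may not be. The names `ForwardingHandler` and `AgentLogSink` are hypothetical, and this is not necessarily how `JenkinsRule.RemoteLogDumper` works.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;
import java.util.logging.ErrorManager;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

/** Agent side (hypothetical): hands every loggable JUL record to a transport. */
final class ForwardingHandler extends Handler {
    private final Consumer<LogRecord> transport;

    ForwardingHandler(Consumer<LogRecord> transport) {
        this.transport = transport;
    }

    @Override
    public void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return; // honors this handler's own level and filter (default level: ALL)
        }
        try {
            transport.accept(record);
        } catch (RuntimeException e) {
            // A broken transport must not break the code that is logging.
            reportError("could not forward log record", e, ErrorManager.WRITE_FAILURE);
        }
    }

    @Override public void flush() {}
    @Override public void close() {}
}

/** Controller side (hypothetical): one append-only file per agent, which outlives the agent. */
final class AgentLogSink implements Consumer<LogRecord> {
    private final Path file;
    private final SimpleFormatter formatter = new SimpleFormatter();

    AgentLogSink(Path dir, String agentName) {
        this.file = dir.resolve(agentName + ".log");
    }

    Path file() {
        return file;
    }

    @Override
    public synchronized void accept(LogRecord record) {
        // Open and close per record so no file handle stays open between records.
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            w.write(formatter.format(record));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the file lives on the controller, it stays available for a support bundle after the agent disconnects or is deleted. Reopening the file per record trades throughput for simplicity; a real implementation would more likely keep a writer open and rotate it.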

SlaveLogs.addAgentJulLogRecords does not seem to work well. It stores the logs on the agent, making them impossible to retrieve when the agent is not online, which raises the question of why this method exists at all (SlaveLogs also uses Computer.getLogRecords, which has the same information). Also, the logs are stored in the root path rather than in the workdir when one is defined.

Even for a currently connected agent it is not very helpful, since it relies on SupportPlugin.LogInitializer capturing JUL records from the agent JVM after ComputerListener.onOnline, whereas remoting/logs/remoting.log.0 inside the workdir includes all the details of the connection logic, from hudson.remoting.jnlp.Main.createEngine up through hudson.remoting.jnlp.Main$CuiListener.status reporting Connected.
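One way to pick up the early connection messages is to read the rotated files straight from the workdir. A minimal sketch, assuming the `remoting/logs/remoting.log.N` layout mentioned above and java.util.logging `FileHandler` numbering, where generation 0 is the current file and higher numbers are older. `RemotingLogLocator` is a hypothetical name; when no workdir is defined there is nothing to read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical helper for locating remoting logs inside an agent workdir. */
final class RemotingLogLocator {
    // Matches remoting.log.0, remoting.log.1, ... but not lock files such as remoting.log.0.lck.
    private static final Pattern ROTATED = Pattern.compile("remoting\\.log\\.(\\d+)");

    /** Assumed layout: <workDir>/remoting/logs. Returns null when there is no workdir. */
    static Path logDir(Path workDir) {
        return workDir == null ? null : workDir.resolve("remoting").resolve("logs");
    }

    /** Rotated logs ordered oldest first, assuming generation 0 is the newest file. */
    static List<Path> rotatedLogsOldestFirst(Path logDir) throws IOException {
        List<Path> result = new ArrayList<>();
        if (logDir == null || !Files.isDirectory(logDir)) {
            return result;
        }
        // try-with-resources closes the directory handle promptly.
        try (Stream<Path> files = Files.list(logDir)) {
            files.filter(p -> ROTATED.matcher(p.getFileName().toString()).matches())
                 .forEach(result::add);
        }
        // Highest generation (oldest) first.
        result.sort((a, b) -> Integer.compare(generation(b), generation(a)));
        return result;
    }

    private static int generation(Path p) {
        Matcher m = ROTATED.matcher(p.getFileName().toString());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a rotated remoting log: " + p);
        }
        return Integer.parseInt(m.group(1));
    }
}
```

Closing the `Files.list` stream matters: a leaked directory handle can prevent the workdir from being deleted on Windows.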

SmartLogCleaner deliberately deletes information about historical agents.

I would expect custom loggers defined on the controller at fine levels to be honored inside the agent JVM as well, as log-cli does.
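A minimal sketch of the agent-side half of that, assuming the controller can send the agent a map from logger name to level for its custom loggers; how that map is built and shipped over the channel is left out. `LoggerLevelSync` is a hypothetical name. Raising a logger's level is only half the job: the records must also reach a handler whose own level admits them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Agent side (hypothetical): mirrors logger levels configured on the controller. */
final class LoggerLevelSync {
    // Logger.getLogger may return a logger that is only weakly held, so a level set on a
    // logger nobody references can be lost after garbage collection; keep strong references.
    private final Map<String, Logger> pinned = new HashMap<>();
    // The agent's own level before syncing; a null value means "inherit from parent".
    private final Map<String, Level> previous = new HashMap<>();

    /** Applies each controller-side level, e.g. FINE for "hudson.remoting.Engine". */
    synchronized void apply(Map<String, Level> levelsFromController) {
        for (Map.Entry<String, Level> e : levelsFromController.entrySet()) {
            Logger logger = Logger.getLogger(e.getKey());
            if (!pinned.containsKey(e.getKey())) {
                previous.put(e.getKey(), logger.getLevel()); // remember the original once
                pinned.put(e.getKey(), logger);
            }
            logger.setLevel(e.getValue());
        }
    }

    /** Puts every touched logger back to the level it had before the first apply. */
    synchronized void restore() {
        for (Map.Entry<String, Logger> e : pinned.entrySet()) {
            e.getValue().setLevel(previous.get(e.getKey()));
        }
        pinned.clear();
        previous.clear();
    }
}
```

Remembering the previous level means that a logger later removed from the controller's configuration can be reverted on the agent rather than left verbose.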

NodeRemoteDirectoryComponent would include remoting.log.0, but this is not obvious, and it is only available for one agent at a time, by selecting Agent Support.

(Extracted from #517 for clarity.)

jglick added the bug label Mar 1, 2024
jglick added a commit to jglick/support-core-plugin that referenced this pull request Mar 1, 2024
jglick added a commit that referenced this pull request Mar 18, 2024
* agent-logs sketch

* `SlaveLaunchLogsTest.onlineInboundAgent`

* `SlaveLaunchLogsTest.offlineAgent`

* Javadoc

* Exploring `SlaveLaunchLogs` behavior

* More `SlaveLaunchLogsTest`

* `SlaveLaunchLogsTest.passwords`

* Worked out a better `SlaveLaunchLogs`, but depends on patch to `SlaveComputer`

* Need to flush logs also for `deletedAgent`

* Reverting changes extracted to #518

* Setting a timestamp, switching category

* RC

* No need to assert that `SupportTestUtils.invokeComponentToString` is non-null

* `Security2186Test` failure caused by renamed bundle entry

* File handle leak caught by Windows tests

* Better handling of rotated logs

* More robust way to wait for `Connection terminated` message

* `SlaveLaunchLogsTest.offlineAgent` still flaky on Windows

* jenkinsci/jenkins#9009 released

* Working around lack of JENKINS-72799 to avoid requiring a weekly core

* SpotBugs

---------

Co-authored-by: Allan Burdajewicz