Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add registry to Filebeat's diagnostic #41795

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

belimawr
Copy link
Contributor

@belimawr belimawr commented Nov 26, 2024

Proposed commit message

This commit adds Filebeat's registry folder to the Elastic-Agent
diagnostics. It's called registry.tar.gz and includes all registry files
on ${path.home}/registry.

The registry is first archived into a temporary tar file. The
temporary file is created by calling os.CreateTemp and will use the
OS's temporary folder. Then it's gziped in memory and returned to
Elastic-Agent, finally the temporary file is removed from the disk.

If the final gziped file is more than 20mb, it is skipped due to its large
size.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

None.

Author's Checklist

How to test this PR locally

The easiest way to test it locally is to package the Elastic-Agent with your local build of Beats by setting EXTERNAL=false when calling mage build in the Elastic-Agent folder.

Then, deploy the Elastic-Agent, let it ingest some logs and request the diagnostics. Extract the diagnostics and look for registry.tar.gz in components/filestram-* folders.

Related issues

## Use cases
## Screenshots

Logs

Full log entries

{"log.level":"debug","@timestamp":"2024-11-27T11:00:35.812-0500","log.logger":"diagnostics","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/beater.gzipRegistry","file.name":"beater/diagnostics.go","file.line":54},"message":"temporary file '/tmp/filebeat-registry-640175774.tar' created","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-11-27T11:00:35.812-0500","log.logger":"diagnostics","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/beater.tarFolder","file.name":"beater/diagnostics.go","file.line":116},"message":"starting to walk '/home/tiago/devel/beats/x-pack/filebeat/build/integration-tests/TestRegistryIsInDiagnostics3276586231/data/registry'","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-11-27T11:00:35.812-0500","log.logger":"diagnostics","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/beater.tarFolder.func1","file.name":"beater/diagnostics.go","file.line":140},"message":"adding '/home/tiago/devel/beats/x-pack/filebeat/build/integration-tests/TestRegistryIsInDiagnostics3276586231/data/registry/filebeat/log.json' to the tar archive","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-11-27T11:00:35.812-0500","log.logger":"diagnostics","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/beater.tarFolder.func1","file.name":"beater/diagnostics.go","file.line":140},"message":"adding '/home/tiago/devel/beats/x-pack/filebeat/build/integration-tests/TestRegistryIsInDiagnostics3276586231/data/registry/filebeat/meta.json' to the tar archive","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-11-27T11:00:35.812-0500","log.logger":"diagnostics","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/beater.gzipRegistry","file.name":"beater/diagnostics.go","file.line":63},"message":"finished gziping Filebeat's registry","service.name":"filebeat","ecs.version":"1.6.0"}

Only the message field:

"temporary file '/tmp/filebeat-registry-640175774.tar' created"
"starting to walk '/home/tiago/devel/beats/x-pack/filebeat/build/integration-tests/TestRegistryIsInDiagnostics3276586231/data/registry'"
"adding '/home/tiago/devel/beats/x-pack/filebeat/build/integration-tests/TestRegistryIsInDiagnostics3276586231/data/registry/filebeat/log.json' to the tar archive"
"adding '/home/tiago/devel/beats/x-pack/filebeat/build/integration-tests/TestRegistryIsInDiagnostics3276586231/data/registry/filebeat/meta.json' to the tar archive"
"finished gziping Filebeat's registry"

@belimawr belimawr added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Nov 26, 2024
@belimawr belimawr self-assigned this Nov 26, 2024
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Nov 26, 2024
Copy link
Contributor

mergify bot commented Nov 26, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Nov 26, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Nov 26, 2024
@belimawr belimawr changed the title [WIP] Add registry to Filebeat's diagnostic Add registry to Filebeat's diagnostic Nov 27, 2024
@belimawr belimawr marked this pull request as ready for review November 27, 2024 19:54
@belimawr belimawr requested a review from a team as a code owner November 27, 2024 19:54
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@cmacknz
Copy link
Member

cmacknz commented Nov 27, 2024

  1. Is there a limit to how big the contents of the ${path.home}/registry directory can get? If there isn't a limit in Filebeat itself, I think the diagnostics collection here needs a size limit of some sort. Otherwise if we accidentally create a huge registry diagnostics won't work at all.
  2. Can a more restrictive pattern be used instead of just taking everything in the directory? Perhaps you can only take files with permissions that match the filebeat user and group? This is to stop arbitrary data from being uploaded to Fleet and then decompressed later by an unsuspecting user or engineer.

@belimawr
Copy link
Contributor Author

  1. Is there a limit to how big the contents of the ${path.home}/registry directory can get? If there isn't a limit in Filebeat itself, I think the diagnostics collection here needs a size limit of some sort. Otherwise if we accidentally create a huge registry diagnostics won't work at all.
  2. Can a more restrictive pattern be used instead of just taking everything in the directory? Perhaps you can only take files with permissions that match the filebeat user and group? This is to stop arbitrary data from being uploaded to Fleet and then decompressed later by an unsuspecting user or engineer.
  1. No there isn't, I've been thinking about it, I believe gRPC has a limit, so in my mind that was a kind of a guardrail. A limit for the payload (gziped "file") is easy to add. It's hard for me, at the moment, to gauge what would be an ideal limit for a high load environment like Kubernetes. The log file is caped at 10Mb, however the checkpoint is unbound. Given compression, I'd say at least 20Mb to be on the safe side.
  2. Yes, that can be implemented, all files that make up the registry are well known. The registry itself is in the data folder of Elastic-Agent, so there should be no extra data there. The only reason I can think of to have some files there we do not want is if some malicious actor adds something there. Anyways, I agree it's an nice extra security feature.

I'll implement those changes.

@cmacknz
Copy link
Member

cmacknz commented Nov 27, 2024

The log file is caped at 10Mb, however the checkpoint is unbound. Given compression, I'd say at least 20Mb to be on the safe side.

The Fleet server upload API used to get diagnostics as an action probably has a limit, I'm not sure what it is by default off the top of my head. I would expect it is ~100 MB so if we go with 20 MB we need to make sure we don't push ourselves past that in real use cases.

The few compressed gzipped diagnostics I have laying around are all <5 MB now for reference.

@cmacknz
Copy link
Member

cmacknz commented Nov 27, 2024

The Fleet settings aren't super clear on what the upload file size limit is, maybe there isn't one? The files are chunked as part of the implementation. https://github.com/elastic/fleet-server/blob/4132a505f0c759e332a890fdf701b8ff238cf91a/fleet-server.reference.yml#L200-L214

Copy link
Contributor

mergify bot commented Dec 3, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b filestream-registry-on-diagnostics upstream/filestream-registry-on-diagnostics
git merge upstream/main
git push upstream filestream-registry-on-diagnostics

Copy link
Contributor

@VihasMakwana VihasMakwana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from comments

Comment on lines 162 to 174
// func (r *runner) Diagnostics() []diagnostics.DiagnosticSetup {
// fmt.Println("================================================== Diagnostics called!")
// setup := diagnostics.DiagnosticSetup{
// Name: "registry collector",
// Description: "Collect Filebeat's registry",
// Filename: "registry.tar.gz",
// ContentType: "application/octet-stream",
// Callback: getRegistry,
// }

// return []diagnostics.DiagnosticSetup{setup}
// }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this commented out code. Correct?

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes until comments get addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Filebeat registry to the diagnostics bundle for the control V2 protocol
4 participants