
Prevent mapping explosion on logs #4181

Merged: 3 commits merged into elastic:main on Dec 11, 2024

@belimawr (Contributor) commented Dec 5, 2024

What is the problem this PR solves?

This commit removes some JSON objects that were added to the logs, preventing a mapping explosion when those logs are ingested into Elasticsearch.

How does this PR solve the problem?

Some entries are converted to strings, and others are removed entirely to keep each log entry within a reasonable size.
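Why converting nested objects to strings helps can be sketched with the Go standard library. This is an illustration, not Fleet Server code: it roughly approximates how many field mappings a log entry introduces under Elasticsearch dynamic mapping by counting leaf paths (`leafPaths` and `fieldsFor` are hypothetical helpers, and the components payload is made up). When a payload is embedded as an object, every distinct key, including per-agent component and unit IDs, becomes a new field; when the same payload is encoded as a string, it occupies a single field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// leafPaths appends the dotted path of every leaf value in v to out.
// Under dynamic mapping, Elasticsearch creates roughly one field mapping
// per distinct leaf path, so this approximates how many fields an entry adds.
func leafPaths(prefix string, v any, out *[]string) {
	obj, ok := v.(map[string]any)
	if !ok {
		// Scalars (and, for simplicity, arrays) count as a single field.
		*out = append(*out, prefix)
		return
	}
	for key, child := range obj {
		path := key
		if prefix != "" {
			path = prefix + "." + key
		}
		leafPaths(path, child, out)
	}
}

// fieldsFor decodes one JSON log line and returns its sorted leaf paths.
func fieldsFor(line string) []string {
	var doc map[string]any
	if err := json.Unmarshal([]byte(line), &doc); err != nil {
		panic(err)
	}
	var paths []string
	leafPaths("", doc, &paths)
	sort.Strings(paths)
	return paths
}

func main() {
	// Hypothetical components payload; keys such as component and unit IDs
	// differ between agents, so each new one would add new mappings.
	components := `{"filestream-default":{"status":"HEALTHY","units":{"input-1":{"state":"HEALTHY"}}}}`
	nested := `{"message":"components changed","components":` + components + `}`

	// Encoding the same payload as a JSON string keeps it in one field.
	quoted, err := json.Marshal(components)
	if err != nil {
		panic(err)
	}
	flat := `{"message":"components changed","components":` + string(quoted) + `}`

	fmt.Println(len(fieldsFor(nested)), fieldsFor(nested))
	fmt.Println(len(fieldsFor(flat)), fieldsFor(flat))
}
```

Here the nested entry yields three leaf paths, while the stringified entry yields two (`components` and `message`), no matter how many components the payload lists.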

How to test this PR locally

Run Fleet Server and watch its logs: the affected entries should no longer contain nested JSON objects.

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool

@belimawr self-assigned this Dec 5, 2024
@belimawr requested a review from a team as a code owner December 5, 2024 21:47
mergify bot (Contributor) commented Dec 5, 2024

This pull request does not have a backport label. Could you fix it @belimawr? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

mergify bot (Contributor) commented Dec 5, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify bot added the backport-8.x label (Automated backport to the 8.x branch with mergify) Dec 5, 2024
Review comment on:

```go
case !reflect.DeepEqual(curCfg.Inputs[0].Server, newCfg.Inputs[0].Server):
	zlog.Info().
		Interface("old", curCfg.Redact()).
```
@michel-laterman (Contributor) commented Dec 5, 2024

Can we log the old/new config as strings at the debug level instead? (But keep the info-level messages when something has changed.)
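The reviewer's suggestion can be sketched as follows. This is a hedged illustration only: the snippets in this PR use a zerolog-style chained API, whereas the sketch uses the standard library's `log/slog` as a stand-in, and `ServerConfig`, `logServerChange`, and `capture` are hypothetical names with made-up values. The info-level message carries no payload, while the debug-level entry carries the old and new configs as JSON strings.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// ServerConfig is a hypothetical stand-in for the redacted server configuration.
type ServerConfig struct {
	Host string `json:"host"`
	Port int    `json:"port"`
}

// logServerChange keeps a payload-free message at info level and logs the
// old and new configs as JSON strings at debug level, so each one is stored
// as a single string field instead of one field per configuration key.
func logServerChange(logger *slog.Logger, oldCfg, newCfg ServerConfig) {
	logger.Info("server configuration has changed")
	if !logger.Enabled(context.Background(), slog.LevelDebug) {
		return // skip marshaling entirely when debug output is disabled
	}
	oldJSON, err := json.Marshal(oldCfg)
	if err != nil {
		logger.Error("cannot marshal old config", slog.Any("error", err))
		return
	}
	newJSON, err := json.Marshal(newCfg)
	if err != nil {
		logger.Error("cannot marshal new config", slog.Any("error", err))
		return
	}
	logger.Debug("server configuration diff",
		slog.String("old", string(oldJSON)),
		slog.String("new", string(newJSON)))
}

// capture runs logServerChange against a JSON handler and returns its output.
func capture(debug bool) string {
	level := slog.LevelInfo
	if debug {
		level = slog.LevelDebug
	}
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, &slog.HandlerOptions{Level: level}))
	logServerChange(logger,
		ServerConfig{Host: "localhost", Port: 8220},
		ServerConfig{Host: "0.0.0.0", Port: 8220})
	return buf.String()
}

func main() {
	fmt.Print(capture(false)) // a single info line
	fmt.Print(capture(true))  // the info line plus a debug line with string fields
}
```

Checking `Enabled` before marshaling means the cost of `json.Marshal` is only paid when debug output will actually be written.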

@belimawr (Contributor, Author) commented Dec 6, 2024

CI is failing because of a known flaky test, #4170, which is addressed by #4171.

@pkoutsovasilis commented:

Hey @belimawr 👋 I see that #4171 is merged. Could you please merge the latest main into this PR to get a ✅ CI run? 🙂

@belimawr (Contributor, Author) commented Dec 9, 2024

Done! Let's wait for CI.

@cmacknz added the Team:Elastic-Agent-Control-Plane label (Label for the Agent Control Plane team) Dec 9, 2024
Review comment on:

```diff
 	Msg("local components data is not equal")
 
 zlog.Info().
-	RawJSON("req.Components", *req.Components).
+	Str("req.Components", string(reqComponentsJSON)).
```
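The difference between the two calls in the snippet above can be illustrated with `encoding/json` alone. In this hypothetical sketch, a `json.RawMessage` field plays the role of `RawJSON` (the payload bytes are written verbatim, so Elasticsearch sees a nested value and maps its keys), while a plain `string` field plays the role of `Str` (the payload is escaped into one string value). The struct names and the payload are made up for illustration and are not taken from Fleet Server.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entryWithRaw mimics RawJSON: the payload bytes are embedded verbatim,
// so the field becomes a nested JSON value.
type entryWithRaw struct {
	Msg        string          `json:"message"`
	Components json.RawMessage `json:"req.Components"`
}

// entryWithString mimics Str: the payload is escaped into one string value.
type entryWithString struct {
	Msg        string `json:"message"`
	Components string `json:"req.Components"`
}

// encode marshals a log entry and panics on error (acceptable for a demo).
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Hypothetical components payload as received in a check-in request.
	payload := []byte(`[{"id":"log-default","status":"HEALTHY"}]`)
	msg := "local components data is not equal"
	fmt.Println(encode(entryWithRaw{Msg: msg, Components: payload}))
	fmt.Println(encode(entryWithString{Msg: msg, Components: string(payload)}))
}
```

In the first line `req.Components` holds an array of objects whose keys would each be mapped; in the second it holds a single escaped string.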
Review comment (Member):

Is there a reason for us to log the components JSON for every components change? Successful check-ins allow it to be queried from the .fleet-agents index at any time. This seems like something that would only be valuable to log when there is an error.

At minimum, this could be moved to the debug level. It doesn't make sense to log this at info but the unhealthy_reason at debug below; the unhealthy reason is much more interesting, though again it can be queried from .fleet-agents at any time.

@belimawr (Contributor, Author) replied:

I'm not sure, honestly; I was just focusing on the mapping explosion. Someone from @elastic/fleet might be able to answer that.

The preceding lines already log the old and new components at trace level, so removing them from this entry seems to be the best option; it also avoids logging the same data multiple times.

@cmacknz added the backport-7.17 (Automated backport to the 7.17 branch with mergify) and backport-8.16 (Automated backport with mergify) labels and removed the backport-7.17 label Dec 10, 2024
@belimawr merged commit 8ff01e3 into elastic:main Dec 11, 2024 (8 checks passed)
mergify bot pushed two commits that referenced this pull request Dec 11, 2024, and ycombinator pushed two commits that referenced this pull request Dec 12, 2024. All four carry the same commit message:

This commit removes some JSON objects that were added to the logs, thus
preventing mapping explosion when those logs are ingested into
Elasticsearch.

Some entries are converted to strings, others are fully removed to keep
the log within a reasonable size, and some are kept as strings at trace level.

(cherry picked from commit 8ff01e3)

The two commits pushed by ycombinator additionally carry:

Co-authored-by: Tiago Queiroz <[email protected]>
Labels: backport-8.x, backport-8.16, Team:Elastic-Agent-Control-Plane

4 participants