
Improve default allocator wording #649

Open
wants to merge 2 commits into main

Conversation

@etomzak (Contributor) commented Oct 24, 2024

The way this sentence is worded could be interpreted as saying that the default allocator must be monotonic --- i.e., even if a memory region is deallocated, the default allocator is not allowed to allocate it a second time.

I've taken a crack at improving the wording, but I'm unsure about the use of "exclusive". The original wording seems to go out of its way to say that any memory address controlled by the allocator must belong to no more than one allocation, but I'm not sure this is necessary. Do we really have to tell people that an allocator must not make the same storage available to two different objects at the same time? If the spec does need to explain this in detail, the word "exclusive" is probably not a sufficiently complete explanation of the requirements.

@etomzak force-pushed the default_alloc_wording branch from 5c35b55 to 787d23b on October 24, 2024, 14:01
For successful allocations of size greater than zero, the default allocator must
return a pointer to a contiguous, exclusive region of memory of at least the
requested size.
For unsuccessful allocations, the default allocator must return [code]#nullptr#.
Contributor

Can we instead remove this wording? The table below already says that the default allocator meets the C++ requirement for Allocator, and C++ defines the behavior of Allocator quite precisely.

If you ask me, the most confusing part of this section is this statement:

The user can implement an allocator that returns the same address as the one passed in the buffer constructor, but it is the responsibility of the user to handle the potential race conditions.

It would be good to figure out what we meant here and clarify this. Or, maybe it should be deleted?

Contributor

My best guess is that it's talking about the buffer constructor overloads that accept both a host pointer and an allocator:

  buffer(const T* hostData, const range<Dimensions>& bufferRange,
         AllocatorT allocator, const property_list& propList = {});

The buffer owns hostData during its lifetime, so I guess it might be safe sometimes for allocator to return hostData (or other pointers within the range). But I don't know why that's useful, and I don't know how the user can avoid race conditions. Maybe use_mutex?
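
For concreteness, here's a rough sketch of how that overload and use_mutex could be combined (purely illustrative and assuming a SYCL 2020 implementation; std::allocator<float> just stands in for a user-defined AllocatorT):

  #include <sycl/sycl.hpp>
  #include <memory>
  #include <mutex>
  #include <vector>

  int main() {
    std::vector<float> hostData(1024, 0.0f);
    std::mutex m;
    std::allocator<float> alloc;  // stand-in for a user-supplied allocator

    // The overload quoted above: host pointer, range, allocator, properties.
    sycl::buffer<float, 1, std::allocator<float>> buf(
        hostData.data(), sycl::range<1>(hostData.size()), alloc,
        sycl::property_list{sycl::property::buffer::use_mutex(m)});

    // While buf is alive, the host only touches hostData while holding m,
    // which is the kind of coordination use_mutex is meant to provide.
    {
      std::lock_guard<std::mutex> lock(m);
      hostData[0] = 1.0f;
    }
  }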

Contributor

I guess it might be safe sometimes for allocator to return hostData (or other pointers within the range)

Yes, that's my best guess too. However, this seems to assume that buffer will only call allocator in order to create a host buffer that contains hostData, and it won't call allocator for any other purpose. This is not specified anywhere, and it's contrary to the way the standard C++ library functions use allocators. When a class from the standard C++ library takes an allocator, I think it uses the allocator to allocate all the dynamic memory that it needs.

Contributor Author

Can we instead remove this wording?

I'd also prefer to just remove the sentence and not replace it with anything.

If you ask me, the most confusing part of this section is this statement:

I interpreted that as saying that a user could use an allocator for the user's own purposes (whatever those may be) and that same allocator could be passed to a buffer constructor. I.e., the act of giving an allocator to a buffer doesn't disallow the user from using the same allocator. Given that the SYCL runtime might choose to allocate memory using the allocator asynchronously, the user must put some locking mechanism into the allocator to avoid races.

Which seems reasonable to me, but I don't know if that's what was intended.

Contributor

That's not how I interpret that sentence, but I think we all agree that the sentence is confusing. Personally, I'd be happy to delete it.

The issue about thread safety is interesting. Is it legal for the implementation to call the user's allocator simultaneously from different threads even if the application has only one thread? (Or only one thread calling SYCL APIs?) For example, could an implementation have a background thread that potentially calls the allocator at the same time as the application's main thread?

If we think this is a valid implementation, the specification should make it clear that the allocator must be written to be thread safe.

Contributor Author

I skimmed the C++ spec, and I couldn't find anything on allocator thread safety one way or another. I could well have missed something, though.

I'm not convinced std::allocator is intended to be thread-safe. It's specified as stateless, and I'd hazard a guess that any mutex or such inside the allocator could qualify as state. Also, std::allocators get rebound inside the standard library (and elsewhere), which implies constructing different allocator objects. A mutex data member wouldn't provide thread-safety. If std::allocator does have a thread-safety guarantee, I'd expect it to be provided by whatever plumbing std::allocator relies on, which could be implementation-defined.

For SYCL, I think there are two ways of looking at it:

  1. "SYCL is thread-safe so that the user doesn't need to worry about thread-safety. Therefore, the user shouldn't need to worry about providing a thread-safe allocator; it's the SYCL runtime that must provide the thread-safety."
  2. "SYCL is thread-safe. When a SYCL runtime relies on an entity provided by the user, the user-provided entity must be thread-safe so that the SYCL runtime can be thread-safe."

Both of these seem entirely reasonable to me, but only one is consistent with C++. Problem is, we don't know which one it is.
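
If option 2 is the direction, the kind of thing a user would have to write might look roughly like this (purely illustrative; locked_allocator is made up for this sketch, and holding the mutex through a shared_ptr is one way around the rebinding concern mentioned above, since rebound copies share the same lock):

  #include <cstddef>
  #include <memory>
  #include <mutex>
  #include <new>

  // Hypothetical user allocator that is itself thread-safe (option 2).
  // The mutex is owned through a shared_ptr rather than stored by value,
  // so copies and rebound instances all synchronize on the same lock.
  template <typename T>
  struct locked_allocator {
    using value_type = T;

    std::shared_ptr<std::mutex> mtx = std::make_shared<std::mutex>();

    locked_allocator() = default;
    template <typename U>
    locked_allocator(const locked_allocator<U>& other) : mtx(other.mtx) {}

    T* allocate(std::size_t n) {
      std::lock_guard<std::mutex> lock(*mtx);
      // Any bookkeeping that isn't thread-safe on its own goes here.
      return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) {
      std::lock_guard<std::mutex> lock(*mtx);
      ::operator delete(p);
    }
  };

  template <typename T, typename U>
  bool operator==(const locked_allocator<T>& a, const locked_allocator<U>& b) {
    return a.mtx == b.mtx;
  }
  template <typename T, typename U>
  bool operator!=(const locked_allocator<T>& a, const locked_allocator<U>& b) {
    return !(a == b);
  }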

Contributor

I skimmed the C++ spec, and I couldn't find anything on allocator thread safety one way or another. I could well have missed something, though.

It doesn't say "thread-safe", but [allocator.members] says:

Except for the destructor, member functions of the default allocator shall not introduce data races ([intro.multithread]) as a result of concurrent calls to those member functions from different threads. Calls to these functions that allocate or deallocate a particular unit of storage shall occur in a single total order, and each such deallocation call shall happen before the next allocation (if any) in this order.

For SYCL, I think there are two ways of looking at it:

I agree that both of these are reasonable interpretations. I have a preference for 2.

@nliber (Collaborator) commented Oct 29, 2024

I think std::allocator is guaranteed thread-safe (at least as far as acquiring/releasing memory) because it is specified to call ::operator new, which in turn is specified to call malloc or aligned_alloc.

Contributor Author

I have a preference for 2.

I agree. With option 1, a SYCL runtime would need to conservatively assume that any user-supplied allocator could race with any other, which hurts performance. Option 2 is also more similar to C++.

Member

Yes, option 2 seems reasonable too.

For successful allocations of size greater than zero, the default allocator must
return a pointer to a contiguous, exclusive region of memory of at least the
requested size.
For unsuccessful allocations, the default allocator must return [code]#nullptr#.
Contributor

For unsuccessful allocations, the default allocator must return [code]#nullptr#.

I think this is not the way allocators normally handle out-of-memory. A C++ allocator normally throws an exception in this case.

Contributor Author

I agree that the Allocator named requirement implies that the only alternative an allocator has to providing the requested memory is throwing.

A few lines earlier in the SYCL spec (L 4892), it says:

When an allocator returns a `nullptr`, the runtime cannot allocate data on the host.

So I guess either the SYCL spec expects more C-like behavior (in which case default allocators don't actually satisfy the Allocator named requirement), or default allocators throw on failure, in which case an allocator will actually throw when the runtime can't allocate data on the host. Either way, one sentence or the other needs to change in the SYCL spec.

Contributor

The SYCL spec should not invent some new way for C++ allocators to work. I think the previous statement about returning nullptr was just a mistake.

@etomzak (Contributor Author) commented Oct 30, 2024

I think we're nearing consensus on the required changes:

  1. The sentence I originally suggested changing just needs to go away.
  2. The sentence that talks about an allocator returning a nullptr needs to be changed to talk about throwing.
  3. Thread-safety requirement of user-supplied allocators needs to be made explicit.

1 and 2 make sense in this PR. I'm not sure if 3 should be done here or in a separate PR, because it might imply more changes and need more careful review.

@gmlueck (Contributor) commented Oct 30, 2024

2. The sentence that talks about an allocator returning a nullptr needs to be changed to talk about throwing.

Can't we just delete this sentence too? The C++ specification for Allocator should cover this, no?

@etomzak (Contributor Author) commented Oct 30, 2024

  2. The sentence that talks about an allocator returning a nullptr needs to be changed to talk about throwing.

Can't we just delete this sentence too? The C++ specification for Allocator should cover this, no?

I don't have a strong opinion, but given how long it's taken us to get to the bottom of this, I'd lean toward mentioning it in the SYCL spec for the avoidance of doubt. It's an important requirement for both users and implementers: The user's allocator must throw on failure. The implementer must catch exceptions from allocators and report them in an appropriate asynchronous exception list.
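
Spelled out, the behavior I have in mind is roughly this (illustrative only; throwing_allocator isn't a name I'm proposing for the spec):

  #include <cstddef>
  #include <new>

  // A conforming allocate() never returns nullptr; it reports failure by
  // throwing, and the runtime would be expected to catch the exception and
  // report it via the appropriate asynchronous exception list.
  template <typename T>
  struct throwing_allocator {
    using value_type = T;

    T* allocate(std::size_t n) {
      if (void* p = ::operator new(n * sizeof(T), std::nothrow))
        return static_cast<T*>(p);
      throw std::bad_alloc{};  // never return nullptr on failure
    }

    void deallocate(T* p, std::size_t) { ::operator delete(p); }
  };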

I'm now wondering about the CTS and what existing implementations do ...

@gmlueck (Contributor) commented Oct 30, 2024

Personally, I think it's better not to duplicate this wording from the C++ specification. Allocators are a reasonably advanced feature, and I think most programmers who define their own will understand the C++ requirements. In addition, the original spec wording didn't say anything about the out-of-memory behavior. I don't see a need to add that wording now. It's better to rely on the C++ specification.
