Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle unregistered users in BearerTokenAuthMechanism and implement user registration mechanism #10972

Open
wants to merge 73 commits into
base: develop
Choose a base branch
from

Conversation

GPortas
Copy link
Contributor

@GPortas GPortas commented Oct 28, 2024

What this PR does / why we need it:

This PR includes a mechanism for registering new users in Dataverse through a Bearer Token.

BearerTokenAuthMechanism

The Bearer Token authentication functionality (available through feature flag) has been extended to properly handle cases where BearerTokenAuthMechanism successfully validates the token but cannot identify any Dataverse user because there is no account associated with the token. See the current development version of BearerTokenAuthMechanism. (https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/api/auth/BearerTokenAuthMechanism.java#L58).

The BearerTokenAuthMechanism has been extended to return a meaningful error in the scenario described above. Specifically, the calling user would receive a 403 error, indicating that the token is validated but the user is not registered in Dataverse.

Additionally, the entire BearerTokenAuthMechanism class has been refactored, moving authentication and token verification logic to AuthService, since this logic is also utilized by the new user registration functionality included in this PR, and it also results in a simpler version of BearerTokenAuthMechanism, more testable and with more limited responsability.

User registration endpoint

The new endpoint takes the bearer token and a UserDTO object with the required properties to create a user account in Dataverse. It only works if the bearer token feature flag is enabled. A RegisterOidcUserCommand handles the validation of the bearer token and user data, and the account creation process. It can return various errors, including a new InvalidFieldsCommandException, which lists any validation errors in the UserDTO. This exception is now handled by AbstractApiBean like other command exceptions and can be used for future purposes.

Which issue(s) this PR closes:

Special notes for your reviewer:

  • See Handle unregistered users in BearerTokenAuthMechanism and implement user registration mechanism #10972 (comment) to understand why the OIDC UserInfo endpoint is not used in this design

  • The IT for the new registration endpoint is disabled because it depends on the containerized environment, which is not available in Jenkins. Additionally, the feature flag would not be active, so it would fail for this reason as well. Changes have been made to the Keycloak realm JSON to suit the needs of this IT, particularly the modification of roles in the test realm to allow an admin user to create a random test user in the IT setup.

Suggestions on how to test this:

Visual inspection of the API integration tests. Overall, this PR is extensively tested with unit and integration tests, but it is in the ITs where we can see how the endpoint calls work and the different responses that can be obtained.

We can also run this branch locally and reproduce the same calls made in the ITs using curl.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No

Is there a release notes update needed for this change?:

Yes, attached.

Additional documentation:

None

This comment has been minimized.

This comment has been minimized.

…earer token auth feature flags compatibility
@g-saracca
Copy link

I think we should wait for this one, while doing the SPA integration and forcing to send username when the OIDC token has the preferred_username present the endpoint should fail and is not failing.

This was sort of a mistake, there was a typo in the docs, the dataverse flag is not api-bearer-auth-json-claims but api-bearer-auth-provide-missing-claims

This comment has been minimized.

@cmbz cmbz added the FY25 Sprint 12 FY25 Sprint 12 (2024-12-04 - 2024-12-18) label Dec 5, 2024
@GPortas GPortas self-assigned this Dec 9, 2024

This comment has been minimized.

@GPortas
Copy link
Contributor Author

GPortas commented Dec 10, 2024

@johannes-darms @qqmyers

Ok, I see two distinct topics here. First, we have user consent, which is configured independently from the Terms of Service (ToS) in the client settings. It controls whether a user is prompted to approve data sharing and permissions with a specific client during login.

On the other hand, the 'Fine Grain OpenID Connect Configuration > Terms of Service' option allows you to attach the Relying Party's (in this case, Dataverse) ToS URL in the OIDC metadata.

Just to clarify, it is not valid to assume that accepting the consent prompted by 'Login Settings: Consent Required' automatically means the user has accepted the Terms of Service (ToS). While the two concepts may be related, they serve distinct purposes and have different legal and technical implications. I'm not entirely sure if we're on the same page here, so I wanted to mention this to avoid misunderstandings. Sometimes, with so many concepts involved, it's difficult to clarify everything through GitHub messages. I’d also be happy to discuss it on Zoom if that’s easier.

To address user consent acceptance, we could create a specific feature flag in Dataverse for Keycloak providers that checks whether the consent has been accepted in the provider before accessing the user’s information. This could be done using the Keycloak API.

On the other hand, Keycloak doesn’t have a built-in feature to automatically handle ToS acceptance as part of the standard consent flow for OIDC. The Terms of Service URL configured under Fine Grain OpenID Connect Configuration is part of the OIDC discovery document metadata and is not shown in the Keycloak UI. It is meant to be referenced by OIDC clients, and if you want users to accept the ToS explicitly, you would need to handle that separately.

One potential workaround could be to include the ToS link in the client consent text, which is configurable. However, I believe this might be too much of a workaround and could lead to misunderstandings. Another option could be to create a custom flag (e.g., tosAccepted=true) stored as a user attribute. But the question is: Isn’t it easier to manage the ToS from Dataverse, which is under our control, than to wait for Keycloak IdPs to apply ad-hoc configurations for ToS acceptance?

@qqmyers
Copy link
Member

qqmyers commented Dec 10, 2024

+1 for a flag. It is easy for someone to configure Keycloak as in
image
. Whether that's better than having an ad-hoc screen somewhere in a Dataverse app, or hoping that some tool using the api actually shows the user instead of hardcoding a yes in the json seems like something to leave up to individual installations. Leaving the flag false (allowing a yes in Json) seems like a good default as you can make sure the terms are shown in the SPA and that default and the SPA will work for places that don't use Keycloak/don't have control over the OIDC screen.

@johannes-darms
Copy link
Contributor

@johannes-darms @qqmyers

Ok, I see two distinct topics here. First, we have user consent, which is configured independently from the Terms of Service (ToS) in the client settings. It controls whether a user is prompted to approve data sharing and permissions with a specific client during login.

Those topics are depending on the jursitication and legal team involved different, overlapping or the same concepts. But in most cases its easier to ask once for permission than many times. Thus, the combination of both* into one conceptional technical step. Sorry, that I wasn't clear about this.
*(For the shake of completeness there is a third document, the privacy policy, which must also be shown and accepted by the user/)

On the other hand, the 'Fine Grain OpenID Connect Configuration > Terms of Service' option allows you to attach the Relying Party's (in this case, Dataverse) ToS URL in the OIDC metadata.

Yes, this is part of the protocol definition of OIDC:

policy_uri
    OPTIONAL. URL that the Relying Party Client provides to the End-User to read about how the profile data will be used. The value of this field MUST point to a valid web page. The OpenID Provider SHOULD display this URL to the End-User if it is given. If desired, representation of this Claim in different languages and scripts is represented as described in [Section 2.1](https://openid.net/specs/openid-connect-registration-1_0.html#LanguagesAndScripts). 
tos_uri
    OPTIONAL. URL that the Relying Party Client provides to the End-User to read about the Relying Party's terms of service. The value of this field MUST point to a valid web page. The OpenID Provider SHOULD display this URL to the End-User if it is given. If desired, representation of this Claim in different languages and scripts is represented as described in [Section 2.1](https://openid.net/specs/openid-connect-registration-1_0.html#LanguagesAndScripts).

Just to clarify, it is not valid to assume that accepting the consent prompted by 'Login Settings: Consent Required' automatically means the user has accepted the Terms of Service (ToS). While the two concepts may be related, they serve distinct purposes and have different legal and technical implications. I'm not entirely sure if we're on the same page here, so I wanted to mention this to avoid misunderstandings. Sometimes, with so many concepts involved, it's difficult to clarify everything through GitHub messages. I’d also be happy to discuss it on Zoom if that’s easier.

I think we are, but we can still chat in zulip later..

To address user consent acceptance, we could create a specific feature flag in Dataverse for Keycloak providers that checks whether the consent has been accepted in the provider before accessing the user’s information. This could be done using the Keycloak API.

First, if you do implement it, please do not make it keylock-specific. Second, you do not need to store this information in the resource server (dataverse). If the permission is given, you get the data, otherwise you cannot get the information because the permission server does not give out user data. The protocol even indicates why no information is transmitted. See here:

consent
    The Authorization Server SHOULD prompt the End-User for consent before returning information to the Client. If it cannot obtain consent, it MUST return an error, typically consent_required. 

On the other hand, Keycloak doesn’t have a built-in feature to automatically handle ToS acceptance as part of the standard consent flow for OIDC. The Terms of Service URL configured under Fine Grain OpenID Connect Configuration is part of the OIDC discovery document metadata and is not shown in the Keycloak UI. It is meant to be referenced by OIDC clients, and if you want users to accept the ToS explicitly, you would need to handle that separately.

Yes, this is a fundamental issue with OIDC. The tos_uri and policy_uri properties are optional during client registration. Unfortunately, the protocol only defines The OpenID Provider SHOULD display this URL to the End-User if it is given., so we can't really control if and how this information is displayed and handled by the authentication server. There are many different solutions, but they are all vendor specific and all out of our control (i.e. the entity running and configuring an OpenID provider/authentication server can change the behaviour, but we as the resource server (developers/operators) cannot).

What can we do about it? We as resource servers (developers/operators) want a (legally binding) property that tells us that the user has read, understood and accepted the required terms/policies, right? This means that every user has to set this flag. When we create and control a user interface, we can ensure that the required information is available, displayed to a user, scrolled through and actively accepted by a click. Once we no longer control the visualisation (i.e. a user-facing API sets the property), we cannot infer that the user has seen, interacted with or accepted anything. All we have is a property claiming that this happened. Whether this is sufficient is a question for a legal team and will vary from jurisdiction to jurisdiction. (For our installation it is not!) The ODIC protocol with promtpt=consent ensures that at least a screen is shown and a consent button is pressed before any user identifiable information is transferred before an account can be created. Whether this is sufficient (to also count as accepting the ToS) also depends on what the screen looks like, what information is displayed, and what the legal team thinks about it, as said above, we as resource servers (developers/operators) cannot control this.

So how do we solve this problem? Let the user configure what they need (i.e. what their legal team requires) and set reasonable defaults.

First, allow promtpt=consent to be used during account creation (possibly configurable via a flag). This will ensure that at least every user is happy with the transfer of their personally identifiable information. Second, implicitly set the ToS as accepted during account creation if promtpt=consent is used (configurable via a flag). This allows the problem to be delegated to the ODIC service and avoids the problem within dataverse.
If promtpt=consent or implicit ToS acceptance is not enabled via feature flags, requires an explicit property to set ToS acceptance.

This comment has been minimized.

1 similar comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@GPortas GPortas requested a review from qqmyers December 12, 2024 12:26

This comment has been minimized.

1 similar comment
Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:10959-bearer-token-auth-ext
ghcr.io/gdcc/configbaker:10959-bearer-token-auth-ext

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@GPortas GPortas assigned qqmyers and unassigned GPortas Dec 12, 2024
Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - looks good. I suggested some rewording in the docs for the API call - feel free to accept or edit further - I was basically trying to make the default path come first.

@ofahimIQSS ofahimIQSS self-assigned this Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FY25 Sprint 10 FY25 Sprint 10 (2024-11-06 - 2024-11-20) FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) FY25 Sprint 12 FY25 Sprint 12 (2024-12-04 - 2024-12-18) GREI Re-arch Issues related to the GREI Dataverse rearchitecture Original size: 30 Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) SPA.Q4.4 OIDC login + API authentication SPA These changes are required for the Dataverse SPA User Role: API User Makes use of APIs
Projects
Status: QA ✅
Development

Successfully merging this pull request may close these issues.

Handle unregistered users in BearerTokenAuthMechanism
7 participants