Change license element to support CURIEs, e.g., for SPDX #399

Open
cthoyt opened this issue Nov 4, 2024 · 7 comments
Labels
enhancement New feature or request schema

Comments

@cthoyt
Member

cthoyt commented Nov 4, 2024

The license element of the schema is currently a URI. It would be nice to be able to specify licenses by SPDX identifier as well, since SPDX is a controlled vocabulary and this would push people towards using well-defined licenses.

@cthoyt cthoyt added enhancement New feature or request schema labels Nov 4, 2024
@gouttegd
Contributor

gouttegd commented Nov 22, 2024

In principle, no objection to that, but it should be noted that, with the current implementation in SSSOM-Py, the “as well” bit is not possible: either the slot expects a URI, or it accepts a CURIE, but it cannot accept both. SSSOM-Py enforces that any slot typed as an EntityReference MUST be in CURIE form.

So if we change the type of license to be an EntityReference, SSSOM-Py will no longer accept the use of plain IRIs, and will barf on a mapping set with

#license: https://creativecommons.org/licenses/by/4.0

(which is currently perfectly valid). This will have to be replaced by a pseudo-CURIE, as in

#curie_map:
#  CCORG: https://creativecommons.org/licenses/
#license: CCORG:by/4.0

I’m very reluctant to break existing mapping sets just for that, so I will oppose any such change until/unless SSSOM-Py is made to silently accept the use of IRIs for EntityReference-typed slots (at least for this particular slot).
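
To make the cost of that migration concrete, here is a minimal Python sketch of how an existing plain-IRI license value would have to be rewritten into the pseudo-CURIE form shown above. The function name `compress_iri` is hypothetical and this is not SSSOM-Py code; it only assumes a `curie_map` that maps prefixes to namespace IRIs.

```python
# Hypothetical migration helper; not part of SSSOM-Py or SSSOM-Java.
def compress_iri(iri: str, curie_map: dict[str, str]) -> str:
    """Rewrite an IRI as a CURIE, using the longest matching namespace."""
    matches = [(prefix, ns) for prefix, ns in curie_map.items() if iri.startswith(ns)]
    if not matches:
        raise ValueError(f"no prefix in the curie_map covers {iri!r}")
    # Prefer the most specific (longest) namespace if several match.
    prefix, namespace = max(matches, key=lambda item: len(item[1]))
    return f"{prefix}:{iri[len(namespace):]}"


curie_map = {"CCORG": "https://creativecommons.org/licenses/"}
print(compress_iri("https://creativecommons.org/licenses/by/4.0", curie_map))
# prints: CCORG:by/4.0
```

Every mapping set that currently uses a plain IRI would need both the rewritten value and a matching `curie_map` entry, which is why the change is a breaking one.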

@matentzn
Collaborator

Has anyone ever tried to validate

#curie_map:
#  CC.ORG: "CC.ORG:"
#license: CC.ORG:by/4.0

Just out of curiosity, as a hacky workaround for some cases. In general, if we type something as "uriorcurie", it is my understanding that it should literally be possible to use either one, even if tools don't support it yet.

All that said,

is this really that bad?

#curie_map:
#  SPDX: https://spdx.org/licenses/
#license: SPDX:AdaCore-doc

TBH, SPDX may not be as strong a case for allowing CURIEs explicitly, even if the RDF transformation of the above adds https://spdx.org/licenses/AdaCore-doc as the license.
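
As an illustration of the expansion mentioned here, the following Python sketch shows how a CURIE-form license could be turned into the full IRI using the declared `curie_map`. The helper `expand_curie` is a hypothetical name for this example, not an SSSOM-Py or SSSOM-Java function.

```python
# Hypothetical helper for illustration; not an API of any SSSOM implementation.
def expand_curie(curie: str, curie_map: dict[str, str]) -> str:
    """Expand a CURIE such as 'SPDX:AdaCore-doc' using a prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise ValueError(f"cannot expand {curie!r}: prefix is missing or undeclared")
    return curie_map[prefix] + local_id


curie_map = {"SPDX": "https://spdx.org/licenses/"}
print(expand_curie("SPDX:AdaCore-doc", curie_map))
# prints: https://spdx.org/licenses/AdaCore-doc
```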

@gouttegd
Contributor

gouttegd commented Nov 22, 2024

Has anyone ever tried to validate

#curie_map:
#  CC.ORG: "CC.ORG:"
#license: CC.ORG:by/4.0

Err, yes, this works, but why would anyone ever do that?

If you really want to store the license as a CURIE, you can do so the “normal” way, without having to make this weird “prefix” declaration:

#curie_map:
#  CCORG: https://creativecommons.org/licenses/
#license: CCORG:by/4.0

This already works perfectly fine, both with SSSOM-Py and with SSSOM-Java. It’s just that, officially, this is not a CURIE; it is a string that happens to look like a CURIE (which means, for example, that SSSOM-Java will not expand it as it normally does for all other EntityReference-typed slots). The requested change here is to make this slot accept CURIEfied values officially, rather than as something that is silently accepted by implementations.

In general, if we type something as "uriorcurie", it is my understanding that it should literally be possible to use either one

This is what SSSOM-Java does. This is also what SSSOM-Py used to do, but now it only accepts CURIEs. Any attempt to validate a set containing an IRI where an EntityReference is expected results in:

WARNING:sssom.util:Slot 'license' has an incorrect value: https://creativecommons.org/licenses/by/4.0/

(here, after I changed the type of license from URI to EntityReference, as requested).

My understanding is that this change (SSSOM-Py enforcing that all values of EntityReference-typed slots MUST be in CURIE form) was a deliberate decision. In fact, it is because of that change that, when rewriting the specification, I baked in the requirement that all identifiers in an SSSOM/TSV file had to be in CURIE form only. Was that not the case?

@matentzn
Collaborator

matentzn commented Nov 22, 2024

My understanding is that this change (SSSOM-Py enforcing that all values of EntityReference-typed slots MUST be in CURIE form) was a deliberate decision. In fact, it is because of that change that, when rewriting the specification, I baked in the requirement that all identifiers in an SSSOM/TSV file had to be in CURIE form only. Was that not the case?

I believe you are right!

And yes, hence my comment about SPDX. Not sure this case merits drilling into the data model. @cthoyt what's your opinion: is this not good enough?

curie_map:
  SPDX: https://spdx.org/licenses/
license: SPDX:AdaCore-doc

EDIT: FORGET THIS COMMENT, IT'S ALL WRONG, SEE BELOW.

@gouttegd
Contributor

gouttegd commented Nov 22, 2024

Not sure this case merits drilling into the data model.

The problem is not so much changing the data model per se (as I have said, I would have no objection to making that slot an EntityReference); it’s more that it is a breaking change, because existing sets that currently use a plain IRI in their license field would suddenly fail to validate with SSSOM-Py after this change is introduced.

I am not opposed to changing the model, but now that we are post-1.0, any change must be carefully weighed, especially breaking changes. From what I see, that particular change is clearly not worth breaking existing sets, which is why I will only accept it if SSSOM-Py is made more tolerant about its “EntityReferences MUST be CURIEs!” rule.

As to whether the change is worth updating SSSOM-Py to relax that rule, I have no opinion.

@matentzn
Collaborator

Ahhhh oops sorry, I totally messed up here. I thought the problem was something else :D

Basically, for this to be a non-breaking change, we would literally have to implement a union data type (CURIE or URI) so that the field can hold both CURIE values and URI values.

The real question was if @cthoyt's wish to do

curie_map:
  SPDX: https://spdx.org/licenses/
license: SPDX:AdaCore-doc

is a high enough marginal gain over the current solution

license: https://spdx.org/licenses/AdaCore-doc

to justify changing the tool. The answer is of course that we should be faithful to the docs, and here the big question is what the linkml:UriOrCurie datatype truly is.

Sorry for derailing the discussion; I corrected my comment above.
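
For illustration, the "CURIE or URI" union described in this comment could behave roughly like the Python sketch below. It is an assumption about one possible lenient behaviour, not how SSSOM-Py or LinkML implements `uriorcurie`; the name `resolve_license` and the scheme regex are hypothetical.

```python
import re

# Rough "looks like an absolute IRI" test (scheme followed by '://').
# This heuristic is an assumption made for the sketch only.
_IRI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def resolve_license(value: str, curie_map: dict[str, str]) -> str:
    """Accept either a plain IRI or a CURIE and return the full IRI."""
    if _IRI_SCHEME.match(value):
        # Already an IRI: existing mapping sets keep validating.
        return value
    prefix, sep, local_id = value.partition(":")
    if sep and prefix in curie_map:
        return curie_map[prefix] + local_id
    raise ValueError(f"{value!r} is neither an IRI nor a CURIE with a declared prefix")


curie_map = {"SPDX": "https://spdx.org/licenses/"}
print(resolve_license("SPDX:AdaCore-doc", curie_map))
print(resolve_license("https://spdx.org/licenses/AdaCore-doc", curie_map))
# both lines print: https://spdx.org/licenses/AdaCore-doc
```

Checking for an IRI first means both spellings resolve to the same license, so neither existing IRI-based sets nor new CURIE-based ones would break under this kind of rule.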

@gouttegd
Contributor

gouttegd commented Nov 22, 2024

here the big question is what the linkml:UriOrCurie datatype truly is.

For what it’s worth: I believe the mere existence of that type is symptomatic of LinkML’s inability to clearly separate a data model from its possible serialisation formats.

That type only exists because identifiers can have different forms depending on the serialisations (CURIEs in some formats, IRIs in others), and data models defined in LinkML need to accommodate those different forms.

Ideally, there would be no need for such a type. An identifier (EntityReference, in SSSOM) could be just an xsd:string, and any details or constraints about how identifiers should be represented in a given serialisation format should be left to the specification of that format.

3 participants