Feature Request: Simplify dataset metadata JSON files for dataset creation or import #10957

Open
DS-INRAE opened this issue Oct 23, 2024 · 9 comments
Labels
Type: Feature a feature request

Comments

@DS-INRAE
Member

Overview of the Feature Request
Remove superfluous elements from the dataset creation JSON file.

What kind of user is the feature intended for?
API User

What inspired the request?
JSON files are long, complex and intimidating for new users.

What existing behavior do you want changed?
Remove the need for the following attributes in the dataset JSON files:

  • typeClass for metadata fields
  • multiple for metadata fields
  • displayName for metadatablocks

JSON files comparison
Current Darwin's Finches JSON for the fields title, author, datasetContact, dsDescription, and subject:

{
  "datasetVersion": {
    "license": {
      "name": "CC0 1.0",
      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
    },
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "Darwin's Finches",
            "typeClass": "primitive",
            "multiple": false,
            "typeName": "title"
          },
          {
            "value": [
              {
                "authorName": {
                  "value": "Finch, Fiona",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorName"
                },
                "authorAffiliation": {
                  "value": "Birds Inc.",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorAffiliation"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactEmail",
                  "value": "[email protected]"
                },
                "datasetContactName": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactName",
                  "value": "Finch, Fiona"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {
                  "value": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
                  "multiple": false,
                  "typeClass": "primitive",
                  "typeName": "dsDescriptionValue"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "dsDescription"
          },
          {
            "value": [
              "Medicine, Health and Life Sciences"
            ],
            "typeClass": "controlledVocabulary",
            "multiple": true,
            "typeName": "subject"
          }
        ],
        "displayName": "Citation Metadata"
      }
    }
  }
}

Simplified JSON file:

{
  "datasetVersion": {
    "license": {
      "name": "CC0 1.0",
      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
    },
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "Darwin's Finches",
            "typeName": "title"
          },
          {
            "value": [
              {
                "authorName": {
                  "value": "Finch, Fiona",
                  "typeName": "authorName"
                },
                "authorAffiliation": {
                  "value": "Birds Inc.",
                  "typeName": "authorAffiliation"
                }
              }
            ],
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {
                  "typeName": "datasetContactEmail",
                  "value": "[email protected]"
                },
                "datasetContactName": {
                  "typeName": "datasetContactName",
                  "value": "Finch, Fiona"
                }
              }
            ],
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {
                  "value": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
                  "typeName": "dsDescriptionValue"
                }
              }
            ],
            "typeName": "dsDescription"
          },
          {
            "value": [
              "Medicine, Health and Life Sciences"
            ],
            "typeName": "subject"
          }
        ]
      }
    }
  }
}
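For anyone who wants to produce the simplified form from an existing native JSON file, here is a minimal sketch. The helper names `strip_field` and `simplify_dataset` are made up for illustration, and it assumes the input follows the full native layout shown above (in particular, that compound fields are marked with `"typeClass": "compound"`).

```python
import copy

# Attributes this issue proposes to make optional on metadata fields.
REDUNDANT_FIELD_KEYS = ("typeClass", "multiple")


def strip_field(field):
    """Return a copy of a native-format field without typeClass/multiple."""
    out = {k: v for k, v in field.items() if k not in REDUNDANT_FIELD_KEYS}
    if field.get("typeClass") == "compound":
        # Compound values hold child fields that need the same treatment.
        value = field["value"]
        items = value if isinstance(value, list) else [value]
        stripped = [
            {name: strip_field(child) for name, child in item.items()}
            for item in items
        ]
        out["value"] = stripped if isinstance(value, list) else stripped[0]
    return out


def simplify_dataset(dataset):
    """Return a simplified deep copy; the input document is left untouched."""
    result = copy.deepcopy(dataset)
    for block in result["datasetVersion"]["metadataBlocks"].values():
        block.pop("displayName", None)
        block["fields"] = [strip_field(f) for f in block["fields"]]
    return result


native = {"datasetVersion": {"metadataBlocks": {"citation": {
    "fields": [
        {"value": "Darwin's Finches", "typeClass": "primitive",
         "multiple": False, "typeName": "title"},
        {"value": [{"authorName": {
            "value": "Finch, Fiona", "typeClass": "primitive",
            "multiple": False, "typeName": "authorName"}}],
         "typeClass": "compound", "multiple": True, "typeName": "author"},
    ],
    "displayName": "Citation Metadata",
}}}}
simplified = simplify_dataset(native)
```

Note that the result is only useful against a server that accepts the simplified form, which is what this issue asks for.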

Are you thinking about creating a pull request for this feature?
Although this would help increase API adoption, we have other priorities at the moment.

@DS-INRAE DS-INRAE added the Type: Feature a feature request label Oct 23, 2024
@DS-INRAE
Member Author

Note: a more radical simplification would be very interesting, but hopefully this would be an easier quick win.

@DS-INRAE DS-INRAE moved this to 🔍 Interest in Recherche Data Gouv Oct 23, 2024
@qqmyers
Member

qqmyers commented Oct 23, 2024

Note that the metadata input for the semantic API would look like this (using a (~standard) @context for readability):

{
  "title": "Darwin's Finches",
  "author": {
    "citation:authorName": "Finch, Fiona",
    "citation:authorAffiliation": "Birds Inc."
  },
  "citation:datasetContact": {
    "citation:datasetContactName": "Finch, Fiona",
    "citation:datasetContactEmail": "[email protected]"
  },
  "citation:dsDescription": {
    "citation:dsDescriptionValue": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds."
  },
  "subject": "Medicine, Health and Life Sciences",
  "@context": {
    "author": "http://purl.org/dc/terms/creator",
    "citation": "https://dataverse.org/schema/citation/",
    "subject": "http://purl.org/dc/terms/subject",
    "termName": "https://schema.org/name",
    "title": "http://purl.org/dc/terms/title"
  }
}

or, even shorter,

{
  "http://purl.org/dc/terms/title": "Darwin's Finches",
  "http://purl.org/dc/terms/creator": {
    "https://dataverse.org/schema/citation/authorName": "Finch, Fiona",
    "https://dataverse.org/schema/citation/authorAffiliation": "Birds Inc."
  },
  "https://dataverse.org/schema/citation/datasetContact": {
    "https://dataverse.org/schema/citation/datasetContactName": "Finch, Fiona",
    "https://dataverse.org/schema/citation/datasetContactEmail": "[email protected]"
  },
  "https://dataverse.org/schema/citation/dsDescription": {
    "https://dataverse.org/schema/citation/dsDescriptionValue": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds."
  },
  "http://purl.org/dc/terms/subject": "Medicine, Health and Life Sciences"
}
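To make the relationship between the two forms concrete, here is a small sketch that rewrites the compact keys into full IRIs using the `@context`. It is a deliberately simplified prefix expansion for flat contexts like the one above, not a JSON-LD processor; `expand_key` and `expand` are illustrative names only.

```python
def expand_key(key, context):
    """Map a compact key to a full IRI using a flat @context."""
    if key in context:
        # Whole-term alias, e.g. "title" -> "http://purl.org/dc/terms/title".
        return context[key]
    prefix, sep, local = key.partition(":")
    if sep and prefix in context:
        # Prefixed name, e.g. "citation:authorName".
        return context[prefix] + local
    return key  # Already an IRI, or unknown: leave unchanged.


def expand(node, context):
    """Recursively expand keys, dropping the @context entry itself."""
    if isinstance(node, dict):
        return {
            expand_key(k, context): expand(v, context)
            for k, v in node.items()
            if k != "@context"
        }
    if isinstance(node, list):
        return [expand(item, context) for item in node]
    return node


compact = {
    "title": "Darwin's Finches",
    "author": {"citation:authorName": "Finch, Fiona"},
    "subject": "Medicine, Health and Life Sciences",
    "@context": {
        "author": "http://purl.org/dc/terms/creator",
        "citation": "https://dataverse.org/schema/citation/",
        "subject": "http://purl.org/dc/terms/subject",
        "title": "http://purl.org/dc/terms/title",
    },
}
expanded = expand(compact, compact["@context"])
```

A real client would more likely rely on a JSON-LD library for this, since the full expansion algorithm covers cases (nested contexts, typed values) that this sketch ignores.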

@pdurbin
Member

pdurbin commented Oct 23, 2024

This is what I've suggested to @JR-1991, who has slides ready about the gnarly, complicated native format: try the semantic API. 😄

See also discussion here:

@JR-1991
Contributor

JR-1991 commented Oct 23, 2024

@pdurbin, it is on my bucket list 😁 Can this also be passed to the dataset creation/edit endpoint?

@pdurbin
Member

pdurbin commented Oct 23, 2024

@JR-1991 well, you have to pass 'Content-Type: application/ld+json'. Please see the guides: https://guides.dataverse.org/en/6.4/developers/dataset-semantic-metadata-api.html
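As a hedged illustration of the header requirement, the sketch below builds (but does not send) such a request with the Python standard library. The endpoint path `/api/dataverses/{alias}/datasets` and the `X-Dataverse-key` header reflect my reading of the linked guide and should be checked against it; the server URL, collection alias and token are placeholders, and `build_create_request` is a made-up helper name.

```python
import json
import urllib.request


def build_create_request(server_url, collection_alias, api_token, jsonld_doc):
    """Build a POST request carrying JSON-LD metadata (nothing is sent here)."""
    # Assumed endpoint path; verify against the semantic metadata API guide.
    url = f"{server_url.rstrip('/')}/api/dataverses/{collection_alias}/datasets"
    body = json.dumps(jsonld_doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The part pointed out above: JSON-LD needs its own content type.
            "Content-Type": "application/ld+json",
            "X-Dataverse-key": api_token,
        },
    )


req = build_create_request(
    "https://dataverse.example.edu",   # placeholder server
    "root",                            # placeholder collection alias
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # placeholder API token
    {"http://purl.org/dc/terms/title": "Darwin's Finches"},
)
# To actually submit it: urllib.request.urlopen(req)
```

The one-field document is only there to keep the sketch short; the guide lists what a create request actually has to contain.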

@kuhlaid
Contributor

kuhlaid commented Dec 17, 2024

Please, please, please create a JSON schema for any rework of the metadata and use something like https://rjsf-team.github.io/react-jsonschema-form/ to enforce it on the UI side. Not having a JSON schema for the current JSON metadata used to create a dataset is extremely frustrating.

@kuhlaid
Contributor

kuhlaid commented Dec 17, 2024

I guess what I was looking for was the 'dataset-schema.json' file (which is impossible to find using the Sphinx docs search). I'm fairly certain this schema does not sufficiently define the metadata that is allowed to be used. The UI uses very explicit elements such as Author Identifier Type and does not seem to allow for values outside of the defined elements in the UI dropdown list. If that is the case then any 'out of bounds' data should be well defined within the schema. If Author Identifier Type for example is limited to ORCID, ISNI, etc. then those should probably be enumerated within the schema. The current 'dataset-schema.json' file is missing details on explicit elements found in the UI.
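To illustrate what enumerating the allowed values could look like, here is a toy sketch of an `enum`-style check in plain Python. The schema fragment is hypothetical and incomplete: the field name `authorIdentifierScheme` and the two values are assumptions based on the ORCID/ISNI example above, not the contents of the actual `dataset-schema.json`.

```python
# Hypothetical fragment in the spirit of JSON Schema's "enum" keyword.
# The value list is illustrative only and certainly not exhaustive.
SCHEMA_FRAGMENT = {
    "authorIdentifierScheme": {"enum": ["ORCID", "ISNI"]},
}


def check_enums(values, schema):
    """Return error messages for values that fall outside their enum."""
    errors = []
    for name, value in values.items():
        allowed = schema.get(name, {}).get("enum")
        if allowed is not None and value not in allowed:
            errors.append(f"{name}: {value!r} is not one of {allowed}")
    return errors


ok = check_enums({"authorIdentifierScheme": "ORCID"}, SCHEMA_FRAGMENT)
bad = check_enums({"authorIdentifierScheme": "Twitter"}, SCHEMA_FRAGMENT)
```

With the values enumerated in the schema itself, a generic validator or a schema-driven form could reject the second input before it ever reaches the API.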

@pdurbin
Member

pdurbin commented Dec 17, 2024

@kuhlaid yeah, my fear is that what we're offering is not complete enough. As you can see, all those issues and PRs above have been merged. Would you be able to open a fresh issue explaining what would be helpful to you?

Projects
Status: 🔍 Interest
Development

No branches or pull requests

5 participants