Extension of API `{id}/versions` and `{id}/versions/{versionId}` with an optional `excludeMetadataBlocks` parameter #10778

johannes-darms · 2024-08-19T11:42:12Z

What this PR does / why we need it:

This PR extends the API endpoints `{id}/versions` and `{id}/versions/{versionId}` with an optional `excludeMetadataBlocks` parameter that specifies whether the metadata blocks should be left out of the output. It defaults to false, preserving backward compatibility. (Note that for a dataset with a large number of versions and/or metadata blocks, including the metadata blocks can dramatically increase the volume of the output.)

We get slow responses from `api/datasets/%s/versions` because of the large response body. Most of the included information (the metadata blocks) is not needed, as we only want to display a dropdown list of all available versions.

Which issue(s) this PR closes:

Closes Feature Request/Idea: List Versions of a Dataset API reduced response size #10171

Suggestions on how to test this: Call the API once with the flag and once without.
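
A minimal sketch of that comparison, assuming the parameter is passed as `excludeMetadataBlocks=true` in the query string and that the endpoint lives under `api/datasets/{id}/versions` as described above. The helper names (`build_versions_url`, `fetch_body_size`, `compare_sizes`) are hypothetical and not part of this PR; the `X-Dataverse-key` header is only needed when an API token is required.

```python
import json
import urllib.parse
import urllib.request


def build_versions_url(base_url, dataset_id, exclude_metadata_blocks=False):
    """Build the URL that lists the versions of a dataset (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/api/datasets/{dataset_id}/versions"
    if exclude_metadata_blocks:
        # Parameter name taken from this PR; passing "true" as the value is an assumption.
        url += "?" + urllib.parse.urlencode({"excludeMetadataBlocks": "true"})
    return url


def fetch_body_size(url, api_token=None):
    """Fetch the URL and return (body size in bytes, parsed JSON body)."""
    request = urllib.request.Request(url)
    if api_token:
        request.add_header("X-Dataverse-key", api_token)
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return len(body), json.loads(body)


def compare_sizes(base_url, dataset_id, api_token=None):
    """Call the endpoint once without and once with the flag; return both sizes."""
    full_size, _ = fetch_body_size(
        build_versions_url(base_url, dataset_id), api_token
    )
    slim_size, _ = fetch_body_size(
        build_versions_url(base_url, dataset_id, exclude_metadata_blocks=True),
        api_token,
    )
    return full_size, slim_size
```

Calling `compare_sizes` against a test installation with a dataset that has several versions should show a smaller second value if the flag works as intended; nothing in the sketch contacts a server until that function is called.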

Does this PR introduce a user interface change?: No

Is there a release notes update needed for this change?: Maybe, it is a new optional property.

Preview docs at https://dataverse-guide--10778.org.readthedocs.build/en/10778/api/native-api.html#list-versions-of-a-dataset

coveralls · 2024-08-19T11:56:56Z

coverage: 22.571% (+1.8%) from 20.791%
when pulling b8d0f3d on johannes-darms:feat/10171-versions-smaller-response
into cf174b2 on IQSS:develop.

johannes-darms · 2024-10-11T09:31:53Z

@GPortas we are experiencing some performance issues with our SPA when a user requests information about dataset versions, particularly for datasets with many versions. Are you experiencing similar problems? We believe that reducing the payload by omitting the metadata would solve our problem, since we can load the metadata with another query if needed. What do you think?

GPortas · 2024-10-11T10:23:49Z

> @GPortas we are experiencing some performance issues with our SPA when a user requests information about dataset versions, particularly those with many versions. Are you experiencing similar problems? We believe that reducing the payload by omitting the metadata would solve our problem. As we can load the metadata with another query if needed. What do you think?

I'm not sure if we've experienced issues with the metadata blocks, and if we have, they may have been minor, possibly because we don't tend to add complex metadata block configurations in our test datasets.

It's reasonable to think it will improve performance, as additional queries are omitted. This is somewhat similar to what we did with the files, where we added the optional query parameter called excludeFiles.

This makes me wonder if it might be interesting to create a 'reduced information' endpoint instead of continuing to include parameters for excluding properties in the general endpoint.

pdurbin · 2024-10-15T18:25:47Z

Here are the docs for excludeFiles: https://guides.dataverse.org/en/6.4/api/native-api.html#get-version-of-a-dataset

johannes-darms · 2024-10-16T07:06:53Z

> > @GPortas we are experiencing some performance issues with our SPA when a user requests information about dataset versions, particularly those with many versions. Are you experiencing similar problems? We believe that reducing the payload by omitting the metadata would solve our problem. As we can load the metadata with another query if needed. What do you think?
>
> I'm not sure if we've experienced issues with the metadata blocks, and if we have, they may have been minor, possibly because we don't tend to add complex metadata block configurations in our test datasets.
>
> It's reasonable to think it will improve performance, as additional queries are omitted. This is somewhat similar to what we did with the files, where we added the optional query parameter called excludeFiles.
>
> This makes me wonder if it might be interesting to create a 'reduced information' endpoint instead of continuing to include parameters for excluding properties in the general endpoint.

Our problem is only partly caused by the large, complex metadata blocks. The other cause is the number of versions: we have a dataset where an update is published every day (the file changes but the metadata stays the same). The payload of this API therefore becomes huge. By omitting the metadata blocks we can mitigate the problem and still get the necessary information about versions, without introducing paging.
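
To illustrate the point, here is a rough sketch with made-up data: one version per day for a year, each carrying the same metadata. The field names (`versionNumber`, `versionMinorNumber`, `versionState`, `metadataBlocks`) and the sizes are assumptions for illustration, and `without_metadata_blocks` only imitates on the client side what the flag is meant to do on the server.

```python
import json


def make_version(minor, fields_per_block=50):
    """Build a made-up version entry; field names and sizes are assumptions."""
    return {
        "id": 1000 + minor,
        "versionNumber": 1,
        "versionMinorNumber": minor,
        "versionState": "RELEASED",
        "metadataBlocks": {
            "citation": {
                "fields": [
                    {"typeName": f"field{i}", "value": "x" * 40}
                    for i in range(fields_per_block)
                ]
            }
        },
    }


def without_metadata_blocks(versions):
    """Return copies of the version entries with 'metadataBlocks' dropped."""
    return [
        {key: value for key, value in version.items() if key != "metadataBlocks"}
        for version in versions
    ]


def dropdown_labels(versions):
    """Build the labels a version dropdown needs; no metadata blocks required."""
    return [f"{v['versionNumber']}.{v['versionMinorNumber']}" for v in versions]


# One published version per day for a year, all with identical metadata.
versions = [make_version(minor) for minor in range(365)]
slim_versions = without_metadata_blocks(versions)

full_bytes = len(json.dumps(versions))
slim_bytes = len(json.dumps(slim_versions))
print(f"with metadata blocks: {full_bytes} bytes, without: {slim_bytes} bytes")
```

The dropdown labels are identical for both lists, which is why the slimmer payload is enough for our use case.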

I'm not a fan of having different endpoints for more or less the same information. It is more code to maintain and more complicated for the user.

This PR is inspired by the excludeFiles feature and the code is quite similar.

pdurbin · 2024-10-16T13:30:35Z

@GPortas at some point we should probably test the SPA against a dataset with lots of versions. We should have datasets like this on the performance cluster.

pdurbin

@johannes-darms overall, looks good. I left some comments. Thanks.

src/main/java/edu/harvard/iq/dataverse/util/json/JsonPrinter.java

doc/release-notes/10171-exlude-metadatablocks.md

doc/sphinx-guides/source/api/native-api.rst

src/main/java/edu/harvard/iq/dataverse/api/Datasets.java

pdurbin · 2024-12-16T14:33:54Z

@johannes-darms can you please merge the latest from develop? We need this anyway and it will trigger a Jenkins run, which is failing. Also, please consider the suggestions I made in my review. Thanks.

johannes-darms · 2024-12-18T14:06:54Z

> @johannes-darms can you please merge the latest from develop? We need this anyway and it will trigger a Jenkins run, which is failing. Also, please consider the suggestions I made in my review. Thanks.

Sorry for the delay. I've merged develop, adapted the documentation, and written a simple test. If you need or want more tests, I'm happy to write them.

pdurbin

@johannes-darms thanks for adding tests and addressing all my questions! I didn't run the code myself but tests are passing and code and docs look good. Approved!

johannes-darms added 3 commits August 19, 2024 13:30

feat(api/versions): Added a new optional property to hide metadataBlo…

955312d

…cks from API response.

feat(api/versions): Added a new optional property to hide metadataBlo…

408172c

…cks from API response.

Merge branch 'refs/heads/develop' into feat/10171-versions-smaller-re…

29b30c2

…sponse # Conflicts: # src/main/java/edu/harvard/iq/dataverse/util/json/JsonPrinter.java

pdurbin added the Size: 3 A percentage of a sprint. 2.1 hours. label Aug 20, 2024

pdurbin added the Type: Feature a feature request label Oct 9, 2024

cmbz added the FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) label Nov 22, 2024

pdurbin self-assigned this Nov 25, 2024

pdurbin reviewed Nov 25, 2024

pdurbin assigned johannes-darms Nov 25, 2024

cmbz added the FY25 Sprint 12 FY25 Sprint 12 (2024-12-04 - 2024-12-18) label Dec 5, 2024

pdurbin added the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Dec 16, 2024

johannes-darms force-pushed the feat/10171-versions-smaller-response branch from b8d0f3d to 29b30c2 Compare December 18, 2024 13:26

johannes-darms added 2 commits December 18, 2024 14:30

adapted documentation according to phil suggestions

8813df9

added a simple unit test

84cac1e

pdurbin removed the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Dec 18, 2024

pdurbin approved these changes Dec 18, 2024

pdurbin unassigned pdurbin and johannes-darms Dec 18, 2024
