Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip fetch only works when CPM_SOURCE_CACHE is defined #577

Open
ScottBailey opened this issue Jul 27, 2024 · 13 comments
Open

Skip fetch only works when CPM_SOURCE_CACHE is defined #577

ScottBailey opened this issue Jul 27, 2024 · 13 comments

Comments

@ScottBailey
Copy link
Contributor

ScottBailey commented Jul 27, 2024

When the environment variable CPM_SOURCE_CACHE is set, re-running cmake does not cause a fetch and patch.

When unset, fetch and patch both occur.

The problem observed (see #572 for more discussion) is that the unset path fetch does not overwrite the source and so the patch command fails.

#572 suggests a fix that could ignore errors during patching. I think this could be an issue when a user is testing version updates; it would be nice if we just, say, updated package A to latest version and running CMake told us the patches were incompatible/not applied. I don't think -t reports that?

In my testing, I see that when the environment variable is set, we skip the fetch and patch steps. Maybe we can add a cache variable to indicate the initial fetch and patch skips were successful?

Recreation of this is relatively simple, see https://github.com/ScottBailey/cpm-cmake-test-577 for an automated working example.

@lemire

@wolfpld
Copy link

wolfpld commented Sep 27, 2024

A simple solution seems to be to always enable the cache.

--- a/cmake/CPM.cmake
+++ b/cmake/CPM.cmake
@@ -154,7 +154,7 @@ set(CPM_DRY_RUN
 if(DEFINED ENV{CPM_SOURCE_CACHE})
   set(CPM_SOURCE_CACHE_DEFAULT $ENV{CPM_SOURCE_CACHE})
 else()
-  set(CPM_SOURCE_CACHE_DEFAULT OFF)
+  set(CPM_SOURCE_CACHE_DEFAULT ${CMAKE_CURRENT_BINARY_DIR}/.cpm-cache)
 endif()
 
 set(CPM_SOURCE_CACHE

@lemire
Copy link
Contributor

lemire commented Sep 30, 2024

Intriguing.

@julianoes
Copy link

Would be nice to have some sort of fix for this.

One option would be to do a git checkout . before every patch, at least for the case where there CPM_SOURCE_CACHE_DEFAULT is not set.

@ScottBailey
Copy link
Contributor Author

Would be nice to have some sort of fix for this.

One option would be to do a git checkout . before every patch, at least for the case where there CPM_SOURCE_CACHE_DEFAULT is not set.

I'd love to see this fixed myself. I started working on it, but it's a significant time investment for me. I've done the leg work to easily, repeatable recreate the defect and was hoping someone else might lend a hand. I may have time over the holidays to look deeper into this, but I'm not gonna hold my breath.

If you dig into this, I think you'll find significant complexity around this problem. Enough that a quick fix isn't likely to happen. Still, I'd appreciate someone else taking a look and providing feedback.

@lemire
Copy link
Contributor

lemire commented Nov 13, 2024

@ScottBailey What do you think of @wolfpld's solution?

@ScottBailey
Copy link
Contributor Author

@ScottBailey What do you think of @wolfpld's solution?

I feel like this is unexpected behavior? I think it's better for the end user to decide how to work around this problem rather than band-aid the repo.

The actual defect is probably an easy fix as we can trace the path between working and failing and compare them. Someone just needs to have the time to do it.

@lemire
Copy link
Contributor

lemire commented Nov 13, 2024

I feel like this is unexpected behavior?

Here is how I view things.

We have two code paths. One that were the cache location is set. This code path works.

We have another code path where the cache location is not set. This code path is broken.

The fix is to have just one code path. If the cache location is not set, we just set it. This drops the broken code path and solves the problem... further simplifying the logic.

That is, that the code is so complicated is part of the bug. Why is it so difficult to understand what is going on?

@ScottBailey
Copy link
Contributor Author

As a counter, the code path without a cache in fact works except when a patch is added. If we force a cache path we reduce functionality for others. And if we make a band-aid fix we increase technical debt. Like you, I am just a contributor, not a maintainer. But were I a maintainer I don't think I'd want to take a patch that increased technical debt.

Frankly, I've been hoping I'd done enough prep work to make this an easy fix for someone else. And I think it should be an easy fix, just a little time consuming to compare the traces.

@lemire
Copy link
Contributor

lemire commented Nov 13, 2024

(I also have no power here. I am just a user.)

If we force a cache path we reduce functionality for others.

Concretely, what would be lost?

But were I a maintainer I don't think I'd want to take a patch that increased technical debt

How does the patch proposed by @wolfpld increases the technical debt?

My point above is that it reduces the technical debt by reducing the number of code paths.

@ScottBailey
Copy link
Contributor Author

ScottBailey commented Nov 13, 2024

The workaround generates dead code, no? Unless one were to take it out. Personally, I wouldn't mind a default of "${CMAKE_BINARY_DIR}/CPM_cache" if it didn't already exists because I ALWAYS want to cache and a simplification of the code. :-) I imagine we might get that in. But that is also not a small change.

Very pedantically: the band-aid only reduces the number of code paths executed and does so by generating dead code. That's technical debt. The process of properly removing that code path should is, I'd wager, greater than the cost of fixing the code.

With the band-aid, one would lose the ability to NOT cache downloads. I am uncertain of the impact this might have on others. Furthermore, I'm in no position to hazard a guess. Which means I'm not comfortable supporting its removal. Does that make sense?

Edited to fix a typo.

@lemire
Copy link
Contributor

lemire commented Nov 13, 2024

@ScottBailey

(So we are clear, I have no power here and I am not claiming you are wrong.)

I think that the patch we are discussing is equivalent to making the cache be versioned by default.

With the band-aid, one would lose the ability to NOT cache downloads.

Can't one just set CPM_ARGS_NO_CACHE to avoid cache downloads? Wouldn't that be the proper way?

Very pedantically: the band-aid only reduces the number of code paths executed and does so by generating dead code. That's technical debt. The process of properly removing that code path should is, I'd wager, greater than the cost of fixing the code.

Can you point at the dead code that would result? Which lines of code are you thinking about?

What I see in the code is a handful of case where we check whether CPM_SOURCE_CACHE is truthy.

There is a code region from line 773 to 849 where we check several times that CPM_SOURCE_CACHE is truth (in redundant manners):

CPM.cmake/cmake/CPM.cmake

Lines 773 to 849 in 0bc73f4

elseif(CPM_SOURCE_CACHE AND NOT CPM_ARGS_NO_CACHE)
string(TOLOWER ${CPM_ARGS_NAME} lower_case_name)
set(origin_parameters ${CPM_ARGS_UNPARSED_ARGUMENTS})
list(SORT origin_parameters)
if(CPM_ARGS_CUSTOM_CACHE_KEY)
# Application set a custom unique directory name
set(download_directory ${CPM_SOURCE_CACHE}/${lower_case_name}/${CPM_ARGS_CUSTOM_CACHE_KEY})
elseif(CPM_USE_NAMED_CACHE_DIRECTORIES)
string(SHA1 origin_hash "${origin_parameters};NEW_CACHE_STRUCTURE_TAG")
set(download_directory ${CPM_SOURCE_CACHE}/${lower_case_name}/${origin_hash}/${CPM_ARGS_NAME})
else()
string(SHA1 origin_hash "${origin_parameters}")
set(download_directory ${CPM_SOURCE_CACHE}/${lower_case_name}/${origin_hash})
endif()
# Expand `download_directory` relative path. This is important because EXISTS doesn't work for
# relative paths.
get_filename_component(download_directory ${download_directory} ABSOLUTE)
list(APPEND CPM_ARGS_UNPARSED_ARGUMENTS SOURCE_DIR ${download_directory})
if(CPM_SOURCE_CACHE)
file(LOCK ${download_directory}/../cmake.lock)
endif()
if(EXISTS ${download_directory})
if(CPM_SOURCE_CACHE)
file(LOCK ${download_directory}/../cmake.lock RELEASE)
endif()
cpm_store_fetch_properties(
${CPM_ARGS_NAME} "${download_directory}"
"${CPM_FETCHCONTENT_BASE_DIR}/${lower_case_name}-build"
)
cpm_get_fetch_properties("${CPM_ARGS_NAME}")
if(DEFINED CPM_ARGS_GIT_TAG AND NOT (PATCH_COMMAND IN_LIST CPM_ARGS_UNPARSED_ARGUMENTS))
# warn if cache has been changed since checkout
cpm_check_git_working_dir_is_clean(${download_directory} ${CPM_ARGS_GIT_TAG} IS_CLEAN)
if(NOT ${IS_CLEAN})
message(
WARNING "${CPM_INDENT} Cache for ${CPM_ARGS_NAME} (${download_directory}) is dirty"
)
endif()
endif()
cpm_add_subdirectory(
"${CPM_ARGS_NAME}"
"${DOWNLOAD_ONLY}"
"${${CPM_ARGS_NAME}_SOURCE_DIR}/${CPM_ARGS_SOURCE_SUBDIR}"
"${${CPM_ARGS_NAME}_BINARY_DIR}"
"${CPM_ARGS_EXCLUDE_FROM_ALL}"
"${CPM_ARGS_SYSTEM}"
"${CPM_ARGS_OPTIONS}"
)
set(PACKAGE_INFO "${PACKAGE_INFO} at ${download_directory}")
# As the source dir is already cached/populated, we override the call to FetchContent.
set(CPM_SKIP_FETCH TRUE)
cpm_override_fetchcontent(
"${lower_case_name}" SOURCE_DIR "${${CPM_ARGS_NAME}_SOURCE_DIR}/${CPM_ARGS_SOURCE_SUBDIR}"
BINARY_DIR "${${CPM_ARGS_NAME}_BINARY_DIR}"
)
else()
# Enable shallow clone when GIT_TAG is not a commit hash. Our guess may not be accurate, but
# it should guarantee no commit hash get mis-detected.
if(NOT DEFINED CPM_ARGS_GIT_SHALLOW)
cpm_is_git_tag_commit_hash("${CPM_ARGS_GIT_TAG}" IS_HASH)
if(NOT ${IS_HASH})
list(APPEND CPM_ARGS_UNPARSED_ARGUMENTS GIT_SHALLOW TRUE)
endif()
endif()
# remove timestamps so CMake will re-download the dependency
file(REMOVE_RECURSE ${CPM_FETCHCONTENT_BASE_DIR}/${lower_case_name}-subbuild)
set(PACKAGE_INFO "${PACKAGE_INFO} to ${download_directory}")
endif()
endif()

Specifically, they occur at these lines:

elseif(CPM_SOURCE_CACHE AND NOT CPM_ARGS_NO_CACHE)

if(CPM_SOURCE_CACHE)

if(CPM_SOURCE_CACHE)

There is one additional trivial check...

CPM.cmake/cmake/CPM.cmake

Lines 896 to 897 in 0bc73f4

if(CPM_SOURCE_CACHE AND download_directory)
file(LOCK ${download_directory}/../cmake.lock RELEASE)

Importantly, I do not see any block of code that becomes dead when we add the assumption that CPM_SOURCE_CACHE is always truthy.

@ScottBailey
Copy link
Contributor Author

Shoot! Your halfway there, why not just fix the defect?

@ScottBailey
Copy link
Contributor Author

More usefully: when I did the trace between cache set vs cache not set the path was significantly different. At the time I didn't have the bandwidth to really dig in and follow what was going on. I'd have to rerun the trace to have a defensible opinion - I'm really working from memory/gut here. I might have time to go back and look at it during the year end holiday, but probably not before then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants