Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How should we pass / control kwargs ? #354

Open
MSeal opened this issue May 5, 2019 · 7 comments
Open

How should we pass / control kwargs ? #354

MSeal opened this issue May 5, 2019 · 7 comments
Assignees
Labels
help wanted idea An idea which is open to discussion rather than a particular issue or bug

Comments

@MSeal
Copy link
Member

MSeal commented May 5, 2019

In #351 it was raised that the manner in which we pass kwargs hasn't been well defined. We have several places where we pass everything downstream with **kwargs. But we don't control for duplicate names, especially as we're making API improvements.

Having kwarg passing is very useful from plugins we didn't write to be able to control their execution easily, and I'd prefer we keep a mechanism for passing parameters and data to any registered handler methods to support use cases we haven't thought or which are specific to a particular use case.

Off the top of mind I can think of two simple ways to address the name overlap issue.

  • Make a function wrapper to safely map kwargs, prioritizing function def names over kwargs that overlap
  • Always convert kwargs to safe_kwargs dict we pass without the ** splat pattern

One of us will need to implement the solution we land on, likely before the next release. I assigned maintainers involved in the original discussion for now.

@MSeal MSeal added help wanted idea An idea which is open to discussion rather than a particular issue or bug labels May 5, 2019
@betatim
Copy link
Member

betatim commented May 5, 2019

Some things I think don't need to be solved by papermill are situations like:

def foo(a, **kwargs):
    pass

foo(1, a=3)

If you end up in a place where this is a problem that needs solving I think it is a sign that we need to backup and take a different direction somewhere earlier. If this happens the solution is documentation (I think).

I think for passing additional (keyword) arguments to engines we should change execute_notebook to:

def execute_notebook(
    input_path,
    output_path,
...
    cwd=None,
    extra_engine_args=None,  # or maybe an empty dict
):

which users can then call as execute_notebook(..., extra_engine_args=dict(foo='baz', whiz='bang')). This makes it clear where to put these extra arguments and that papermill will treat this as an opaque thing that it "blindly" passes on. Along the lines of "use this as an escape hatch, we hope you checked with the engine first that it knows what to do with these. You are voiding your papermill warranty by doing this.".

There is one wrinkle: some arguments need translating (execution_timeout <-> timeout, start_timeout <-> startup_timeout, etc). My hunch is that if a user sets an argument in their extra_engine_args then they want it to "win". At least I'd be super confused if I set something in an "for experts" bit of the API and it doesn't seem to make a difference.

I think this can be solved by documentation that explains "the argument you call timeout is called execution_timeout, so set that instead". As well as raising a ValueError() in execute_managed_notebook if we see that someone passed timeout in their extra_engine_args. (I know we promised to blindly pass those on but I think in this case we'd be doing the user a favour by letting them know how to achieve their goal.)

As an alternative you could pick out that value and use it, something like this (pop is what we want not get):

# you say timeout, we say execution_timeout
execution_timeout = extra_engine_args.pop("timeout", execution_timeout)
preprocessor = PapermillExecutePreprocessor(
            timeout=execution_timeout,
...
            **extra_engine_args
):

and do the translation for the user.

@vizhur
Copy link
Contributor

vizhur commented May 6, 2019

Agree with @betatim that parameters we overwrite or renamed should be documented so it's clear to a user. Though passing args down the stream as a dict might bring other issue when we'll have to merge 2 or more dicts - so back to the original problem.

We can take any approach but it should be consistent across all API. I'd suggest to use **dict(iterable, **kwargs) to pass caller kwargs downstream

def func2(paramA, paramB, **kwargs):
    print('do stuff')

def func(param1, param2, param3, **kwargs):
    func2(**dict(kwargs, paramA=param1+param2, paramB = param3))

Basically always pass down caller args if they are not overwritten by a callee. That would make kwargs usage safe and consistent, can be documented, just few changes and no additional API required

@MSeal
Copy link
Member Author

MSeal commented May 6, 2019

I agree with those suggestions. I think the clear docs on any kwargs that might be popped and reassigned is a reasonable trade-off.

I also like the **dict(kwargs...) pattern proposed and naming the kwargs after their downstream consumer target.

Would it be reasonable to warn if we swallowed a kwargs with a replacement? Or do you think that's be unnecessary work?

@betatim
Copy link
Member

betatim commented May 6, 2019

Though passing args down the stream as a dict might bring other issue when we'll have to merge 2 or more dicts - so back to the original problem.

This problem exists if you use kwargs as well, it is just a dictionary, there is nothing special about it. In the end you will arrive at the downstream function with a dictionary, either called kwargs or extra_engine_args, and will have to decide which value to use, how to resolve the name translation and whether or not to inform the user that they created a conflict.

I think the solution is to check just before calling the downstream function if there is a conflict and then either raise an exception or use the value from the "passed through" dictionary. If both startup_timeout and timeout are specified in the arguments being passed through I think the value from timeout should be used or an exception raised to tell the user that what they are doing is creating a conflict.

I would define execute_notebook(..., **extra_engine_args) instead of using kwargs as the name for the dictionary that collects the additional arguments. It means you can tell from looking at the signature where these extra keyword arguments are headed.

My thinking behind introducing an explicit argument to pass the keyword arguments (execute_notebook(..., extra_engine_args=dict(foo='baz', whiz='bang'))) is that it allows for the signature of execute_notebook to change in the future. If you don't store passthrough arguments in an extra argument each time you introduce a new argument to execute_notebook user's code will potentially break (which would mean rounds of deprecation warnings, deprecation/error, start using the argument). The problem is that what used to be a valid call suddenly does something else and it is hard to warn the user that foobar, which used to be just passed through, has become an argument that is meaningful for execute_notebook.

I don't see a downside to using an explicit argument as all the problems of merging/conflict resolution are the same. It is also what most other Python code does that I know of (for example multiprocessing uses args and kwds as explicitly named arguments for storing the arguments and keyword arguments, similarly scipy's minimize uses args for additional arguments)

@captainsafia
Copy link
Member

Although I like the seamlessness of using kwargs to pass additional args, I think that have an explicit named parameter for passing kwargs is a much better approach from an API design perspective.

It seems like we mostly agree on the following fundamentals of the implementation:

  • Conflicting arguments should be documented, ideally in docstrings and in tutorials
  • Warnings should be thrown when conflicting arguments are passed (for API users who aren't reading the docs)

@vizhur vizhur mentioned this issue May 8, 2019
@willingc willingc self-assigned this May 8, 2019
@willingc
Copy link
Member

willingc commented May 8, 2019

Sorry I'm just catching up on this discussion. I need a bit of time to wrap my head around what we are trying to achieve. It's not clear to me that we are not complicating things with these approaches.

@MSeal
Copy link
Member Author

MSeal commented May 13, 2019

Do we have consensus on changing to extra_engine_arg=dict(...) pattern with the safe_kwargs pattern in the PR as a warning when it is going to be an issue?

I generally like writing **kwargs and keeping-it-simple-stupid with a warning or docs if I know something is a problem. But given @betatim and @captainsafia would prefer the explict named dicts which I'm fine with taking that approach if we're all generally ok with that approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted idea An idea which is open to discussion rather than a particular issue or bug
Projects
None yet
Development

No branches or pull requests

5 participants