Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(issue 832): implement CheckedSession, CheckedParameters and CheckedArray #840

Merged
merged 2 commits into from
Aug 6, 2021

Conversation

alixdamman
Copy link
Collaborator

I know the name FrozenSession is misleading because values of array items can be modified but I couldn't find another better name to emphasis that axes of arrays and all non array items are 'frozen'.

I know that in your version of the StaticSession class you choose to let users to add or remove items and to print warnings message when there is a missing or extra item. I know you made that choice partly to not afraid present and potential users to use this new feature of LArray.
But personally, I can't convince myself that users take any care of warnings messages. I really can't.
Letting them to add and remove items "on the fly" is way too dangerous and this partially breaks the main purpose of this new kind of Session, i.e. relying (safely) on 'autocomplete' to write models.

@gdementen
Copy link
Contributor

As I already tried to tell you earlier (sorry if this was not clear), this is not the way to go. You told me "it is just for testing", but you still do exactly the same here. You use values to define classes, instead of using types or "definitions". A FrozenSession (I also dislike that name) should be very similar to Python 3.7 dataclasses (maybe even use them but I am unsure that is possible) . In dataclasses you define the type of fields of your class, and optionally a default value. In what you did, you define a strict value which cannot change, and you cannot have several instances of that class at the same time. This is really a missed opportunity.

@gdementen
Copy link
Contributor

But personally, I can't convince myself that users take any care of warnings messages. I really can't.

Hu? This is quite the opposite. Most of our users are panicked about warnings and consider them to be errors even when they are not.

Letting them to add and remove items "on the fly" is way too dangerous and this partially breaks the main purpose of this new kind of Session, i.e. relying (safely) on 'autocomplete' to write models.

You sound like auto-completion is the end goal of everything, but it is just a nice plus. Speaking of something "way too dangerous" and autocompletion in the same sentence seems so odd to me...

@gdementen
Copy link
Contributor

About the name, here are a few suggestions: StaticSession (this would be telling for C/Java programmers, but I admit our target users are not), CheckedSession, DefinedSession, SessionDefinition


After creating an instance of a user defined 'session', some restrictions will be applied on it:

- **it is not possible to add or remove any parameter or variable,**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the warning was a good compromise: it is annoying enough that you would not do that in a model, but let's you work quickly using a FrozenSession in an interactive console. I could live with that restriction though IF there is a (documented) easy/quick way to convert a FrozenSession to a "normal" session.

larray/core/session.py Outdated Show resolved Hide resolved

- **it is not possible to add or remove any parameter or variable,**
- **all non array variables (axes, groups, ...) cannot be modified,**
- **only values of array variables can be modified, not their axes.**
Copy link
Contributor

@gdementen gdementen Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I can also see the value in this, I wonder if you are not going too far in this. Wouldn't it be better to allow some form of "compatible axes"? For example, I think there would be more advantages than disadvantages to allow arrays with axes in a different order than the one defined or even with more axes than defined. The idea would be that what is defined is what the model requires, but you are free to provide it with more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I am unsure about the "more axes" part. Personally, I would certainly like to work like that (declare all required axes by my model and let it work with any dataset which has at least those axes), but then you lose the quick reference we have now were we can quickly lookup the array name and know its exact axes, instead of knowing minimum required axes you can rely on. Knowing exact axes without running the model has some value too. If I was a modeller, generic code would be more important than "precise" axes, but I wonder if our users wouldn't favour precise axes over generic/powerful code.

larray/core/session.py Outdated Show resolved Hide resolved

def apply(self, func, *args, **kwargs):
kind = kwargs.pop('kind', Array)
instance = self.__class__()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should either be the code in the normal Session, so that overridding apply() here is not necessary, or not overriden at all (because I am unsure we actually want that behaviour for FrozenSession). I would rather have it return a normal Session to not be limited by the restrictions of the FrozenSession. If we do not do that, for example, my_frozen_session.apply(Array.sum, 'gender') would not work (because the returned axes are different).

# XXX: I wonder if we shouldn't create an AbstractSession instead of defining the _disabled()
# private method below.
# Auto-completion on any instance of a class that inherits from FrozenSession
# should not propose the add(), update(), filter(), transpose() and compact() methods
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IF we keep that "feature" of having those functions be non functional, then I agree with you, they should not be available at all BUT I am unconvinced about limiting so much what users can do (see below for more details)

@gdementen
Copy link
Contributor

To be clear, this is what we should strive for:

class MyDemoSession(StaticSession):
    a = int                               # maybe using : instead of = is possible/desirable, I don't know
    b = ArrayDef("time=2010..2020")

# and then
d = MyDemoSession('path.h5')
# or
d = MyDemoSession(a=10, b=zeros("time=2010..2020"))

The advantage of this is that you can have several instances of the same session class without having to modify your code, making variants easier. We can even imagine dumping all parameters of each simulation run to a file, then being able to run several variants just by changing the path of the parameters file.

@alixdamman alixdamman changed the title (issue 832): implement FrozenSession (issue 832): implement TypedSession Jan 3, 2020

>>> # ==== MODEL VARIABLES ====
>>> class ModelVariables(TypedSession):
... FIRST_OBS_YEAR = int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried inheriting/using a dataclass so that the variable types can be defined using ":" instead of "="? That would make us more compatible with the broader Python ecosystem.

Copy link
Contributor

@gdementen gdementen Jan 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget this. I have looked into it, but it seems like the typing module is not flexible enough to type multi-dimensional arrays yet (see python/typing#516). We could invent our own syntax using annotations, and that would cause no runtime problem but we would be incompatible with any type checker out there, including mypy and PyCharm, so there is no point. Let's go for types and "array defs" as class variables like above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looked into it. I didn't try to inherit/use a dataclass because I thought that dataclass was limited to Python 3.7 or higher only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataclasses are indeed python 3.7+, but (class) variables annotations (PEP 526) are python3.6+.
dataclasses are based on those (and this is what I actually wanted to try using -- sorry for using the wrong word -- those were synonyms in my head when they should not). Either way, this is not ready yet and I am unsure it will ever be flexible enough to be able to provide custom validation functions in Python (to validate that an array actually has the correct axes).

... m = ModelVariables()
... # ==== setup variables ====
... # set scalars
... m.FIRST_OBS_YEAR = 1991
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't using keyword arguments when instantiating the ModelVariables instance make this a bit cleaner (avoid repeating the "m."? i.e.:

m = ModelVariables(
    FIRST_OBS_YEAR=1991,
    FIRST_PROJ_YEAR=...,
)

... BUT it would prevent you from using values defined earlier (such as using axes in the arrays definitions). So I guess this is why you use this syntax. Right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

But there is something else for what I would like some advices. The main drawbacks of allowing to refer to axes defined at runtime in var = ArrayDef((...)) are:

  1. they must be referred using strings -> we loose the benefit auto-completion + make it dangerous to rename these axes (via refactoring (Shift + F6) in PyCharm)
  2. the following code:
m = ModelVariables()
....
# save to Excel file -> save Array objects only (not Axis objects)
m.save('variant_0.xlsx')
...
# load variables for variant 0 from Excel file -> FAIL
m = ModelVariables('variant_0.xlsx')

will fail because Axis objects are not saved in Excel or CSV files.

what's your opinion on this?

Copy link
Contributor

@gdementen gdementen Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the second point is not a problem. If they want to do this, they should not use an Excel file! Now, it's up to us to make this clear in the documentation if it is not already (I think there is already a mention about this). In the code, nothing needs to change I think as there is already a warning that the axis object is not saved, right?

larray/core/session.py Outdated Show resolved Hide resolved
larray/core/session.py Outdated Show resolved Hide resolved
... G_ADULTS = Group
... G_OBS_YEARS = Group
... G_PROJ_YEARS = Group
... population = ArrayDef(('AGE', 'GENDER', 'TIME'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some vague aversion about mixing arrays and parameters in the same Session. I cannot word it yet. Maybe it is just because that's not how you work in "demo", but I think there is something else behind my gut feeling. Technically I don't think we need anything different but I wonder what is better to show as an example: something simple but not optimal, or the "better" (but more complex) organisation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you consider as parameters? Everything that doesn't change at runtime or read-only scalars?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, this innocent-looking question is actually a very good question, because it would help to have a clear idea and precise definitions of this both for our examples/documentation and our users models. I think I would define a constant as something which never ever changes (across all variants of a model) and a parameter as anything which can change from one variant or simulation to the next but is constant over the simulation. This might be a very personal take on the issue, I don't know. Unsure where to draw the line between input data and parameter though.

@alixdamman alixdamman changed the title (issue 832): implement TypedSession (issue 832): implement ConstrainedSession Jan 8, 2020
@alixdamman
Copy link
Collaborator Author

@gdementen I have a problem with Python 2.7 and the HDF5 format (again):

ValueError: unsupported pickle protocol: 4

../../../miniconda/envs/travisci/lib/python2.7/site-packages/tables/atom.py:1227: ValueError

I think you solved the same kind of problem recently. Do you remember how to do that?

@gdementen
Copy link
Contributor

gdementen commented Feb 19, 2020

@alixdamman I don't remember having this issue (but my memory is so bad I am unsure), I guess I had a different issue. Here the issue is because you are trying to read on Python2 an hdf file created using python 3 (and thus pickle protocol 4). There is no clean fix for this because it is a limitation (AFAICT) of PyTables which always use pickle latest protocol (see PyTables/PyTables#472 and https://stackoverflow.com/questions/60067953/is-it-possible-to-specify-the-pickle-protocol-when-writing-pandas-to-hdf5). I don't know for sure why it worked before. I suppose it is something like you only used basic data types before (which have native hdf support and are thus not pickled) but now you are trying to dump more "complex" data types like (unicode?) strings.

I see several workarounds: only dump numeric stuff in the examples (maybe explicit bytes string would work too?), or use different file names for Python2 and Python3 (so that python2 reads the python2 file and Python3 the python3 file), or use pandas/pytables "table" format (which only supports non-pickle types -- see pandas-dev/pandas#4260).

But maybe it is just time to drop Python2. FWIW, I have already started not expecting any more change in larray for LIAM2 (I am monkey-patching larray as I need), so I guess it would not get any more painful than it already is... I still need to do 0.32.2 though (because no support for Pandas 1.0 is bad) but that's not as easy as I would have liked because the release scripts do not work on a branch anymore.

@gdementen gdementen added this to the 0.33 milestone Feb 19, 2020
@alixdamman alixdamman mentioned this pull request Feb 20, 2020
@alixdamman
Copy link
Collaborator Author

Looking at the examples I have written for the ConstrainedSession class, I fear that users will find cumbersome to first have to define the type and as a second step to give an initial value.

Couldn't we allow users to declare either the type or a "default initializing" value ?

Something like:

class MyDemoSession(ConstrainedSession):
    a = 10
    b = ArrayDef("time=2010..2020")

# and then
d = MyDemoSession('path.h5')
# or
d = MyDemoSession(b=zeros("time=2010..2020"))  # where an instance attribute 'a' will be added with value 10

@gdementen
Copy link
Contributor

gdementen commented Mar 6, 2020

Hmmm. Having values directly in the class definition is completely fine to me but I had the idea that those would be constant values (over all possible variants) instead of being default values. Otherwise, it seems to me the boundary of what you can modify or not is fuzzy: you can modify something defined in the class if it is a scalar, but not if it is an array? And the name ConstrainedSession suddenly seems a bit strange.

Copy link
Contributor

@gdementen gdementen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class MyDemoSession(ConstrainedSession):
    a = 10
    b = ArrayDef("time=2010..2020")

# and then
d = MyDemoSession('path.h5')
# or
d = MyDemoSession(b=zeros("time=2010..2020"))  # where an instance attribute 'a' will be added with value 10

This makes me think that we could also implement something like:

d = MyDemoSession.zeros()
# or maybe even (though that might be not explicit enough)
d = MyDemoSession()

which would initialize all arrays to zeros().

larray/core/session.py Outdated Show resolved Hide resolved
@alixdamman
Copy link
Collaborator Author

Hmmm. Having values directly in the class definition is completely fine to me but I had the idea that those would be constant values (over all possible variants) instead of being default values.

OK, so for each non-array declared variable, users have the choice between specifying a constant value (over all variant) or its type. Right?

@alixdamman
Copy link
Collaborator Author

This makes me think that we could also implement something like:

d = MyDemoSession.zeros()
# or maybe even (though that might be not explicit enough)
d = MyDemoSession()

which would initialize all arrays to zeros().

Or we could add an optional argument to ArrayDef:

class MyDemoSession(ConstrainedSession):
    a = 10
    b = ArrayDef("time=2010..2020", fill_value=0)

# and then
d = MyDemoSession('path.h5')
# or
d = MyDemoSession()

?

@gdementen
Copy link
Contributor

Or we could add an optional argument to ArrayDef:

class MyDemoSession(ConstrainedSession):
    a = 10
    b = ArrayDef("time=2010..2020", fill_value=0)

This is probably better than my "d = MyDemoSession.zeros()" suggestion because it is more granular, but I am still unsure about this because then distinction between a constraint and a normal initialized array becomes very thin.

In any case, I would only implement this in a second stage, if it actually helps a practical use case.

@alixdamman
Copy link
Collaborator Author

I need some help for the implementation of the ConstrainedSession.__init__ method (see last commit).
I'm not sure what to do.

f"in the {self.__class__.__name__} class definition.")

def save(self, fname, names=None, engine='auto', overwrite=True, display=False, **kwargs):
for key in set(self._cls_attrs.keys()) - set(self.keys()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should not be such checks and warnings during save! Those should have happened earlier (in __setitem__ and __setattr__ -- which is already the case AFAICT)!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, the first check is probably a good idea. It catches "NOT_LOADED" values, right?
However I still stand that the second check seems redundant with a warning you should have received earlier (when "adding" that variable to the session).

@@ -1590,7 +1603,9 @@ def _check_key_value(self, key, value):
attr_def = getattr(cls, key, None)
if attr_def is None:
warnings.warn("'{}' is not declared in '{}'".format(key, self.__class__.__name__), stacklevel=2)
else:
elif isinstance(attr_def, (type, ArrayDef)):
if value is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this special case for None?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not loose the order of the variables declared in the user defined class, I need to find a way to add the non-constant variables with a "temporary" value when calling the __init__ method. For the moment, I add the non-constant variables with the value set to None in the __init__ method until the user set a proper value later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should use a more explicit value for this than None because None could be a valid value. The usual trick is to create a global variable with an empty object as a unique flag:

NOT_LOADED = object()

Also, it might (I have not thought this through) be useful to pass extra/optional arguments to a few functions including _check_key_value so that you know whether or not you are currently during __init__.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual trick is to create a global variable with an empty object as a unique flag:

NOT_LOADED = object()

Thanks for the trick.

Copy link
Collaborator Author

@alixdamman alixdamman Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it might (I have not thought this through) be useful to pass extra/optional arguments to a few functions including _check_key_value so that you know whether or not you are currently during init.

Since _check_key_value() is used in __setitem__ and __setattr__ methods and those methods do not accept extra arguments, I wonder if it is not better to add a flag as a private instance attribute to activate or deactivate the raising of an exception when a user try to set a value to a variable declared as constant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether that's a good idea or not in this case, but you can define those methods with extra arguments (and use those extra arguments if you call those methods explicitly). It's just that the normal syntax/implicit way to call them will not use those extra arguments. See for example LArray.__getitem__

else:
return v1 == v2
if not equal(value, attr_def):
warnings.warn(f"Cannot modify the value of the variable '{key}' declared as a constant in the "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to either change the message (which sounds like the action was blocked) or raise an exception instead of using a warning.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first tried to raise an exception but it breaks the __init__ and the load methods.
I think raising an exception is the proper way to do it but I am not sure how to bypass this check when calling the __init__ or load method?

larray/tests/test_session.py Outdated Show resolved Hide resolved
@gdementen
Copy link
Contributor

I need some help for the implementation of the ConstrainedSession.__init__ method (see last commit).
I'm not sure what to do.

The question is about the ordering of keys, right? Which ordering is lost? The one from the class or the one from the instance (when initializing from kwargs, a dict or another session)? You are right that we cannot keep both. I think we should try to maintain the class ordering and I think it should be possible to do that (but I haven't thought that much about this).

def load(self, fname, names=None, engine='auto', display=False, **kwargs):
super().load(fname, names, engine, display, **kwargs)
for key in set(self._cls_attrs.keys()) - set(self.keys()):
warnings.warn(f"The variable {key} is declared in the {self.__class__.__name__} "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure this warning should exist. Creating a session out of several files seems like something we should allow without a warning. Maybe the warning should be there only if names is None???

If key was in names and it's not in the file, this should be an error (not a warning) but I suppose super().load would already catch that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not innocent though. If we allow this, we must handle the case where we request an attribute which has been declared but not loaded (yet). I am leaning towards that being a good idea (because we will need something like that for LazySession anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, to keep things simple, we might want to start without that feature and implement it later.

f"class definition but has not found in file {fname}.")
for key in set(self.keys()) - set(self._cls_attrs.keys()):
warnings.warn(f"The variable {key} has been found in file {fname} but is not declared "
f"in the {self.__class__.__name__} class definition.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure this warning should exist either. It seems reasonable to me to instantiate a session with a file containing more variables than necessary. LIAM2 accepts this kind of things (if I transpose the concepts) and users have found this feature quite useful. The trick is that, in that case LIAM2 only loads the declared stuff. Maybe we should do this here too?

Copy link
Collaborator Author

@alixdamman alixdamman Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trick is that, in that case LIAM2 only loads the declared stuff.

So LIAM2 only loads the declared stuff but does not print any warnings if there are undeclared stuff in the file?

Copy link
Contributor

@gdementen gdementen Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The spirit is that you declare the fields (LIAM2 vocabulary) you need in your model and can use whatever file provides those fields. This makes it possible to use the same input data files for several variants, some of which need extra fields.

@@ -1590,7 +1603,9 @@ def _check_key_value(self, key, value):
attr_def = getattr(cls, key, None)
if attr_def is None:
warnings.warn("'{}' is not declared in '{}'".format(key, self.__class__.__name__), stacklevel=2)
else:
elif isinstance(attr_def, (type, ArrayDef)):
if value is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should use a more explicit value for this than None because None could be a valid value. The usual trick is to create a global variable with an empty object as a unique flag:

NOT_LOADED = object()

Also, it might (I have not thought this through) be useful to pass extra/optional arguments to a few functions including _check_key_value so that you know whether or not you are currently during __init__.

f"in the {self.__class__.__name__} class definition.")

def save(self, fname, names=None, engine='auto', overwrite=True, display=False, **kwargs):
for key in set(self._cls_attrs.keys()) - set(self.keys()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, the first check is probably a good idea. It catches "NOT_LOADED" values, right?
However I still stand that the second check seems redundant with a warning you should have received earlier (when "adding" that variable to the session).

... population_obs = Array
... population_proj = Array
... # declare arrays with fixed axes
... mortality_rate = ArrayDef((AGE, GENDER))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, even if I came up with it, I am not entirely satisfied with the name ArrayDef. I would like to find a name which reads more naturally. Something like ArrayWithSpecificAxes but preferably shorter 😉. Do you have any idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think it that it would be useful to provide a way for users to declare that an array must have some specific axes even if the labels of those axes are not specified yet. In the example above, this would mean the ability to write something like either:

population_obs = ArrayDef((AGE, GENDER, "year"))

or better yet (if possible -- I think it would be tricky but possible):

population_obs = ArrayDef((AGE, GENDER, obs_years))

the later might require using AxisDef instances instead of just "Axis" to be able to differentiate instances: eg:

obs_years = AxisDef("time")
population_obs = ArrayDef((AGE, GENDER, obs_years))

The goal is keep the class definition as close as possible to a complete reference of what each variable is (and also be more precise in the safety checks)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder how PyCharm behaves when using "show definition" on such an attribute. Does it show the class definition or the point where it was given a value "m.obs_years = Axis(...)"? I am guessing the later, when it is defined but I don't really know.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think it that it would be useful to provide a way for users to declare that an array must have some specific axes even if the labels of those axes are not specified yet.

You're right but can I move this suggestion to a separate issue and implement it later? This PR is complicated enough.

larray/core/session.py Outdated Show resolved Hide resolved
@gdementen
Copy link
Contributor

gdementen commented Mar 18, 2020

One more comment. We must make it easy for the user to somehow create a ConstrainedSession from arbitrary files (in any strange format they like). Obviously in that case they need to provide a function to load those files. There are two ways to approach this:

  • either ask the user to create a function which creates/loads arrays and axes and returns an instance of their specific session class. I don't imagine this would cause any problem.
class MyData(ConstrainedSession):
    a = AxisDef()
    arr1 = ArrayDef(a)
def load_stuff(filepath):
    with open_excel(filepath) as wb:
        arr1 = wb[...][...].load()
    return MyData(a=..., arr1=arr1)
  • allow users to override their session class's __init__ method. I think it would be cleaner, but depending on how ConstrainedSession is implemented, this could be harder to get working.
class MyData(ConstrainedSession):
    a = AxisDef()
    arr1 = ArrayDef(a)
    def __init__(self, filepath):
        with open_excel(filepath) as wb:
            arr1 = wb[...][...].load()
        # or is there a better way?
        super().__init__(a=..., arr1=arr1)

I am unsure anything needs to change in the code, but I think testing this is important (to at least know where we stand) and providing an example of whatever is the "best" way in the tutorial would be a good idea I think.

Copy link
Contributor

@gdementen gdementen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice to me overall. Two things worry me though: the pydantic version and the systematic copy on each setattr (AFAIU).

.travis.yml Outdated
@@ -35,8 +35,10 @@ install:
# except scipy and install pandas with --no-deps
# - pandas >= 0.20.0 is required since commit 01669f2024a7bffe47cceec0a0fd845f71b6f7cc
# (issue 702 : fixed bug when writing metadata using HDF format)
# - pydantic: cannot define numpy ndarray / pandas obj / LArray field with default value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we blocked on an older version forever or is it a temporary problem?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to say. I opened an issue on the pydantic github repository.
pydantic is still currently in active development but I don't how they manage their issues.

README.rst Outdated Show resolved Hide resolved
larray/core/constrained.py Outdated Show resolved Hide resolved

# broadcast if needed
array = empty(axes=cls.expected_axes, dtype=cls.dtype)
array[:] = value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra copy is both a good thing (some people find this more intuitive) and a bad thing (a tad less performant but more importantly could lead to unexpected errors given that the behavior is different than for "normal" sessions which only hold references to arrays). Unsure how to solve this (or if it needs solving at all) but this particular point will at least need to be documented explicitly (we possibily need to clarify the situation for normal Session too).

Also, when is validate called? In the code like below, would it make an extra copy for each line?

s.arr = zeros(...)
s.arr = s.arr + 1

If this is the case, can't we imagine a system where if the attribute was never initialized, it uses the current code path using empty, but if it was, only the check is done but not the copy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
I'll make several tests to see if I can come with something better than this extra copy.

Copy link
Collaborator Author

@alixdamman alixdamman Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a new version of ConstrainedArrayImpl.validate(). I still need to create an extra copy in case of missing axes but maybe there is another way in case of missing axes to avoid to create an extra copy that I don't see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is no way to add axes without copying: you cannot broadcast "inplace". FWIW, there is one exception to this, it is if one accepts the resulting array to be readonly (which .expand() can do as an option) but we clearly cannot use that option here.

larray/core/constrained.py Outdated Show resolved Hide resolved
larray/core/constrained.py Outdated Show resolved Hide resolved
larray/core/session.py Outdated Show resolved Hide resolved
larray/core/session.py Outdated Show resolved Hide resolved
cs = TestConstrainedSession()
assert len(caught_warnings) == len(not_loaded_variables)
for i, var_name in enumerate(not_loaded_variables):
assert caught_warnings[i].message.args[0] == f"No value passed for the declared variable '{var_name}'"
Copy link
Contributor

@gdementen gdementen Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with any solution as long as we give ourselve the opportunity to revisit this choice later. Makes me think that we might want to label the new feature "experimental" for one or two releases.

@alixdamman
Copy link
Collaborator Author

Do you know why Travis tests are no longer launched automatically in PRs ?

@gdementen
Copy link
Contributor

Do you know why Travis tests are no longer launched automatically in PRs ?

No... No idea, sorry. 😞

@gdementen
Copy link
Contributor

Do you know why Travis tests are no longer launched automatically in PRs ?

Probably due to: https://blog.travis-ci.com/2021-05-07-orgshutdown

@gdementen
Copy link
Contributor

I migrated our CI to travis-ci.com (instead of .org). The problem is we got a one-time 10000 credits (not sure what those correspond to) and they will probably run out quickly.

@alixdamman
Copy link
Collaborator Author

closed & reopen to trigger a Travis build

@alixdamman alixdamman closed this Jun 30, 2021
@alixdamman alixdamman reopened this Jun 30, 2021
@alixdamman
Copy link
Collaborator Author

Travis is installing a very old version of pydantic (0.18.2). I need to check why.

@alixdamman
Copy link
Collaborator Author

As a final discussion, I wonder if using a pejorative word such as constrained is a good idea considering our users. Even it is not orthodox, I would consider to hide the "constrained" by simply prefixing Array and Session with a C, i.e. CArray instead of ConstrainedArray and CSession instead of ConstrainedSession.

I guess we need the opinion of someone else from our team this time. What do you think ?

Copy link
Contributor

@gdementen gdementen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a final discussion, I wonder if using a pejorative word such as constrained is a good idea considering our users. Even it is not orthodox, I would consider to hide the "constrained" by simply prefixing Array and Session with a C, i.e. CArray instead of ConstrainedArray and CSession instead of ConstrainedSession.

I guess we need the opinion of someone else from our team this time. What do you think ?

I don't think it's pejorative. It just says what it does... but we can try to find another name if you do not like it. I am not a fan of C*. What about Fixed* or Checked*? Asking other users about this is fine with me too.

larray/core/constrained.py Outdated Show resolved Hide resolved
larray/core/constrained.py Outdated Show resolved Hide resolved
larray/core/constrained.py Outdated Show resolved Hide resolved
@alixdamman
Copy link
Collaborator Author

What about Fixed* or Checked*?

Checked* sounds the best choice to me.

@alixdamman
Copy link
Collaborator Author

One last review and merge as it is ?
This feature is really urgent for at least 2 of our in-house projects (demo + horblk) and must be included in the next release.

Copy link
Contributor

@gdementen gdementen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Congratulations for the huge work! Please address as many of my comments as you feel are necessary and merge without asking further review.

larray/core/checked.py Show resolved Hide resolved
larray/core/checked.py Outdated Show resolved Hide resolved
larray/core/checked.py Outdated Show resolved Hide resolved
larray/core/checked.py Outdated Show resolved Hide resolved
larray/core/checked.py Outdated Show resolved Hide resolved
cs['h'] = zeros_like(h)

# trying to add an undeclared variable -> prints a warning message
with pytest.warns(UserWarning) as caught_warnings:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simpler syntax or must_warn

# a) type given explicitly
# -> Axis
expected_error_msg = "instance of Axis expected"
with pytest.raises(TypeError) as error:
Copy link
Contributor

@gdementen gdementen Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the match argument of pytest.raises to check the error message directly, for this test and all (or at least most) subsequent calls to pytest.raises below. Don't forget to re.escape() the message (or provide a must_raise helper which does it, similar to must_warn)


names = session.filter(kind=Array).names
names = Session({k: v for k, v in session.items() if isinstance(v, Array)}).names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you use filter anymore?

setup.cfg Outdated Show resolved Hide resolved
doc/source/api.rst Outdated Show resolved Hide resolved
@gdementen
Copy link
Contributor

given the test failures, I guess you need to introduce a "needs_pydantic" decorator

@alixdamman alixdamman force-pushed the 832_static_session branch 2 times, most recently from 4c094ee to 48ca515 Compare August 4, 2021 09:24
@alixdamman
Copy link
Collaborator Author

you could try (if not done already) to install pydantic via pip

It worked. Thanks.

@alixdamman alixdamman changed the title (issue 832): implement ConstrainedSession (issue 832): implement CheckedSession, CheckedParameters and CheckedArray Aug 4, 2021
@alixdamman
Copy link
Collaborator Author

@gdementen OK with the changelog ? (last commit)

@gdementen
Copy link
Contributor

@alixdamman: sure. I personally wouldn't explain what autocompletion is but... 🤷‍♂️

@alixdamman alixdamman force-pushed the 832_static_session branch 2 times, most recently from 41bb5f0 to 0324475 Compare August 6, 2021 09:19
…s and CheckedArray classes (based on preliminary code written by gdementen)
@alixdamman alixdamman merged commit 873717d into larray-project:master Aug 6, 2021
@alixdamman alixdamman deleted the 832_static_session branch August 6, 2021 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants