-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(issue 832): implement CheckedSession, CheckedParameters and CheckedArray #840
(issue 832): implement CheckedSession, CheckedParameters and CheckedArray #840
Conversation
As I already tried to tell you earlier (sorry if this was not clear), this is not the way to go. You told me "it is just for testing", but you still do exactly the same here. You use values to define classes, instead of using types or "definitions". A FrozenSession (I also dislike that name) should be very similar to Python 3.7 dataclasses (maybe even use them but I am unsure that is possible) . In dataclasses you define the type of fields of your class, and optionally a default value. In what you did, you define a strict value which cannot change, and you cannot have several instances of that class at the same time. This is really a missed opportunity. |
Hu? This is quite the opposite. Most of our users are panicked about warnings and consider them to be errors even when they are not.
You sound like auto-completion is the end goal of everything, but it is just a nice plus. Speaking of something "way too dangerous" and autocompletion in the same sentence seems so odd to me... |
About the name, here are a few suggestions: StaticSession (this would be telling for C/Java programmers, but I admit our target users are not), CheckedSession, DefinedSession, SessionDefinition |
larray/core/session.py
Outdated
|
||
After creating an instance of a user defined 'session', some restrictions will be applied on it: | ||
|
||
- **it is not possible to add or remove any parameter or variable,** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the warning was a good compromise: it is annoying enough that you would not do that in a model, but let's you work quickly using a FrozenSession in an interactive console. I could live with that restriction though IF there is a (documented) easy/quick way to convert a FrozenSession to a "normal" session.
larray/core/session.py
Outdated
|
||
- **it is not possible to add or remove any parameter or variable,** | ||
- **all non array variables (axes, groups, ...) cannot be modified,** | ||
- **only values of array variables can be modified, not their axes.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I can also see the value in this, I wonder if you are not going too far in this. Wouldn't it be better to allow some form of "compatible axes"? For example, I think there would be more advantages than disadvantages to allow arrays with axes in a different order than the one defined or even with more axes than defined. The idea would be that what is defined is what the model requires, but you are free to provide it with more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I am unsure about the "more axes" part. Personally, I would certainly like to work like that (declare all required axes by my model and let it work with any dataset which has at least those axes), but then you lose the quick reference we have now were we can quickly lookup the array name and know its exact axes, instead of knowing minimum required axes you can rely on. Knowing exact axes without running the model has some value too. If I was a modeller, generic code would be more important than "precise" axes, but I wonder if our users wouldn't favour precise axes over generic/powerful code.
larray/core/session.py
Outdated
|
||
def apply(self, func, *args, **kwargs): | ||
kind = kwargs.pop('kind', Array) | ||
instance = self.__class__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should either be the code in the normal Session, so that overridding
apply() here is not necessary, or not overriden at all (because I am unsure we actually want that behaviour for FrozenSession). I would rather have it return a normal Session to not be limited by the restrictions of the FrozenSession. If we do not do that, for example, my_frozen_session.apply(Array.sum, 'gender') would not work (because the returned axes are different).
larray/core/session.py
Outdated
# XXX: I wonder if we shouldn't create an AbstractSession instead of defining the _disabled() | ||
# private method below. | ||
# Auto-completion on any instance of a class that inherits from FrozenSession | ||
# should not propose the add(), update(), filter(), transpose() and compact() methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IF we keep that "feature" of having those functions be non functional, then I agree with you, they should not be available at all BUT I am unconvinced about limiting so much what users can do (see below for more details)
To be clear, this is what we should strive for: class MyDemoSession(StaticSession):
a = int # maybe using : instead of = is possible/desirable, I don't know
b = ArrayDef("time=2010..2020")
# and then
d = MyDemoSession('path.h5')
# or
d = MyDemoSession(a=10, b=zeros("time=2010..2020")) The advantage of this is that you can have several instances of the same session class without having to modify your code, making variants easier. We can even imagine dumping all parameters of each simulation run to a file, then being able to run several variants just by changing the path of the parameters file. |
larray/core/session.py
Outdated
|
||
>>> # ==== MODEL VARIABLES ==== | ||
>>> class ModelVariables(TypedSession): | ||
... FIRST_OBS_YEAR = int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried inheriting/using a dataclass so that the variable types can be defined using ":" instead of "="? That would make us more compatible with the broader Python ecosystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget this. I have looked into it, but it seems like the typing module is not flexible enough to type multi-dimensional arrays yet (see python/typing#516). We could invent our own syntax using annotations, and that would cause no runtime problem but we would be incompatible with any type checker out there, including mypy and PyCharm, so there is no point. Let's go for types and "array defs" as class variables like above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looked into it. I didn't try to inherit/use a dataclass because I thought that dataclass was limited to Python 3.7 or higher only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataclasses are indeed python 3.7+, but (class) variables annotations (PEP 526) are python3.6+.
dataclasses are based on those (and this is what I actually wanted to try using -- sorry for using the wrong word -- those were synonyms in my head when they should not). Either way, this is not ready yet and I am unsure it will ever be flexible enough to be able to provide custom validation functions in Python (to validate that an array actually has the correct axes).
larray/core/session.py
Outdated
... m = ModelVariables() | ||
... # ==== setup variables ==== | ||
... # set scalars | ||
... m.FIRST_OBS_YEAR = 1991 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't using keyword arguments when instantiating the ModelVariables instance make this a bit cleaner (avoid repeating the "m."? i.e.:
m = ModelVariables(
FIRST_OBS_YEAR=1991,
FIRST_PROJ_YEAR=...,
)
... BUT it would prevent you from using values defined earlier (such as using axes in the arrays definitions). So I guess this is why you use this syntax. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
But there is something else for what I would like some advices. The main drawbacks of allowing to refer to axes defined at runtime in var = ArrayDef((...))
are:
- they must be referred using strings -> we loose the benefit auto-completion + make it dangerous to rename these axes (via refactoring (Shift + F6) in PyCharm)
- the following code:
m = ModelVariables()
....
# save to Excel file -> save Array objects only (not Axis objects)
m.save('variant_0.xlsx')
...
# load variables for variant 0 from Excel file -> FAIL
m = ModelVariables('variant_0.xlsx')
will fail because Axis objects are not saved in Excel or CSV files.
what's your opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the second point is not a problem. If they want to do this, they should not use an Excel file! Now, it's up to us to make this clear in the documentation if it is not already (I think there is already a mention about this). In the code, nothing needs to change I think as there is already a warning that the axis object is not saved, right?
larray/core/session.py
Outdated
... G_ADULTS = Group | ||
... G_OBS_YEARS = Group | ||
... G_PROJ_YEARS = Group | ||
... population = ArrayDef(('AGE', 'GENDER', 'TIME')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some vague aversion about mixing arrays and parameters in the same Session. I cannot word it yet. Maybe it is just because that's not how you work in "demo", but I think there is something else behind my gut feeling. Technically I don't think we need anything different but I wonder what is better to show as an example: something simple but not optimal, or the "better" (but more complex) organisation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you consider as parameters? Everything that doesn't change at runtime or read-only scalars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe, this innocent-looking question is actually a very good question, because it would help to have a clear idea and precise definitions of this both for our examples/documentation and our users models. I think I would define a constant as something which never ever changes (across all variants of a model) and a parameter as anything which can change from one variant or simulation to the next but is constant over the simulation. This might be a very personal take on the issue, I don't know. Unsure where to draw the line between input data and parameter though.
60deff0
to
f65fae2
Compare
@gdementen I have a problem with Python 2.7 and the HDF5 format (again):
I think you solved the same kind of problem recently. Do you remember how to do that? |
@alixdamman I don't remember having this issue (but my memory is so bad I am unsure), I guess I had a different issue. Here the issue is because you are trying to read on Python2 an hdf file created using python 3 (and thus pickle protocol 4). There is no clean fix for this because it is a limitation (AFAICT) of PyTables which always use pickle latest protocol (see PyTables/PyTables#472 and https://stackoverflow.com/questions/60067953/is-it-possible-to-specify-the-pickle-protocol-when-writing-pandas-to-hdf5). I don't know for sure why it worked before. I suppose it is something like you only used basic data types before (which have native hdf support and are thus not pickled) but now you are trying to dump more "complex" data types like (unicode?) strings. I see several workarounds: only dump numeric stuff in the examples (maybe explicit bytes string would work too?), or use different file names for Python2 and Python3 (so that python2 reads the python2 file and Python3 the python3 file), or use pandas/pytables "table" format (which only supports non-pickle types -- see pandas-dev/pandas#4260). But maybe it is just time to drop Python2. FWIW, I have already started not expecting any more change in larray for LIAM2 (I am monkey-patching larray as I need), so I guess it would not get any more painful than it already is... I still need to do 0.32.2 though (because no support for Pandas 1.0 is bad) but that's not as easy as I would have liked because the release scripts do not work on a branch anymore. |
Looking at the examples I have written for the ConstrainedSession class, I fear that users will find cumbersome to first have to define the type and as a second step to give an initial value. Couldn't we allow users to declare either the type or a "default initializing" value ? Something like: class MyDemoSession(ConstrainedSession):
a = 10
b = ArrayDef("time=2010..2020")
# and then
d = MyDemoSession('path.h5')
# or
d = MyDemoSession(b=zeros("time=2010..2020")) # where an instance attribute 'a' will be added with value 10 |
Hmmm. Having values directly in the class definition is completely fine to me but I had the idea that those would be constant values (over all possible variants) instead of being default values. Otherwise, it seems to me the boundary of what you can modify or not is fuzzy: you can modify something defined in the class if it is a scalar, but not if it is an array? And the name ConstrainedSession suddenly seems a bit strange. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class MyDemoSession(ConstrainedSession): a = 10 b = ArrayDef("time=2010..2020") # and then d = MyDemoSession('path.h5') # or d = MyDemoSession(b=zeros("time=2010..2020")) # where an instance attribute 'a' will be added with value 10
This makes me think that we could also implement something like:
d = MyDemoSession.zeros()
# or maybe even (though that might be not explicit enough)
d = MyDemoSession()
which would initialize all arrays to zeros().
OK, so for each non-array declared variable, users have the choice between specifying a constant value (over all variant) or its type. Right? |
d = MyDemoSession.zeros()
# or maybe even (though that might be not explicit enough)
d = MyDemoSession()
Or we could add an optional argument to ArrayDef: class MyDemoSession(ConstrainedSession):
a = 10
b = ArrayDef("time=2010..2020", fill_value=0)
# and then
d = MyDemoSession('path.h5')
# or
d = MyDemoSession() ? |
This is probably better than my "d = MyDemoSession.zeros()" suggestion because it is more granular, but I am still unsure about this because then distinction between a constraint and a normal initialized array becomes very thin. In any case, I would only implement this in a second stage, if it actually helps a practical use case. |
I need some help for the implementation of the |
larray/core/session.py
Outdated
f"in the {self.__class__.__name__} class definition.") | ||
|
||
def save(self, fname, names=None, engine='auto', overwrite=True, display=False, **kwargs): | ||
for key in set(self._cls_attrs.keys()) - set(self.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should not be such checks and warnings during save! Those should have happened earlier (in __setitem__
and __setattr__
-- which is already the case AFAICT)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, the first check is probably a good idea. It catches "NOT_LOADED" values, right?
However I still stand that the second check seems redundant with a warning you should have received earlier (when "adding" that variable to the session).
larray/core/session.py
Outdated
@@ -1590,7 +1603,9 @@ def _check_key_value(self, key, value): | |||
attr_def = getattr(cls, key, None) | |||
if attr_def is None: | |||
warnings.warn("'{}' is not declared in '{}'".format(key, self.__class__.__name__), stacklevel=2) | |||
else: | |||
elif isinstance(attr_def, (type, ArrayDef)): | |||
if value is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need this special case for None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To not loose the order of the variables declared in the user defined class, I need to find a way to add the non-constant variables with a "temporary" value when calling the __init__
method. For the moment, I add the non-constant variables with the value set to None in the __init__
method until the user set a proper value later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should use a more explicit value for this than None because None could be a valid value. The usual trick is to create a global variable with an empty object as a unique flag:
NOT_LOADED = object()
Also, it might (I have not thought this through) be useful to pass extra/optional arguments to a few functions including _check_key_value so that you know whether or not you are currently during __init__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual trick is to create a global variable with an empty object as a unique flag:
NOT_LOADED = object()
Thanks for the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it might (I have not thought this through) be useful to pass extra/optional arguments to a few functions including _check_key_value so that you know whether or not you are currently during init.
Since _check_key_value()
is used in __setitem__
and __setattr__
methods and those methods do not accept extra arguments, I wonder if it is not better to add a flag as a private instance attribute to activate or deactivate the raising of an exception when a user try to set a value to a variable declared as constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know whether that's a good idea or not in this case, but you can define those methods with extra arguments (and use those extra arguments if you call those methods explicitly). It's just that the normal syntax/implicit way to call them will not use those extra arguments. See for example LArray.__getitem__
larray/core/session.py
Outdated
else: | ||
return v1 == v2 | ||
if not equal(value, attr_def): | ||
warnings.warn(f"Cannot modify the value of the variable '{key}' declared as a constant in the " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to either change the message (which sounds like the action was blocked) or raise an exception instead of using a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I first tried to raise an exception but it breaks the __init__
and the load
methods.
I think raising an exception is the proper way to do it but I am not sure how to bypass this check when calling the __init__
or load
method?
The question is about the ordering of keys, right? Which ordering is lost? The one from the class or the one from the instance (when initializing from kwargs, a dict or another session)? You are right that we cannot keep both. I think we should try to maintain the class ordering and I think it should be possible to do that (but I haven't thought that much about this). |
larray/core/session.py
Outdated
def load(self, fname, names=None, engine='auto', display=False, **kwargs): | ||
super().load(fname, names, engine, display, **kwargs) | ||
for key in set(self._cls_attrs.keys()) - set(self.keys()): | ||
warnings.warn(f"The variable {key} is declared in the {self.__class__.__name__} " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure this warning should exist. Creating a session out of several files seems like something we should allow without a warning. Maybe the warning should be there only if names is None???
If key was in names and it's not in the file, this should be an error (not a warning) but I suppose super().load would already catch that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not innocent though. If we allow this, we must handle the case where we request an attribute which has been declared but not loaded (yet). I am leaning towards that being a good idea (because we will need something like that for LazySession anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, to keep things simple, we might want to start without that feature and implement it later.
larray/core/session.py
Outdated
f"class definition but has not found in file {fname}.") | ||
for key in set(self.keys()) - set(self._cls_attrs.keys()): | ||
warnings.warn(f"The variable {key} has been found in file {fname} but is not declared " | ||
f"in the {self.__class__.__name__} class definition.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure this warning should exist either. It seems reasonable to me to instantiate a session with a file containing more variables than necessary. LIAM2 accepts this kind of things (if I transpose the concepts) and users have found this feature quite useful. The trick is that, in that case LIAM2 only loads the declared stuff. Maybe we should do this here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trick is that, in that case LIAM2 only loads the declared stuff.
So LIAM2 only loads the declared stuff but does not print any warnings if there are undeclared stuff in the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The spirit is that you declare the fields (LIAM2 vocabulary) you need in your model and can use whatever file provides those fields. This makes it possible to use the same input data files for several variants, some of which need extra fields.
larray/core/session.py
Outdated
@@ -1590,7 +1603,9 @@ def _check_key_value(self, key, value): | |||
attr_def = getattr(cls, key, None) | |||
if attr_def is None: | |||
warnings.warn("'{}' is not declared in '{}'".format(key, self.__class__.__name__), stacklevel=2) | |||
else: | |||
elif isinstance(attr_def, (type, ArrayDef)): | |||
if value is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should use a more explicit value for this than None because None could be a valid value. The usual trick is to create a global variable with an empty object as a unique flag:
NOT_LOADED = object()
Also, it might (I have not thought this through) be useful to pass extra/optional arguments to a few functions including _check_key_value so that you know whether or not you are currently during __init__
.
larray/core/session.py
Outdated
f"in the {self.__class__.__name__} class definition.") | ||
|
||
def save(self, fname, names=None, engine='auto', overwrite=True, display=False, **kwargs): | ||
for key in set(self._cls_attrs.keys()) - set(self.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, the first check is probably a good idea. It catches "NOT_LOADED" values, right?
However I still stand that the second check seems redundant with a warning you should have received earlier (when "adding" that variable to the session).
larray/core/session.py
Outdated
... population_obs = Array | ||
... population_proj = Array | ||
... # declare arrays with fixed axes | ||
... mortality_rate = ArrayDef((AGE, GENDER)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, even if I came up with it, I am not entirely satisfied with the name ArrayDef. I would like to find a name which reads more naturally. Something like ArrayWithSpecificAxes but preferably shorter 😉. Do you have any idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think it that it would be useful to provide a way for users to declare that an array must have some specific axes even if the labels of those axes are not specified yet. In the example above, this would mean the ability to write something like either:
population_obs = ArrayDef((AGE, GENDER, "year"))
or better yet (if possible -- I think it would be tricky but possible):
population_obs = ArrayDef((AGE, GENDER, obs_years))
the later might require using AxisDef instances instead of just "Axis" to be able to differentiate instances: eg:
obs_years = AxisDef("time")
population_obs = ArrayDef((AGE, GENDER, obs_years))
The goal is keep the class definition as close as possible to a complete reference of what each variable is (and also be more precise in the safety checks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder how PyCharm behaves when using "show definition" on such an attribute. Does it show the class definition or the point where it was given a value "m.obs_years = Axis(...)"? I am guessing the later, when it is defined but I don't really know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think it that it would be useful to provide a way for users to declare that an array must have some specific axes even if the labels of those axes are not specified yet.
You're right but can I move this suggestion to a separate issue and implement it later? This PR is complicated enough.
One more comment. We must make it easy for the user to somehow create a ConstrainedSession from arbitrary files (in any strange format they like). Obviously in that case they need to provide a function to load those files. There are two ways to approach this:
class MyData(ConstrainedSession):
a = AxisDef()
arr1 = ArrayDef(a)
def load_stuff(filepath):
with open_excel(filepath) as wb:
arr1 = wb[...][...].load()
return MyData(a=..., arr1=arr1)
class MyData(ConstrainedSession):
a = AxisDef()
arr1 = ArrayDef(a)
def __init__(self, filepath):
with open_excel(filepath) as wb:
arr1 = wb[...][...].load()
# or is there a better way?
super().__init__(a=..., arr1=arr1) I am unsure anything needs to change in the code, but I think testing this is important (to at least know where we stand) and providing an example of whatever is the "best" way in the tutorial would be a good idea I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice to me overall. Two things worry me though: the pydantic version and the systematic copy on each setattr (AFAIU).
.travis.yml
Outdated
@@ -35,8 +35,10 @@ install: | |||
# except scipy and install pandas with --no-deps | |||
# - pandas >= 0.20.0 is required since commit 01669f2024a7bffe47cceec0a0fd845f71b6f7cc | |||
# (issue 702 : fixed bug when writing metadata using HDF format) | |||
# - pydantic: cannot define numpy ndarray / pandas obj / LArray field with default value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we blocked on an older version forever or is it a temporary problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to say. I opened an issue on the pydantic github repository.
pydantic
is still currently in active development but I don't how they manage their issues.
larray/core/constrained.py
Outdated
|
||
# broadcast if needed | ||
array = empty(axes=cls.expected_axes, dtype=cls.dtype) | ||
array[:] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extra copy is both a good thing (some people find this more intuitive) and a bad thing (a tad less performant but more importantly could lead to unexpected errors given that the behavior is different than for "normal" sessions which only hold references to arrays). Unsure how to solve this (or if it needs solving at all) but this particular point will at least need to be documented explicitly (we possibily need to clarify the situation for normal Session too).
Also, when is validate called? In the code like below, would it make an extra copy for each line?
s.arr = zeros(...)
s.arr = s.arr + 1
If this is the case, can't we imagine a system where if the attribute was never initialized, it uses the current code path using empty, but if it was, only the check is done but not the copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
I'll make several tests to see if I can come with something better than this extra copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a new version of ConstrainedArrayImpl.validate()
. I still need to create an extra copy in case of missing axes but maybe there is another way in case of missing axes to avoid to create an extra copy that I don't see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there is no way to add axes without copying: you cannot broadcast "inplace". FWIW, there is one exception to this, it is if one accepts the resulting array to be readonly (which .expand() can do as an option) but we clearly cannot use that option here.
cs = TestConstrainedSession() | ||
assert len(caught_warnings) == len(not_loaded_variables) | ||
for i, var_name in enumerate(not_loaded_variables): | ||
assert caught_warnings[i].message.args[0] == f"No value passed for the declared variable '{var_name}'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with any solution as long as we give ourselve the opportunity to revisit this choice later. Makes me think that we might want to label the new feature "experimental" for one or two releases.
Do you know why Travis tests are no longer launched automatically in PRs ? |
No... No idea, sorry. 😞 |
Probably due to: https://blog.travis-ci.com/2021-05-07-orgshutdown |
I migrated our CI to travis-ci.com (instead of .org). The problem is we got a one-time 10000 credits (not sure what those correspond to) and they will probably run out quickly. |
closed & reopen to trigger a Travis build |
Travis is installing a very old version of pydantic (0.18.2). I need to check why. |
As a final discussion, I wonder if using a pejorative word such as I guess we need the opinion of someone else from our team this time. What do you think ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a final discussion, I wonder if using a pejorative word such as
constrained
is a good idea considering our users. Even it is not orthodox, I would consider to hide the "constrained" by simply prefixingArray
andSession
with aC
, i.e.CArray
instead ofConstrainedArray
andCSession
instead ofConstrainedSession
.I guess we need the opinion of someone else from our team this time. What do you think ?
I don't think it's pejorative. It just says what it does... but we can try to find another name if you do not like it. I am not a fan of C*. What about Fixed* or Checked*? Asking other users about this is fine with me too.
Checked* sounds the best choice to me. |
One last review and merge as it is ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Congratulations for the huge work! Please address as many of my comments as you feel are necessary and merge without asking further review.
larray/tests/test_checked_session.py
Outdated
cs['h'] = zeros_like(h) | ||
|
||
# trying to add an undeclared variable -> prints a warning message | ||
with pytest.warns(UserWarning) as caught_warnings: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simpler syntax or must_warn
larray/tests/test_checked_session.py
Outdated
# a) type given explicitly | ||
# -> Axis | ||
expected_error_msg = "instance of Axis expected" | ||
with pytest.raises(TypeError) as error: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use the match argument of pytest.raises to check the error message directly, for this test and all (or at least most) subsequent calls to pytest.raises below. Don't forget to re.escape() the message (or provide a must_raise helper which does it, similar to must_warn)
|
||
names = session.filter(kind=Array).names | ||
names = Session({k: v for k, v in session.items() if isinstance(v, Array)}).names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you use filter anymore?
9b10387
to
a2d8862
Compare
given the test failures, I guess you need to introduce a "needs_pydantic" decorator |
4c094ee
to
48ca515
Compare
It worked. Thanks. |
4e6662e
to
8fa6187
Compare
@gdementen OK with the changelog ? (last commit) |
@alixdamman: sure. I personally wouldn't explain what autocompletion is but... 🤷♂️ |
41bb5f0
to
0324475
Compare
…s and CheckedArray classes (based on preliminary code written by gdementen)
0324475
to
abe84f4
Compare
I know the name
FrozenSession
is misleading because values of array items can be modified but I couldn't find another better name to emphasis that axes of arrays and all non array items are 'frozen'.I know that in your version of the
StaticSession
class you choose to let users to add or remove items and to print warnings message when there is a missing or extra item. I know you made that choice partly to not afraid present and potential users to use this new feature of LArray.But personally, I can't convince myself that users take any care of warnings messages. I really can't.
Letting them to add and remove items "on the fly" is way too dangerous and this partially breaks the main purpose of this new kind of Session, i.e. relying (safely) on 'autocomplete' to write models.