
Improving the reporting of uncertainties in the calculation made by boaviztapi #214

Open
PierreRust opened this issue Sep 11, 2023 · 6 comments


@PierreRust
Contributor

Problem

I believe we should improve the way we handle uncertainty in the calculations made by boaviztapi. This issue is mainly meant to start the discussion on how that could be improved.

When using the API to request the impact of a server, VM, etc., figures are given with a high number of digits, a significant_figures parameter, and a min and max value.

For example :

    "gwp": {
      "embedded": {
        "value": 636.11,
        "significant_figures": 5,
        "min": 252.48,
        "max": 2010.6,
        "warnings": [
          "End of life is not included in the calculation"
        ]
      },

Based on the discussion I had with @da-ekchajzer on mattermost

  • the min and max values are calculated based on the uncertainties we have on the configuration of the equipment. For example, for a server, if the request did not specify the exact amount of RAM, we calculate the impact for a high and a low amount of RAM.
  • the uncertainties coming from the reference data are not represented. E.g. we could consider that the impact of 1GB of RAM or 1mm2 of die has an uncertainty of 10%, but we do not explicitly account for that in the result.
  • the value of significant_figures is not really calculated but mostly comes from a configuration file.

In the example above, I think the value 636.11 does not really have 5 significant digits and, given the min and max the API returns for the impact, we should apply some rounding. We can probably say that the gwp impact for this server is around 600 kgCO2e, but 636.11 is clearly too precise ;)

Solution

Regarding the rounding, I suggest using something like the function below (rough code, needs polishing!): it rounds the value based on the delta between the min and max values returned and a precision parameter (which is a %); if there is a large difference between min and max, the rounding is more aggressive.

For example :

  • Approx for 242.48 < 636.11 < 2016.6 precision 10% = 600
  • Approx for 242.48 < 636.11 < 2016.6 precision 1% = 640

import math

def round_value(val, min_val, max_val, precision):
    # value for precision% of the min/max delta
    approx = (max_val - min_val) / (100 / precision)
    # exponent of the power of ten we round to
    significant = math.floor(math.log10(approx))
    rounded = round(val / 10 ** significant) * 10 ** significant
    return rounded
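
For instance, assuming the rough round_value above, the two approximations can be reproduced like this:

# Reproduces the two approximations listed above.
print(round_value(636.11, 242.48, 2016.6, precision=10))  # -> 600
print(round_value(636.11, 242.48, 2016.6, precision=1))   # -> 640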

Alternatives

This approach helps solve the rounding issue; something else is needed for the uncertainties coming from the reference data used for the calculation.

@da-ekchajzer da-ekchajzer changed the title Improving the reporting of uncertainties in the calculation made by boavistapi Improving the reporting of uncertainties in the calculation made by boaviztapi Sep 12, 2023
@da-ekchajzer
Collaborator

Thanks for this proposition. Using min/max and a log10 function is a great way of handling the very different figures we have. Here are my comments:

  • The rounding function is not always working. I think we should keep the existing one, which handles all cases:

    def round_to_sigfig(x, significant_figures):

  • Precision could be set by default from the config file and overridden if needed (I don't see the case for now).

def round_value(val, min_val, max_val, precision=config["default_precision"]):
    """
    Rounds the value based on the delta between the min and max values returned
    and a precision parameter.
    """
    # value for precision% of the min/max delta
    approx = (max_val - min_val) / (100 / precision)
    significant = math.floor(math.log10(approx))
    return float(to_precision(val, significant))

I can make a PR with this implementation if you think it's ok.

@da-ekchajzer
Collaborator

da-ekchajzer commented Sep 14, 2023

I believe there is an issue with the approx variable. The more difference there is between min and max, the bigger approx is, and so the bigger the number of significant figures. I believe we want the opposite. See some examples:

value               | approx                 | min_val               | max_val        | log10(approx)
20.29217            | 7.0896360000000005     | 10.91891              | 81.81527       | 0
1819.821672         | 530.9138167884         | 110.14710120000001    | 5419.285269084 | 2
0.02040328338       | 2.0873039999997484e-06 | 0.020400523740000003  | 0.02042139678  | -6
0.00030760589391948 | 0.0001208433425274     | 6.3406418256e-05      | 0.00127183984353 | -4
306.0165            | 95.3682                | 179.92950000000002    | 1133.6115      | 1
61648.853641199996  | 224191.01528027997     | 62.2570572            | 2241972.40986  | 5

@PierreRust
Contributor Author

I think the issue comes from using significant as a parameter for the to_precision method. See for example with a value of 6.2301, with min=3.617 and max=10.023 (and 10% precision):

round_value (original) for 3.617 < 6.2301 < 10.023 precision 10% = 6.2
round_value (with to_precision) for 3.617 < 6.2301 < 10.023 precision 10% = 0.0

The second option is obviously wrong; it is not even in the range between min and max!

The confusion probably comes from the naming of my significant variable, which is not really the number of significant figures, at least not the way it is calculated by significant_number(x): https://github.com/Boavizta/boaviztapi/blob/795bbb2334d4e39cdd4954e3bea878fd20ed9e4c/boaviztapi/utils/roundit.py#L9C18-L9C18
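
To make the distinction concrete, here is a small illustration reusing the numbers above (the comment about to_precision reflects the results reported in this thread, not a verified run):

import math

# For 3.617 < 6.2301 < 10.023 with precision 10%:
approx = (10.023 - 3.617) / (100 / 10)     # 0.6406
exponent = math.floor(math.log10(approx))  # -1, i.e. round to the nearest 0.1

# The original round_value uses this exponent directly:
print(round(6.2301 / 10 ** exponent) * 10 ** exponent)  # 6.2, the first result above

# Passing -1 as a *count* of significant figures to to_precision, on the other
# hand, has no meaning, which is why that variant returned 0.0 above.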

I'll make a PR with a set of test cases; it will be easier to discuss the implementation.

PierreRust added a commit to PierreRust/boaviztapi that referenced this issue Sep 15, 2023
This function should be used when the API returns a value with an
associated min and max value. The rounding depends on the delta
between min and max
@PierreRust
Contributor Author

I've added a PR with a rounding function that handles corner cases: #220

@da-ekchajzer
Collaborator

da-ekchajzer commented Sep 19, 2023

I am currently implementing the function in the code.

I think that there is a problem when min = max = value. In that case, we don't round at all, even though it gives a precision which is way too high compared to the uncertainty of the impact factors.

Example

 "gwp": {
      "embedded": {
        "value": 23.77907,
        "min": 23.77907,
        "max": 23.77907,
        "warnings": [
          "End of life is not included in the calculation"
        ]
      },
      "use": {
        "value": 243.69843455999998,
        "min": 243.69843455999998,
        "max": 243.69843455999998
      },
      "unit": "kgCO2eq",
      "description": "Total climate change"
    },

Solutions

To account for the uncertainty of the impact factors, we could:

  1. Hard-code a maximal number of significant figures
  2. Apply a ratio of x% to the maximum and minimum, which corresponds to the uncertainty of the impact factor (see the sketch below).

What are your thoughts about it?
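
For illustration, a minimal sketch of option 2 (the spread_min_max helper and the 10% default are hypothetical, not the actual boaviztapi code):

def spread_min_max(value, min_val, max_val, uncertainty_pct=10):
    # If the min/max interval is degenerate (min == max == value), widen it by
    # +/- uncertainty_pct% so the rounding reflects the impact-factor uncertainty.
    if min_val == max_val:
        return value * (1 - uncertainty_pct / 100), value * (1 + uncertainty_pct / 100)
    return min_val, max_val

# With the embedded gwp value above:
print(spread_min_max(23.77907, 23.77907, 23.77907))  # roughly (21.40, 26.16)

The widened min/max could then be fed to round_value so that the reported value is rounded according to this assumed uncertainty.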

@PierreRust
Contributor Author

Yes, as the rounding is based on the difference between min and max, when min == max it does nothing (and it's by design ;))

I believe the "right way" to handle this would be to specify an uncertainty on the base impact factors and propagate it when calculating the overall impact (the uncertainties Python library could help here: https://pythonhosted.org/uncertainties/). This should probably be another issue and PR, for the next version.
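
Just to illustrate what that could look like with the uncertainties package (the impact-factor values below are invented; only the 10% figure comes from the discussion above):

from uncertainties import ufloat

# Hypothetical impact factors, each with a 10% standard uncertainty.
ram_gb_factor = ufloat(5.0, 0.5)    # kgCO2e per GB of RAM (made-up value)
die_mm2_factor = ufloat(1.9, 0.19)  # kgCO2e per mm2 of die (made-up value)

# Uncertainty is propagated automatically through the arithmetic.
embedded_gwp = 32 * ram_gb_factor + 200 * die_mm2_factor
print(embedded_gwp)                # nominal value +/- propagated standard deviation
print(embedded_gwp.nominal_value)  # 540.0
print(embedded_gwp.std_dev)        # ~41.2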

For now, I think we should fall back, in this specific case, to the current method where we simply cut after a fixed number of sig_fig.

da-ekchajzer added a commit that referenced this issue Oct 9, 2023