5-minute Aggregation Datasets Comparison Between Old and Modernized PeMS System --- A Case Study #501
Comments
Thank you for the analysis, @thehanggit. I believe the extreme volume values will be addressed through the high flow value issue described in #278, which has been backlogged. Once a fix for the high flow values is implemented, it would be interesting to see how that impacts the 2.95% of currently unexplained differences.
Got it. So these extreme values are observed directly in the raw data rather than introduced by normalization or other postprocessing, whereas the old PeMS system dealt with this issue in its 5-minute table.
You are correct, the high flow values are being reported directly by the devices in the raw data. There may be instances where normalization results in high values, but I suspect that is very rare. I believe that once a high flow value threshold methodology is determined at the appropriate level (detector, station, etc.), the flow value would be replaced by either an imputed flow value or a max flow value, which would need to be determined at the same level. There is no documentation that I am aware of that details how existing PeMS handles high flow values.
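To make the replacement idea concrete, here is a minimal sketch of what such a rule might look like once a threshold exists. Everything in it is an assumption for illustration: the function name `replace_high_flow`, its parameters, and the numbers in the usage note are hypothetical, and neither the threshold methodology nor the choice between an imputed and a max flow value has been decided.

```python
def replace_high_flow(flow: float, threshold: float, replacement: float) -> float:
    """Return `flow` unchanged unless it exceeds `threshold`.

    Hypothetical helper: `threshold` and `replacement` would be determined
    at the same level (detector, station, etc.). `replacement` stands in for
    either an imputed flow value or a max flow value.
    """
    if flow > threshold:
        # The reported value is treated as implausibly high and swapped out.
        return replacement
    return flow
```

For example, with a made-up threshold of 250 vehicles per 5 minutes, `replace_high_flow(900.0, 250.0, 250.0)` caps the value at the max flow, while passing an imputed value as `replacement` would substitute that instead.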
That is clear enough! Thank you, Ken. I will continue the analysis to find potential reasons for the differences. Hopefully we can explain every one of them!
The motivation, goal, and step-by-step process are illustrated in the Snowflake Notebook. In general, we are comparing the detector 5-minute aggregation tables in the clearing house, as these tables serve as the foundation for all downstream models.
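As a rough illustration of this kind of comparison (not the notebook's actual code), the sketch below matches rows from the two systems on a shared key and counts how many agree. The key `(detector_id, sample_timestamp)`, the compared columns `(flow, occupancy, speed)`, and the tolerance are all assumptions made for the example.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]              # (detector_id, sample_timestamp), assumed key
Row = Tuple[float, float, float]   # (flow, occupancy, speed), assumed columns


def compare_tables(old: Dict[Key, Row], new: Dict[Key, Row],
                   tol: float = 1e-6) -> Dict[str, float]:
    """Summarize agreement between two 5-minute tables keyed the same way."""
    shared = old.keys() & new.keys()
    # A row counts as identical when every compared column agrees within tol.
    identical = sum(
        1 for key in shared
        if all(abs(a - b) <= tol for a, b in zip(old[key], new[key]))
    )
    return {
        "shared": len(shared),
        "identical": identical,
        "different": len(shared) - identical,
        "only_old": len(old.keys() - new.keys()),
        "only_new": len(new.keys() - old.keys()),
        "pct_identical": 100.0 * identical / len(shared) if shared else 0.0,
    }
```

Rows present in only one system are reported separately rather than counted as differences, since a missing row and a mismatched value likely have different causes.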
Overall, 93.20% of the data from districts 6, 8, and 12 is identical. The remaining differences (6.80%) are identified and categorized as follows.