5-minute Aggregation Datasets Comparison Between Old and Modernized PeMS System --- A Case Study #501

thehanggit · 2024-12-18T17:13:45Z

The motivation, goal, and step-by-step process are described in the Snowflake Notebook. In short, we compare the detector 5-minute aggregation tables in the clearing house between the old and modernized PeMS systems, since these tables serve as the foundation for all downstream models.

Overall, 93.20% of the data from districts 6, 8, and 12 is identical. The remaining 6.80% of the data differs and is categorized as follows:

0.67%: Attributed to rounding bias, so these records can be treated as identical data.
3.18%: Potentially caused by data relay loss.
2.95%: Currently unexplained, but extreme volume values (over 1000) observed in the modernized PeMS data may provide clues for further investigation.
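The categorization above can be illustrated with a minimal sketch. It is not the notebook's actual logic: the function name `categorize`, the key layout `(detector_id, timestamp)`, the rounding tolerance of 1, and the rule that a record present in only one system counts as possible relay loss are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: a volume difference of at most 1 is treated as rounding bias.
ROUNDING_TOLERANCE = 1


def categorize(old_rows, new_rows, tol=ROUNDING_TOLERANCE):
    """Compare two 5-minute tables keyed by (detector_id, timestamp).

    Each argument is a dict mapping the key to a volume (or None).
    Returns a Counter with one category label per key.
    """
    counts = Counter()
    for key in old_rows.keys() | new_rows.keys():
        old = old_rows.get(key)
        new = new_rows.get(key)
        if old == new:
            counts["identical"] += 1
        elif old is None or new is None:
            # Present in only one system: possibly data relay loss (assumed rule).
            counts["missing"] += 1
        elif abs(old - new) <= tol:
            counts["rounding"] += 1
        else:
            counts["unexplained"] += 1
    return counts


def shares(counts):
    """Convert category counts to percentages of all compared keys."""
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical sample data, not taken from PeMS.
old = {("d1", "00:00"): 10, ("d1", "00:05"): 12,
       ("d1", "00:10"): 7, ("d1", "00:15"): 20}
new = {("d1", "00:00"): 10, ("d1", "00:05"): 13,
       ("d1", "00:15"): 1500}
result = categorize(old, new)
```

In this made-up sample, each of the four keys lands in a different category, so `shares(result)` assigns 25% to each.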

kengodleskidot · 2024-12-18T17:48:32Z

Thank you for the analysis @thehanggit. I believe the extreme volume values will be addressed through the high flow value issue described in #278 which has been backlogged. Once a fix for the high flow values is implemented it would be interesting to see how that impacts the 2.95% of currently unexplained differences.

thehanggit · 2024-12-18T18:06:11Z

@kengodleskidot

Got it. So these extreme values are observed directly in the raw data rather than produced by normalization or other postprocessing, whereas the old PeMS system dealt with this issue in its 5-minute table.

kengodleskidot · 2024-12-18T19:02:11Z

You are correct, the high flow values are being reported directly by the devices in the raw data. There may be instances where normalization results in high values, but I suspect that is very rare. I believe once a high flow value threshold methodology is determined at the appropriate level (detector, station, etc.) the flow value would be replaced by either an imputed flow value or a max flow value that would need to be determined at the same level. There is no documentation that I am aware of that details how existing PeMS handles high flow values.
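As a rough sketch of the two replacement options mentioned here (a max flow value or an imputed value), assuming a simple per-series threshold: the function name `replace_high_flow`, the threshold of 1000, the cap of 250, and the nearest-neighbor averaging are hypothetical choices for illustration, not the methodology that #278 will settle on.

```python
def replace_high_flow(volumes, threshold, max_flow=None):
    """Replace flow values above `threshold` in a list of 5-minute volumes.

    If `max_flow` is given, high values are replaced by that cap.
    Otherwise they are imputed as the mean of the nearest in-range
    neighbors on each side (an assumed, deliberately simple imputation).
    None entries (missing samples) are passed through unchanged.
    """
    def in_range(v):
        return v is not None and v <= threshold

    cleaned = []
    for i, v in enumerate(volumes):
        if v is None or v <= threshold:
            cleaned.append(v)
        elif max_flow is not None:
            cleaned.append(max_flow)
        else:
            prev = next((x for x in reversed(volumes[:i]) if in_range(x)), None)
            nxt = next((x for x in volumes[i + 1:] if in_range(x)), None)
            neighbors = [x for x in (prev, nxt) if x is not None]
            # No valid neighbor on either side: leave the value missing.
            cleaned.append(sum(neighbors) / len(neighbors) if neighbors else None)
    return cleaned


# Hypothetical series with one extreme reading.
imputed = replace_high_flow([10, 1500, 14], threshold=1000)
capped = replace_high_flow([10, 1500, 14], threshold=1000, max_flow=250)
```

Whether the threshold and cap are set per detector, per station, or at another level is exactly the open question raised above; the sketch only shows the replacement step once those values exist.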

thehanggit · 2024-12-19T00:28:06Z

@kengodleskidot

That is clear enough! Thank you, Ken. I will continue the analysis to find potential reasons for the differences. Hopefully we can explain all of them!

thehanggit added this to the Data Quality Checks milestone Dec 18, 2024

thehanggit assigned thehanggit, kengodleskidot, ZhenyuZhu-Caltrans and mmmiah Dec 18, 2024
