What to use for PostGIS Data Warehouse? #209
-
Hello everyone 👋 I am working on building a Data Warehouse for my company. As we use extract many different spatial datasets of hundreds of different sources the selection of the data storage engine was Postgres + PostGIS. Now I have already build several data products in my data warehouse and I am currently working on the task of adding CityGML files from governmental websites into the data warehouse. As I am only using Open SOurce software (dlt for data retrieval, dbt for data transformation and dagster for orchestration) I am wondering how to integrate the data of CityGML into my data warehouse without using the Importer-Exporter GUI of 3DCityDB. Mainly I am interested in data that I can retrieve from the 3D models than the actual 3D model itself. So for instance, I want to get the slope angle and slope direction of the roof as well as the height or foot print area of the building. On my research I found the CityJSON project and I think your appraoch for a more efficient parsing of the cityGML files through a JSON format is exactly what I am searching for. So here are my actual questions:
Would be happy to hear your thoughts. And thanks for creating such a great tool! Cheers from Germany, |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 6 replies
-
Hello, welcome to the CityJSON project :-)!
It is active and will stay so. It is quite widely known and used already. CityJSON is an OGC community standard (if that tells you anything). The latest version, CityJSON 2 implements CityGML 3.0.
If you need to compute the required geometry statistics from the 3D model, as opposed to getting it from an existing attribute, then you need to calculate them from the model before loading it to postgres (eg. in Python as you said).
No. 3DCityDB create very heavy databases, as it (over-)normalizes everything in the citymodel. So you'll end up lot's of tables and lots of joins to do simple operations. In case you haven't seen it, there may be some software here that could help you. |
Beta Was this translation helpful? Give feedback.
-
Obviously, I cannot agree to this at all :-) The 3DCityDB v4 uses a simpe feature-to-table mapping approach which follows standard relational modelling practices and is really easy once you understood it. And in contrast to cjdb, every attribute and geometry is directly available. So you could easily use database functionality for your spatial analysis and calculations. The schema for the upcoming 3DCityDB v5 (under development) is even more simple. Whether the 3DCityDB is a choice in your development is another question. If it's just about collecting statistics, it might be easier to do this file-based (either CityJSON or CityGML). |
Beta Was this translation helpful? Give feedback.
-
@clausnagel @balazsdukai Wow, thanks for your replies. Didn't expect getting them on a Saturday morning 😄 I studied Geoinformation Science at the TU Berlin (written my Master THesis with Mr. Adolphi :) ), so I was quite versatile with the CityGML format, but that was almost a decade ago. 😅 And since then, I haven't worked with CityGML anymore. And now that I have both of you here I use my chance and explain you my use case in a little more detail: I am working in a consulting company and am responsible to modernize the data management. One task is to create a centralized data warehouse which is then used by business analysts for BI, ML scientist for building models and developers to integrate into our software products. One important feature is our building model, which includes several statistics of a building for things like heating potential, solar potential or broadband connectivity. Now the scripts that currently create these statistics (and which I did not look through yet) use (among other data) the LoD2 data provided by German states. However, these scripts are pretty error prone and unorganized and one of my tasks is to automate the procedure (from data retrieval to the final data product) including clean up the scripts and integrate the results into the DWH. So I researched online a bit and came across the CityJSON format, which seems to solve the issue of cluttered XML parsing scripts and a more efficient way of getting my statisics out of the geometry data. For instance, for solar potential the roof slope and roof direction or for heating potential the height of the building. This then will be integrated into the DWH. However, there could be more use cases coming up, where other features may need to be derived from the geometries. For this scenario, I would like to have the geometries stored in a centralized place as well, preferably the already existent PostGIS data warehouse, so that people could take these geometries and derive their information directly from there. And as we have colleagues who may be more versatile in only one language (SQL or Python) it would be great to a able them to work with both languages. So based on your replies I would go with the following strategy and would love to hear your opinion about it. I retrieve the data automatically with my orchestration tool dagster and use the CLI tool of the importer/exporter of 3DCityDB to ingest it into the 3DCityDB extention installed in my data warehouse. In parallel I convert the CityGML into CityJSON and derive statistics out of it and integrate it as a normal table into my data warehouse by joining it with the 3DCityDB instance of the dataset. Then running final data tests and serve the final data product to end users. They can use the importer/exporter again to get the data into a JupyterNotebook or Python script or I convert it again back to CityJSON to serve it in a API for our developers to retrieve it into our software products. My question is if I make it more complicated that I should with this setup. Perhaps you have some tips for me :) Because, as @clausnagel mentioned, if I can just integrate the CityGML directly and derive my statistics with SQL code like PostGIS form the 3DCityDB directly it would be the best scenario for me, as I can keep my current DWH logic. |
Beta Was this translation helpful? Give feedback.
Hello,
welcome to the CityJSON project :-)!
It is active and will stay so. It is quite widely known and used already. CityJSON is an OGC community standard (if that tells you anything). The latest version, CityJSON 2 implements CityGML 3.0.