Use custom DNS addresses when accessing internal resources from the external world (like guacamole) #3731

Draft: tamirkamara wants to merge 7 commits into main from tamirkamara/guacamole-appgw


tamirkamara
Collaborator

What is being addressed

When we access TRE resources from the external world, we use their own DNS names, which look odd: they include many random parts that users and browser protection schemes might flag as risky sites.
A prime example is the Guacamole workspace service, which is accessed from the internet by its own App Service URL.

How is this addressed

  • Route traffic via the TRE application gateway.
  • Install a new shared service to control the application gateway configuration dynamically when needed by another service (similar to what we do with the firewall).
    The install action simply imports the application gateway's current configuration into the state, since the gateway itself is installed in the core section.
  • Update Guacamole and VM to use this feature.

@github-actions

github-actions bot commented Oct 4, 2023

Unit Test Results

0 tests   0 ✔️  0s ⏱️
0 suites  0 💤
0 files    0

Results for commit 1f7a985.

♻️ This comment has been updated with latest results.

@tamirkamara tamirkamara temporarily deployed to CICD October 4, 2023 07:55 — with GitHub Actions Inactive
@github-actions

github-actions bot commented Oct 4, 2023

E2E Test Results

25 tests  ±0   25 ✔️ ±0   8s ⏱️ ±0s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ±0 

Results for commit 46e12648. ± Comparison against base commit 1143f2a.

♻️ This comment has been updated with latest results.

@tamirkamara tamirkamara temporarily deployed to CICD October 4, 2023 08:03 — with GitHub Actions Inactive
@tamirkamara tamirkamara temporarily deployed to CICD October 4, 2023 08:10 — with GitHub Actions Inactive
@tamirkamara tamirkamara force-pushed the tamirkamara/guacamole-appgw branch from 570eca9 to 1f7a985 Compare October 7, 2023 06:00
@marrobi
Member

marrobi commented Oct 10, 2023

Tested this today and seemed to work well. Going to do a bit more digging and will review properly.

@marrobi
Member

marrobi commented Jan 4, 2024

I've come back to this, and been having a look at Entra ID application proxy..

https://learn.microsoft.com/en-us/entra/identity/app-proxy/what-is-application-proxy

Got it working.

image

Need to consider the two options.

@SvenAelterman
Collaborator

I've come back to this, and been having a look at Entra ID application proxy..

https://learn.microsoft.com/en-us/entra/identity/app-proxy/what-is-application-proxy

Got it working.

image

Need to consider the two options.

How is the Application Proxy URL better than the native App Service URL?

@SvenAelterman SvenAelterman changed the title Use DNS addresses when accessing internal resources from the external world (like guacamole) Use custom DNS addresses when accessing internal resources from the external world (like guacamole) Jan 12, 2024
@marrobi
Member

marrobi commented Jan 12, 2024

I've come back to this, and been having a look at Entra ID application proxy..
https://learn.microsoft.com/en-us/entra/identity/app-proxy/what-is-application-proxy
Got it working.
image
Need to consider the two options.

How is the Application Proxy URL better than the native App Service URL?

That's a good question :). I was thinking about having a corporate domain with a single SSL certificate. It also means we are using a managed authentication solution, rather than relying on a 3rd-party OSS component.

@SvenAelterman
Collaborator

I've come back to this, and been having a look at Entra ID application proxy..
https://learn.microsoft.com/en-us/entra/identity/app-proxy/what-is-application-proxy
Got it working.
image
Need to consider the two options.

How is the Application Proxy URL better than the native App Service URL?

That's a good question :). I was thinking about having a corporate domain with a single SSL certificate. It also means we are using a managed authentication solution, rather than relying on a 3rd-party OSS component.

Would it be TRE's responsibility to set that up though? Or would the customer make the decision to set up Application Proxy and configure a custom domain, etc.?

@marrobi
Member

marrobi commented Jan 12, 2024

What drew me to App Proxy was the fact that App GW is limited to 100 backends or rules, so it isn't as scalable as we might need.

It also potentially avoids the need for an app gw shared service.

There is a wider discussion to be had here around what issue we are trying to resolve.

@jonnyry
Collaborator

jonnyry commented Jul 15, 2024

In order to get around the max 100 backend pools, could Guacamole be promoted to a shared service?

@marrobi
Member

marrobi commented Nov 15, 2024

@tamirkamara just to confirm: this has TF that modifies the same App GW in two places, but ignores changes on the backend pool? It might be cleaner to have one app gw for core, and one in shared for shared and workspace services, so the core one remains static.

For the 100 backend pool limit, could we potentially deploy an app gw per 100 rules, with rules 1-100 on one, 101-200 on the next, etc.? The rules per app gw could be configurable, as it might be that we get to 50 and realise we need another.
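
A minimal sketch of the split described above, assuming rules are numbered from 1 and each gateway takes a fixed, configurable number of them. The function names and the `rules_per_gateway` parameter are hypothetical and not part of this PR:

```python
def gateway_for_rule(rule_number: int, rules_per_gateway: int = 100) -> int:
    """Return the 0-based index of the app gw that would host a 1-based rule number."""
    if rule_number < 1:
        raise ValueError("rule_number is 1-based and must be >= 1")
    if rules_per_gateway < 1:
        raise ValueError("rules_per_gateway must be >= 1")
    # Rules 1..N land on gateway 0, N+1..2N on gateway 1, and so on.
    return (rule_number - 1) // rules_per_gateway


def gateways_needed(total_rules: int, rules_per_gateway: int = 100) -> int:
    """Return how many gateways are required to host total_rules rules."""
    if total_rules < 0:
        raise ValueError("total_rules must be >= 0")
    if rules_per_gateway < 1:
        raise ValueError("rules_per_gateway must be >= 1")
    # Integer ceiling division, so a partially filled gateway still counts.
    return (total_rules + rules_per_gateway - 1) // rules_per_gateway
```

With the default of 100, rule 100 stays on the first gateway and rule 101 moves to the second. Lowering `rules_per_gateway` to 50 would move rule 51 to the second gateway instead.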

@jonnyry, I'd welcome your thoughts on this, as I know it's top of mind.

@jonnyry
Collaborator

jonnyry commented Nov 15, 2024

Hi @marrobi

That sounds like a plan to overcome the 100 backend limit.

In terms of the relationship between the two App Gateways, were you thinking of something like the below, with one nested behind the other? (N.B. the /guac0-99 etc. URL paths on the first AGW are just an example.)

image

@marrobi
Member

marrobi commented Nov 15, 2024

That probably makes most sense. Would welcome @tamirkamara's thoughts.

We might also look at putting the airlock storage behind the core app gw.

Would be interesting to think about how shared services, like Nexus could sit behind it too.

@tamirkamara
Collaborator Author

tamirkamara commented Nov 15, 2024

I understand how this can overcome the 100 routes limit. I'm not sure I see how it removes the need to make changes in the core's appgw, i.e. how does the guac0-99 route get created?
P.S. I need to remember what exactly I did here :-)

@marrobi
Member

marrobi commented Nov 15, 2024

I understand how this can overcome the 100 routes limit. I'm not sure I see how it removes the need to make changes in the core's appgw, i.e. how does the guac0-99 route get created? P.S. I need to remember what exactly I did here :-)

Yes, you are right, missed that. We need to have a think.

@jonnyry
Collaborator

jonnyry commented Nov 19, 2024

I understand how this can overcome the 100 routes limit. I'm not sure I see how it removes the need to make changes in the core's appgw, i.e. how does the guac0-99 route get created? P.S. I need to remember what exactly I did here :-)

Yes, you are right, missed that. We need to have a think.

Ah yes I missed this also.

A few ideas:

1. Add an extra app gateway into the Guac-specific App Gateway Shared Service:

(though not sure how I feel about three tiers of App Gateways)

image

2. Bring the core app gateway into the App Gateway Shared Service:

image

3. Switch out the Guac shared App Gateway for NGINXaaS and use that instead

Pretty sure it doesn't have a hard backend limit.

@SvenAelterman
Collaborator

If choosing the NGINXaaS path, let's verify it's available in Azure Gov.

@tamirkamara
Collaborator Author

Thanks for the input, but I'm still not following, as all the suggestions still require a new route on the primary appgw once the shared service is installed (with whatever router inside). That still means we need to configure it from 2 places, like in my original suggestion.
I know you are all trying to resolve the 100 routes limit, but is it even a realistic situation in the deployments we are aware of?

@jonnyry
Collaborator

jonnyry commented Dec 1, 2024

Thanks for the input, but I'm still not following, as all the suggestions still require a new route on the primary appgw once the shared service is installed (with whatever router inside). That still means we need to configure it from 2 places, like in my original suggestion.

The "Add an extra app gateway into the Guac-specific App Gateway Shared Service" option above doesn't require dynamic routes to be added to the primary appgw? Unless there's something I'm misunderstanding about Guacamole routing.

The "Bring the core app gateway into the App Gateway Shared Service" option does, agreed. However, the primary appgw is intentionally brought inside the shared service so its routes can be configured dynamically.

I know you are all trying to resolve the 100 routes limit, but is it even a realistic situation in the deployments we are aware of?

I would be happy with the change as it is (inc. the 100 routes limit) for now, if a ticket for a >100 solution were added to the backlog for future resolution. IIUC there are factors restricting the number of workspaces (e.g. storage accounts) at the moment anyway.

@tamirkamara
Collaborator Author

The "Add an extra app gateway into the Guac-specific App Gateway Shared Service" option above doesn't require dynamic routes to be added to the primary appgw? Unless there's something I'm misunderstanding about Guacamole routing.

I meant that we need to configure the /guac route on the core install before the shared service appgw is created. There might be an option to do that but I didn't investigate...

@marrobi
Member

marrobi commented Dec 9, 2024

As another option why not have a dedicated DNS name and IP for shared services App GW? Other services, say AML, Databricks etc all have their own FQDN.

@tamirkamara
Collaborator Author

As another option why not have a dedicated DNS name and IP for shared services App GW? Other services, say AML, Databricks etc all have their own FQDN.

Not sure I understood the point around the other services?

@marrobi
Member

marrobi commented Dec 9, 2024

Why are we trying to make the traffic go through the core app gw?

@tamirkamara
Collaborator Author

Why are we trying to make the traffic go through the core app gw?

[continuing the question]... and not through another appgw specifically created for shared services, e.g. Guacamole?

I can think of the following: each appgw requires its own dns entry and https certificate (which is operational and not that difficult to solve), plus, we don't have the address space to allocate it IIRC.

Finally, the current core appgw doesn't do much so why not reuse it and not double the cost.
Overall, seeing a single solution with multiple entry points managed internally (vs. mandated by Azure services) seems odd. But that's just my opinion.

@jonnyry
Collaborator

jonnyry commented Dec 9, 2024

Finally, the current core appgw doesn't do much so why not reuse it and not double the cost. Overall, seeing a single solution with multiple entry points managed internally (vs. mandated by Azure services) seems odd. But that's just my opinion.

Agree... a single entry point and domain name makes the most sense. Reduces complexity around certificate renewal, WAF policies, traffic management etc.

@jonnyry
Collaborator

jonnyry commented Dec 9, 2024

I know you are all trying to resolve the 100 routes limit, but is it even a realistic situation in the deployments we are aware of?

I would be happy with the change as it is (inc. the 100 routes limit) for now, if a ticket for a >100 solution were added to the backlog for future resolution. IIUC there are factors restricting the number of workspaces (e.g. storage accounts) at the moment anyway.

@marrobi wondering what your thoughts are on accepting the PR with the 100 limit as it is? (and creating a backlog task to extend the limit later)

@marrobi
Member

marrobi commented Dec 9, 2024

I want to have a plan forward for the 100 limit, even if we don't tackle it now. Otherwise when it hits it's going to be a scenario where people are saying they are blocked and we don't have a way out. For example #3920 (comment) .

Could the first X workspaces use the core app gw, then above that limit use another maybe?

@jonnyry
Collaborator

jonnyry commented Dec 9, 2024

I want to have a plan forward for the 100 limit, even if we don't tackle it now. Otherwise when it hits it's going to be a scenario where people are saying they are blocked and we don't have a way out. For example #3920 (comment) .

Could the first X workspaces use the core app gw, then above that limit use another maybe?

Still would prefer for traffic to flow through a single entry point, however we could live with that since it would only mean an additional domain name at each 100 workspace increment.

@tamirkamara
Collaborator Author

tamirkamara commented Dec 12, 2024

I had another thought and I think this will be possible:

  1. We will add a few new routes to the core appgw while deploying it. The routes will target the private IPs of internal application gateways that haven't been created yet, so we'll need to calculate their future IP assignments (it's possible).
  2. A new shared service will handle this internal traffic in a similar way to what is done in this PR.
  3. The shared service will include 1 or more private-only application gateways (this is a preview feature) that will get the private IPs from point 1 and at this point the routing from core is functional.
  4. The above supports multiple appgw meaning there's a way for 100+ workspaces. However it requires some advanced TF stuff and persistence, which we could handle at a later time.
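
The IP calculation in step 1 could look roughly like the sketch below. It assumes the internal gateways take consecutive static private IPs from one dedicated subnet, and that Azure reserves the first four addresses and the last address of that subnet. The subnet CIDR and the function name are hypothetical and not taken from this PR:

```python
import ipaddress

# Assumption: Azure keeps the first four addresses of a subnet
# (network address, default gateway, two for DNS) and the last one.
AZURE_RESERVED_LEADING = 4


def planned_gateway_ips(subnet_cidr: str, gateway_count: int) -> list[str]:
    """Return the private IPs the first gateway_count internal gateways would get."""
    subnet = ipaddress.ip_network(subnet_cidr)
    first_index = AZURE_RESERVED_LEADING
    last_index = subnet.num_addresses - 2  # skip the final (broadcast) address
    if gateway_count < 0:
        raise ValueError("gateway_count must be >= 0")
    if first_index + gateway_count - 1 > last_index:
        raise ValueError("subnet is too small for the requested gateways")
    # Indexing a network object returns the address at that offset.
    return [str(subnet[first_index + i]) for i in range(gateway_count)]


# Example with a hypothetical /27 subnet reserved for the internal gateways.
print(planned_gateway_ips("10.0.5.0/27", 2))
```

Because the result depends only on the subnet and the gateway count, core could compute the same addresses at deploy time that the shared service assigns later.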


4 participants