Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Add service canary documents
* Delete empty line
* Use consistent arch in operator test
* Add multiple canaries guide
* Refine some stuff
* Correct 6th diagram of paper
* Add canary chosen table
* Add newline in table
* Make team name more clear
Showing 21 changed files with 315 additions and 16 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
|
||
# Multiple Canaries Guide | ||
|
||
- [Multiple Canaries Guide](#multiple-canaries-guide) | ||
- [Background](#background) | ||
- [Local Canary](#local-canary) | ||
- [Global Canary](#global-canary) | ||
- [Practical Guide](#practical-guide) | ||
- [Explicitly Exclusive Traffic Rules](#explicitly-exclusive-traffic-rules) | ||
- [Choose Zero or One Canary](#choose-zero-or-one-canary) | ||
- [FAQ](#faq) | ||
|
||
## Background | ||
|
||
Canary deployment tests new features of a service with a specific portion of production traffic. The feature is then evaluated along several dimensions, such as online errors, performance, and business feedback from users. | ||
|
||
The concept of a canary sounds simple, but in the real world we often need to handle more complicated situations, for example: | ||
|
||
- Choosing one of several candidate implementations of a new feature, which technically means that multiple canaries run for one service. | ||
- A service running multiple canaries at the same time to test different features. | ||
- The situations above extending across multiple services. | ||
|
||
In short, we need to handle explicitly how multiple canaries run in one or more services, and to define the relationships among them. | ||
|
||
## Local Canary | ||
|
||
We start with the simplest situation, the `local canary`: if a feature only needs one service to deploy a canary, we call that canary a `local canary`. Note that many local canaries can exist at the same time, even within one service. | ||
|
||
A local canary by itself is simple, but the relationships between local canaries need care, so we illustrate them step by step. | ||
|
||
In the basic example (1), three services represent the backend of a takeaway ordering app. Primary traffic means all traffic except canary traffic: | ||
|
||
![image](imgs/multiple-canaries-guide-01.png) | ||
|
||
Then in (2), the delivery team deploys a local canary `delivery-beijing` to test a new feature on traffic from Beijing. | ||
|
||
![image](imgs/multiple-canaries-guide-02.png) | ||
|
||
In (3), another team, the restaurant team, deploys another local canary `restaurant-beijing` to test a different feature. If the two canaries handle some or all of the same traffic, clients might get unexpected results. For example, `restaurant-beijing` returns a cook duration and `delivery-beijing` returns a delivery duration, but the sum of the two separate durations is not consistent with the original total duration. This kind of confusion is definitely not what we want. | ||
|
||
![image](imgs/multiple-canaries-guide-03.png) | ||
|
||
So the situation illustrated in (4) is what we expect in normal scenarios: Beijing traffic is explicitly split into two parts, each going through a different canary. Solutions differ between environments; we demonstrate one later. | ||
|
||
![image](imgs/multiple-canaries-guide-04.png) | ||
|
||
As this evolution shows, it is unsafe for local canaries to share any part of the traffic. In technical terms: local canaries must not call each other. | ||
|
||
## Global Canary | ||
|
||
Building on the local canary, the term global canary is straightforward: it is a canary for a feature that needs multiple services to each deploy one release. We need a global canary to: | ||
|
||
- Test a feature involving multiple services. | ||
- Route traffic through service instances that belong to the same global canary. | ||
|
||
Continuing the local canary example, we extend it with a global canary: | ||
|
||
![image](imgs/multiple-canaries-guide-05.png) | ||
|
||
The principles of a global canary are almost the same as those of a local canary: it can't share traffic with other local or global canaries. But a global canary needs one more principle: a request must call the release of the same global canary in another service if one exists; otherwise it calls the primary release. Traffic that violates these principles does not get the whole feature, and might even trigger unsafe behavior. | ||
|
||
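The extra principle for global canaries can be sketched in a few lines of Go. This is only an illustration under assumed names: the `releases` map, the `pickRelease` function, and the canary `refund-android` deployed on `restaurant` and `delivery` are hypothetical and are not taken from any real implementation.

```go
package main

import "fmt"

// releases maps a service name to the canaries that have a release deployed
// for that service. The contents are hypothetical illustration data.
var releases = map[string]map[string]bool{
	"order":      {},
	"restaurant": {"refund-android": true},
	"delivery":   {"refund-android": true, "delivery-beijing": true},
}

// pickRelease returns the release of service that a request colored with
// canary should reach: the release of the same canary if the service has
// one, otherwise the primary release.
func pickRelease(service, canary string) string {
	if canary != "" && releases[service][canary] {
		return canary
	}
	return "primary"
}

func main() {
	// A request colored "refund-android" travels through all three services.
	for _, svc := range []string{"order", "restaurant", "delivery"} {
		fmt.Printf("%s -> %s\n", svc, pickRelease(svc, "refund-android"))
	}
}
```

A request colored `refund-android` reaches the primary release of `order` and the canary releases of `restaurant` and `delivery`; it never reaches `delivery-beijing`, which belongs to a different canary.
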
At this point it is clear that **a local canary is just a special case of a global canary**. So we can conclude three core principles for multiple canary deployments: | ||
|
||
1. The traffic rules for choosing a canary are explicitly exclusive. | ||
2. The complete chain of a request goes through at most one canary. | ||
3. Normal traffic that matches no canary rule must go through primary deployments. | ||
|
||
## Practical Guide | ||
|
||
Before jumping into the practice, we define one term to keep the wording concise. | ||
|
||
- Color: to color a request is to give it a specific tag; in this context the tag identifies the canary the request belongs to. | ||
|
||
### Explicitly Exclusive Traffic Rules | ||
|
||
To satisfy this goal, we color plain (uncolored) traffic at the endpoint according to dedicated rules. For example, we use an integer priority to represent the coloring order, where a lower number means a higher priority. Back in the local canary example, we assign priority 4 to `restaurant-beijing` and 5 to `delivery-beijing`, so the default choice for Beijing traffic is `restaurant-beijing`. Apart from this default choice, traffic that has already been colored goes to its own canary regardless of the traffic rules. | ||
|
||
![image](imgs/multiple-canaries-guide-06.png) | ||
|
||
What if two canaries have the same priority? Solutions vary: you can forbid assigning the same priority, or give canaries with the same priority an explicit second-level order, such as alphabetical order. | ||
|
||
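The ordering described here (priority first, then an explicit second-level order) can be sketched as follows. The `canary` type and the extra canary `delivery-android` are hypothetical, and alphabetical order by name is just one of the tie-breaking options mentioned above.

```go
package main

import (
	"fmt"
	"sort"
)

// canary is a hypothetical, minimal description of a canary for ordering.
type canary struct {
	Name     string
	Priority int // a lower number means a higher priority
}

// sortCanaries orders canaries by priority and breaks ties alphabetically
// by name, so that the matching order is always explicit.
func sortCanaries(cs []canary) {
	sort.Slice(cs, func(i, j int) bool {
		if cs[i].Priority != cs[j].Priority {
			return cs[i].Priority < cs[j].Priority
		}
		return cs[i].Name < cs[j].Name
	})
}

func main() {
	cs := []canary{
		{"delivery-beijing", 5},
		{"restaurant-beijing", 4},
		{"delivery-android", 4}, // same priority as restaurant-beijing
	}
	sortCanaries(cs)
	for _, c := range cs {
		fmt.Println(c.Priority, c.Name)
	}
}
```
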
### Choose Zero or One Canary | ||
|
||
Building on the practice above, we only need to guarantee that traffic can't be recolored anywhere along the path: its color can be initialized but never changed. For example, if we use an HTTP header `X-Canary-Choice` to represent the color, every endpoint in the chain must leave its value unchanged if it is already set. | ||
|
||
We can now write a simple snippet of pseudocode to explain this technically: | ||
|
||
```go | ||
canary := request.Headers["X-Canary-Choice"] | ||
|
||
if canary != "" { | ||
    // Already colored: never recolor, just follow the existing choice. | ||
    sendRequestToCanary(request, canary) | ||
} else { | ||
    canary = chooseCanary(request) | ||
    if canary != "" { | ||
        // First coloring: record the choice so later endpoints keep it. | ||
        request.Headers["X-Canary-Choice"] = canary | ||
        sendRequestToCanary(request, canary) | ||
    } else { | ||
        sendRequestToPrimary(request) | ||
    } | ||
} | ||
``` | ||
|
||
Complete examples of how the algorithm `chooseCanary` decides could look like this: | ||
|
||
| Traffic | Delivery Canary Table <br>(Priority, Traffic Rules, Color) | Decision | Strategy | | ||
| :-----------------------------------: | ------------------------------------------------------------------- | :-----------------------: | :------------------------------------------------: | | ||
| Beijing | 1, Beijing, Green<br>2, Beijing, Blue | Green | Based on Priority | | ||
| Beijing&Android<br>Beijing<br>Android | 1, Beijing, Green<br>2, Android, Yellow | Green<br>Green<br>Yellow | Based on Priority<br>Based on Rules<br>Based on Rules | | ||
| Beijing&Android<br>Beijing<br>Android | 1, Android, Yellow<br>2, Beijing&Android, Blue<br>3, Beijing, Green | Yellow<br>Green<br>Yellow | Based on Priority<br>Based on Rules<br>Based on Rules | | ||
|
||
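One possible shape of `chooseCanary` that reproduces the decisions in the last row of the table is sketched below. It is a sketch under assumptions, not a reference implementation: traffic is modeled as a set of attributes, a rule matches when all the attributes it requires are present, and the `rule` type, the `matches` helper, and the tie-break on color name are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// rule is one row of a canary table. Requires lists the traffic attributes
// that must all be present for the rule to match.
type rule struct {
	Priority int // a lower number means a higher priority
	Requires []string
	Color    string
}

// matches reports whether traffic carries every required attribute.
func matches(traffic map[string]bool, requires []string) bool {
	for _, attr := range requires {
		if !traffic[attr] {
			return false
		}
	}
	return true
}

// chooseCanary returns the color of the highest-priority matching rule, or
// "" when no rule matches, in which case the request goes to primary.
func chooseCanary(traffic map[string]bool, table []rule) string {
	sorted := append([]rule(nil), table...) // copy, keep the caller's order
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Priority != sorted[j].Priority {
			return sorted[i].Priority < sorted[j].Priority
		}
		return sorted[i].Color < sorted[j].Color // assumed tie-break
	})
	for _, r := range sorted {
		if matches(traffic, r.Requires) {
			return r.Color
		}
	}
	return ""
}

func main() {
	table := []rule{
		{1, []string{"Android"}, "Yellow"},
		{2, []string{"Beijing", "Android"}, "Blue"},
		{3, []string{"Beijing"}, "Green"},
	}
	for _, traffic := range []map[string]bool{
		{"Beijing": true, "Android": true},
		{"Beijing": true},
		{"Android": true},
	} {
		fmt.Printf("%q\n", chooseCanary(traffic, table))
	}
}
```

Note that in this table the `Blue` rule can never win: any traffic it matches is also matched by the higher-priority `Yellow` rule, which is exactly what the priority order is meant to make explicit.
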
Finally, note the performance cost of selecting a canary when there are many of them. Administrators should set a limit on the number of canaries, such as 5. | ||
|
||
## FAQ | ||
|
||
- Which canary policy should be chosen? | ||
|
||
It depends on the actual business. Common policies are: | ||
|
||
  - On percentage: the simplest policy, but the inconsistent behavior a user sees may hurt user experience. | ||
  - On client devices, geographical region, etc.: more stable than the percentage policy, but not as stable as the user policy. | ||
  - On users: more complex than the percentage policy, but it gives more precise control over testing; for example, VIP users can get higher priority to use the new canary feature. | ||
|
||
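To make the difference between the three policies concrete, here is a small sketch. Everything in it is hypothetical (the `request` type, its fields, and the three predicate functions); it only illustrates why a random percentage can give the same user different answers on different requests, while region-based and user-based rules are stable.

```go
package main

import (
	"fmt"
	"math/rand"
)

// request is a hypothetical, minimal view of an incoming request.
type request struct {
	UserID string
	Region string
}

// byPercentage selects roughly percent% of requests at random. The same
// user may be selected on one request and not on the next.
func byPercentage(percent int) bool {
	return rand.Intn(100) < percent
}

// byRegion selects every request from the given region. It is stable as
// long as the client stays in that region.
func byRegion(r request, region string) bool {
	return r.Region == region
}

// byUser selects requests from an explicit set of users, for example VIPs.
// It is the most stable and precise of the three policies.
func byUser(r request, vips map[string]bool) bool {
	return vips[r.UserID]
}

func main() {
	r := request{UserID: "alice", Region: "Beijing"}
	vips := map[string]bool{"alice": true}

	fmt.Println("percentage:", byPercentage(10)) // varies between runs
	fmt.Println("region:", byRegion(r, "Beijing"))
	fmt.Println("user:", byUser(r, vips))
}
```
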
- Which component is best suited to color traffic? | ||
|
||
A moderate approach is to use the API gateway (the traffic entry) to support configurable canary coloring rules. This requires an API gateway that can easily integrate new features. But for a self-contained solution, every endpoint should be able to color traffic in general ways. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,171 @@ | ||
# Service Canary User Manual | ||
|
||
- [Service Canary User Manual](#service-canary-user-manual) | ||
- [Quick Start](#quick-start) | ||
- [Config Explained](#config-explained) | ||
- [Another Service Canary](#another-service-canary) | ||
- [Service Canary Across Multiple Services](#service-canary-across-multiple-services) | ||
- [Safety](#safety) | ||
|
||
EaseMesh uses service canary to define rules of [canary release](https://martinfowler.com/bliki/CanaryRelease.html) for mesh services. | ||
|
||
## Quick Start | ||
|
||
We use 3 services to demonstrate a takeaway app, plus a delivery canary release that adds a new feature returning the road duration. | ||
|
||
![image](./imgs/service-canary-01.png) | ||
|
||
1. Apply takeaway app config: | ||
|
||
```bash | ||
$ emctl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/easemesh_tenant.yaml | ||
$ emctl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/easemesh_order.yaml | ||
$ emctl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/easemesh_restaurant.yaml | ||
$ emctl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/easemesh_delivery.yaml | ||
|
||
|
||
$ kubectl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/k8s_mesh_namesapce.yaml | ||
$ kubectl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/k8s_order.yaml | ||
$ kubectl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/k8s_restaurant.yaml | ||
$ kubectl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/k8s_delivery.yaml | ||
``` | ||
|
||
2. Try primary traffic | ||
|
||
```bash | ||
# Get order public node port. | ||
$ kubectl get -n mesh-service service order-mesh-public | ||
$ curl http://{node_ip}:{order_public_port}/ -d '{"order_id": "abc1234", "food": "bread"}' | ||
order_id: abc1234 | ||
restaurant: | ||
delivery_time: 2021-12-07T13:12:14 | ||
food: bread | ||
order_id: abc1234 | ||
``` | ||
|
||
3. Add canary of delivery | ||
|
||
```bash | ||
$ emctl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/easemesh_delivery_beijing.yaml | ||
$ kubectl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/k8s_delivery_beijing.yaml | ||
|
||
$ curl http://127.0.0.1:32539/ -d '{"order_id": "abc1234", "food": "bread"}' -H 'X-Location: Beijing' | ||
order_id: abc1234 | ||
restaurant: | ||
delivery_time: '2021-12-07T13:22:47 (road duration: 7m)' | ||
food: bread | ||
order_id: abc1234 | ||
``` | ||
|
||
## Config Explained | ||
|
||
We just introduce a new definition to describe the service canary in [delivery_beijing.yaml](https://github.com/megaease/easemesh-demo/blob/main/deploy/mesh/easemesh_delivery_beijing.yaml): | ||
|
||
```yaml | ||
apiVersion: mesh.megaease.com/v1alpha1 | ||
kind: ServiceCanary | ||
metadata: | ||
name: delivery-mesh-beijing | ||
spec: | ||
priority: 5 # The range is [1, 9] and the default is 5; a lower number means a higher priority. | ||
selector: | ||
matchServices: [delivery-mesh] # What services are in the canary. | ||
matchInstanceLabels: {release: delivery-mesh-beijing} # What instance labels are in the canary. | ||
trafficRules: # What characteristics of traffic are in the scope of canary. | ||
headers: | ||
X-Location: | ||
exact: Beijing | ||
``` | ||
This config tells EaseMesh that traffic with the header `X-Location: Beijing` will be tagged `delivery-mesh-beijing` and will go through the instances of the service `delivery-mesh` labeled `release: delivery-mesh-beijing`. | ||
|
||
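How a selector like this one picks instances can be sketched as below. The sketch assumes that an instance belongs to the canary when its service is listed in `matchServices` and all labels in `matchInstanceLabels` are present on it; that semantics, the `selector` and `instance` types, and the `selects` method are assumptions for illustration and are not the EaseMesh API.

```go
package main

import "fmt"

// instance is a hypothetical view of a running service instance.
type instance struct {
	Service string
	Labels  map[string]string
}

// selector mirrors the two selector fields of the YAML config.
type selector struct {
	MatchServices       []string
	MatchInstanceLabels map[string]string
}

// selects reports whether inst belongs to the canary: its service must be
// listed and every required label must be present with the same value.
func (s selector) selects(inst instance) bool {
	listed := false
	for _, name := range s.MatchServices {
		if name == inst.Service {
			listed = true
			break
		}
	}
	if !listed {
		return false
	}
	for key, value := range s.MatchInstanceLabels {
		if inst.Labels[key] != value {
			return false
		}
	}
	return true
}

func main() {
	sel := selector{
		MatchServices:       []string{"delivery-mesh"},
		MatchInstanceLabels: map[string]string{"release": "delivery-mesh-beijing"},
	}
	canaryInst := instance{"delivery-mesh", map[string]string{"release": "delivery-mesh-beijing"}}
	primaryInst := instance{"delivery-mesh", map[string]string{}}

	fmt.Println(sel.selects(canaryInst), sel.selects(primaryInst))
}
```
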
For details about the config, refer to [service canary](https://github.com/megaease/easemesh-api/blob/main/v1alpha1/meshmodel.md#easemesh.v1alpha1.ServiceCanary). | ||
|
||
## Another Service Canary | ||
|
||
![image](./imgs/service-canary-02.png) | ||
|
||
Now we deploy a restaurant canary that adds a feature predicting the cook duration, which is also tested on Beijing traffic. It reuses the same header `X-Location` to identify Beijing traffic, so EaseMesh resolves the conflict between the two canaries through different priorities, as in [restaurant_beijing.yaml](https://github.com/megaease/easemesh-demo/blob/main/deploy/mesh/easemesh_restaurant_beijing.yaml): | ||
|
||
```yaml | ||
apiVersion: mesh.megaease.com/v1alpha1 | ||
kind: ServiceCanary | ||
metadata: | ||
name: restaurant-mesh-beijing | ||
spec: | ||
priority: 4 | ||
selector: | ||
matchServices: [restaurant-mesh] | ||
matchInstanceLabels: {release: restaurant-mesh-beijing} | ||
trafficRules: | ||
headers: | ||
X-Location: | ||
exact: Beijing | ||
``` | ||
|
||
A lower number means a higher priority, so traffic from Beijing will be tagged `restaurant-mesh-beijing` instead of `delivery-mesh-beijing`. If the priority of `restaurant-mesh-beijing` were 6, `delivery-mesh-beijing` would be chosen instead. | ||
|
||
Multiple canaries with the same priority are sorted alphabetically by name when matching, but users should rely on the priority rather than the name to get explicit results. | ||
|
||
Now that we understand the mechanism, we can apply the config: | ||
|
||
```bash | ||
$ emctl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/easemesh_restaurant_beijing.yaml | ||
$ kubectl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/k8s_restaurant_beijing.yaml | ||
$ curl http://127.0.0.1:32539/ -d '{"order_id": "abc1234", "food": "bread"}' -H 'X-Location: Beijing' | ||
order_id: abc1234 | ||
restaurant: | ||
delivery_time: '2021-12-07T15:11:33 (cook duration: 5m)' | ||
food: bread | ||
order_id: abc1234 | ||
``` | ||
|
||
Now Beijing traffic goes through the new restaurant canary and does not get mixed with the delivery canary. If you want it to go through the delivery canary instead, add the header `X-Mesh-Service-Canary: delivery-mesh-beijing` to get the previous result. | ||
|
||
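For clients that are not using `curl`, a pre-colored request can be built as in the following sketch, which uses only the Go standard library. The helper `newColoredRequest` is hypothetical, and the address `127.0.0.1:32539` is simply the one used in the examples above; the sketch builds the request but does not send it, since sending requires the demo deployment to be running.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// canaryHeader is the header that carries the canary a request is tagged with.
const canaryHeader = "X-Mesh-Service-Canary"

// newColoredRequest builds a POST request that is already tagged with the
// given canary. An empty canary leaves the request uncolored.
func newColoredRequest(url, body, canary string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	if canary != "" {
		req.Header.Set(canaryHeader, canary)
	}
	return req, nil
}

func main() {
	body := `{"order_id": "abc1234", "food": "bread"}`
	req, err := newColoredRequest("http://127.0.0.1:32539/", body, "delivery-mesh-beijing")
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-Location", "Beijing")

	fmt.Println(req.Method, req.URL, req.Header.Get(canaryHeader))
	// To actually send it against a running demo:
	// resp, err := http.DefaultClient.Do(req)
}
```
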
## Service Canary Across Multiple Services | ||
|
||
![image](./imgs/service-canary-03.png) | ||
|
||
What if delivery and restaurant need to be tested together for a new feature? We prepare a feature in which restaurant returns a coupon if delivery reports that the delivery time is beyond the deadline. | ||
|
||
```bash | ||
$ emctl apply -f https://github.com/megaease/easemesh-demo/raw/main/deploy/mesh/easemesh_android.yaml | ||
$ kubectl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/k8s_delivery_android.yaml | ||
$ kubectl apply -f https://raw.githubusercontent.com/megaease/easemesh-demo/main/deploy/mesh/k8s_restaurant_android.yaml | ||
$ curl http://127.0.0.1:32539/ -d '{"order_id": "abc1234", "food": "bread"}' -H 'X-Phone-Os: Android' | ||
order_id: abc1234 | ||
restaurant: | ||
coupon: $5 | ||
delivery_time: 2021-12-07T16:54:01 | ||
food: bread | ||
order_id: abc1234 | ||
``` | ||
|
||
The config expresses this directly, with both services listed in `matchServices`: | ||
|
||
```yaml | ||
apiVersion: mesh.megaease.com/v1alpha1 | ||
kind: ServiceCanary | ||
metadata: | ||
name: refund-android | ||
spec: | ||
priority: 5 | ||
selector: | ||
matchServices: [restaurant-mesh, delivery-mesh] | ||
matchInstanceLabels: {release: refund-android} | ||
trafficRules: | ||
headers: | ||
X-Phone-Os: | ||
exact: Android | ||
``` | ||
|
||
For details about the config, refer to [service canary](https://github.com/megaease/easemesh-api/blob/main/v1alpha1/meshmodel.md#easemesh.v1alpha1.ServiceCanary). | ||
|
||
## Safety | ||
|
||
We define some rules to guarantee the safety and clarity of service canary: | ||
|
||
1. One request is tagged with at most one canary throughout the full chain (technically, the header `X-Mesh-Service-Canary` carries only one value and never changes once it has been set). | ||
2. The tagging rule is defined without any ambiguity (ordered by priority, then by name). |
File renamed without changes.