Generate better workspace ids #2947
Branch updated from 71bdd3a to 8ceff55.
/werft run 👍 started the job as gitpod-build-se-workspace-id.2
Branch updated from 8ceff55 to 4120858.
/werft run 👍 started the job as gitpod-build-se-workspace-id.4
Branch updated from 25a7485 to a17624c.
First off: I love the idea of using friendlier workspace IDs. That said, a few remarks:
This PR already updates most of them, i.e. the dev staging previews work. Please help spot any places I missed.
So you are proposing to review and copy the dictionaries into our repo to have more control?
That's why this PR adds 8 characters from the uuid4. Maybe we should generate our own sequence here, one that also includes all lowercase letters.
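To put rough numbers on the alphabet question, here is a minimal sketch that applies the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2N) to an 8-character suffix drawn from hex (16 symbols, as a uuid4 fragment would be) versus [a-z0-9] (36 symbols). It deliberately looks at the suffix alone and ignores the word prefix, which would only lower the probability further; the ID count n is a hypothetical figure, not a Gitpod number.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n IDs
// collide when each is drawn uniformly from `space` possible values, using
// the birthday approximation p ≈ 1 - exp(-n(n-1) / (2 * space)).
// math.Expm1 keeps the result accurate when the exponent is tiny.
func collisionProbability(n, space float64) float64 {
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

// suffixSpace is the number of distinct suffixes of the given length over an
// alphabet with alphabetSize symbols.
func suffixSpace(alphabetSize, length int) float64 {
	return math.Pow(float64(alphabetSize), float64(length))
}

func main() {
	const n = 1_000_000 // hypothetical number of workspace IDs ever issued

	hex := suffixSpace(16, 8)    // 16^8: 32 bits
	base36 := suffixSpace(36, 8) // 36^8: about 41.4 bits

	fmt.Printf("hex suffix:      p ≈ %.4f\n", collisionProbability(n, hex))
	fmt.Printf("[a-z0-9] suffix: p ≈ %.4f\n", collisionProbability(n, base36))
}
```

Growing the alphabet from 16 to 36 symbols adds roughly 9.4 bits for the same 8 characters, which is why a custom sequence beats reusing uuid4 characters.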
Places that come to mind are:
The proxy places seem generous already:
That would certainly solve the problem. It would also allow us to generate those kinds of IDs not just from TypeScript. E.g. loadgen or the integration tests also need to generate workspace IDs. We could provide a port of this library in common-go and use the new Go 1.16
Sorry, I did miss that. That solves the issue nicely and reduces the collision likelihood considerably:
/werft run 👍 started the job as gitpod-build-se-workspace-id.9
Great to see this happening! 🐼
while (uuid.charAt(0).match("[0-9]") != null) // No numbers as first char, as we use this id as DNS name
return uuid
const randomName: string = uniqueNamesGenerator({
    dictionaries: [adjectives, colors, animals],
thought: I think using only colors and animals, together with a random hash at the end, could suffice. This can make items more memorable for the user. 💭
CURRENT | PROPOSED |
---|---|
ambitious-blue-hamster-155799b3 | blue-hamster-155799b3 |
question: Even in the rare case that the same pair is generated, the hash can still be different, right?
Plus, given the ephemeral nature of environments, that should not be a problem. In the end, we just need to introduce enough entropy to avoid, as much as possible, the same color-animal pair being used.
idea: Later on, we could use these generated names to create shorter URLs, which could be useful, or to introduce friendlier URLs for sharing environments, snapshots, and more. See also #2905.
Environment URL, before (example) | Environment URL, after (example) |
---|---|
b9b5e28a-9198-4f4d-8e85-07c864d88026.ws-eu03.gitpod.io | blue-hamster-155799b3.gitpod.io |
> thought: I think using only colors and animals, together with a random hash at the end, could suffice. This can make items more memorable for the user. 💭

Yes, let's do that.

> Plus, given the ephemeral nature of environments, that should not be a problem. In the end, we just need to introduce enough entropy to avoid, as much as possible, the same color-animal pair being used.
Environments are not technically ephemeral, and we don't want to check whether a generated ID is unique, because that would be expensive. But I think we can add enough characters to the suffix: the first two words are for humans, and the rest makes the ID technically unique.
I'm now also implementing a name generator in TypeScript and Go that uses the same very simple pattern:
Branch updated from 9499b68 to 77ce9c7.
/werft run 👍 started the job as gitpod-build-se-workspace-id.14
Branch updated from 71ea69b to 3167c26.
/werft run 👍 started the job as gitpod-build-se-workspace-id.21
/werft run 👍 started the job as gitpod-build-se-workspace-id.22
Branch updated from 3167c26 to cd71874.
Ready for review
I have a few more questions/thoughts:
- Say we brought this into production: what's the effect on existing workspaces? Would they still work/start, given that they'd have old UUIDs and the pattern has changed?
- Considering the entity type changes, I expected a DB migration.
func TestGenerateWorkspaceID(t *testing.T) {

	t.Run(fmt.Sprintf("check names are valid"), func(t *testing.T) {
No need for the t.Run. The outer t is already a valid test. The t.Run would indicate a subtest,
// Licensed under the GNU Affero General Public License (AGPL).
// See License-AGPL.txt in the project root for license information.

package util
util is a bit too generic as a package name (see https://blog.golang.org/package-names).
How about:
- uidgen
- wsid
- wsidgen
- namegen
I like namegen best.
	"time"
)

// UnmarshalJSON parses the duration to a time.Duration
Suggested change:
- // UnmarshalJSON parses the duration to a time.Duration
+ // GenerateWorkspaceID generates a new workspace ID by randomly choosing
+ // a color, an animal and a number of characters.
import { Transformer } from "../transformer";

@Entity()
export class DBLayoutData implements LayoutData {

-    @PrimaryColumn(TypeORM.UUID_COLUMN_TYPE)
+    @PrimaryColumn("varchar")
The original type had a fixed length. Considering how important the workspace ID is, I think it would make sense to introduce a specific WORKSPACE_ID_TYPE type, akin to UUID_COLUMN_TYPE. Otherwise we'll end up with new tables that use VARCHAR(255) for workspace IDs in the future.
Also, considering that workspace IDs all have roughly the same length, we would probably see better performance if we used CHAR columns (as we did in the past). See e.g. https://dba.stackexchange.com/questions/424/performance-implications-of-mysql-varchar-sizes/1915#1915
Good point. We should keep the type (char, length 36). I'll just define a new column type TypeORM.WORKSPACE_ID_COLUMN_TYPE with the same technical type mapping.
I don't think migrations are needed then.
var characters = strings.Split("abcdefghijklmnopqrstuvwxyz0123456789", "")

var colors = []string{
Considering we're maintaining this list in two places, I wonder if it would make sense to keep it as a .txt file, use something like go generate/embed/go.rice in Go, and load the text file on startup in TS. This way we'd prevent the lists from drifting apart over time.
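A minimal sketch of the Go side of that idea, assuming a newline-delimited word file. The file name colors.txt, the comment-line convention, and parseWordList are hypothetical; in a real setup the file would be pulled in with a go:embed directive (Go 1.16+), while here an inline string stands in for it so the sketch runs on its own.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Hypothetical contents of colors.txt. In a real build this string would be
// populated from the file by a go:embed directive instead of being inlined.
const colorsTxt = `
blue
pink

# lines starting with '#' and blank lines are skipped
amber
`

// parseWordList returns one word per non-empty, non-comment line, in order.
// Scanner errors are not expected for an in-memory string and are ignored.
func parseWordList(text string) []string {
	var words []string
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		words = append(words, line)
	}
	return words
}

func main() {
	colors := parseWordList(colorsTxt)
	fmt.Println(len(colors), colors)
}
```

The TypeScript side would read the same file at startup, so both generators share a single source of truth.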
I don't expect changes in those lists. So maybe this is something we should invest in once we start changing those lists (or drift becomes a burden).
Branch updated from dfa4c7e to 92ea403.
Sorry for being late here, but:
We discussed this in a video call, but just for transparency:
There was a misunderstanding that the new ID would use, or even comply with, UUIDs. It does not.
We talked this through and decided that having two IDs (technical and human-readable) wouldn't be very helpful, because the uniqueness requirements would apply to both equally. Furthermore, all existing logic around the form of the ID would need to be updated, because it revolves around the URLs, which would obviously contain the human-readable ID.
/werft run 👍 started the job as gitpod-build-se-workspace-id.28
Branch updated from 92ea403 to 16d46cf.
/werft run 👍 started the job as gitpod-build-se-workspace-id.30
@csweichel Could you help check the likelihood of a collision when the same seed is used in multiple instances?
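To make the seed concern concrete: math/rand is a deterministic PRNG, so two instances seeded with the same value produce the same sequence and therefore the same IDs, and their collision probability is effectively 1 regardless of alphabet or length. Before Go 1.20, the top-level math/rand functions also behaved as if seeded with 1 unless rand.Seed was called. The sketch below (newInstance and suffixFrom are hypothetical helpers) demonstrates this; crypto/rand has no user-visible seed and avoids the problem.

```go
package main

import (
	"fmt"
	"math/rand"
)

const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// newInstance stands in for one server instance that seeds math/rand itself.
func newInstance(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// suffixFrom draws an 8-character [a-z0-9] suffix from the given generator.
func suffixFrom(r *rand.Rand) string {
	b := make([]byte, 8)
	for i := range b {
		b[i] = alphabet[r.Intn(len(alphabet))]
	}
	return string(b)
}

func main() {
	// Two instances that happen to use the same seed.
	a, b := newInstance(42), newInstance(42)
	for i := 0; i < 3; i++ {
		x, y := suffixFrom(a), suffixFrom(b)
		fmt.Println(x, y, x == y) // same seed, same sequence: always true
	}
}
```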
@@ -18,6 +18,10 @@ export class TypeORM {
        type: 'char',
        length: 36
    };
+   static readonly WORKSPACE_ID_COLUMN_TYPE: ColumnOptions = {
👍
 * @param hostname The hostname the request is headed to
 */
export const parseWorkspaceIdFromHostname = function(hostname: string) {
    // We need to parse the workspace id precisely here to get the case '<some-str>-<port>-<wsid>.ws.' right
-   const wsIdExpression = /([a-z][0-9a-z]+\-([0-9a-z]+\-){3}[0-9a-z]+)\.ws/g;
+   const wsIdExpression = /([a-z]{3,12}-[a-z]{2,16}-[A-Za-z0-9]{8})\.ws/g;
This expression does not support both cases (UUID and the new one). Is that intended?
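One way to support both cases is a single alternation. Simply OR-ing in the old loose expression would not work: on a hostname such as foo-8080-blue-hamster-155799b3.ws it would match from the very first character and swallow the prefix. The minimal Go port below (the PR's parser is TypeScript) therefore assumes legacy IDs are canonical lowercase UUID strings and matches them strictly as 8-4-4-4-12 hex digits, keeping the PR's new expression as is. The function name mirrors the TypeScript one but is illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: legacy workspace IDs are canonical lowercase UUID strings.
const (
	legacyUUIDPattern = `[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`
	newIDPattern      = `[a-z]{3,12}-[a-z]{2,16}-[A-Za-z0-9]{8}` // from the PR
)

// Go's regexp picks the leftmost match, so a '<port>-' prefix cannot be
// absorbed: neither alternative can start on a port number.
var wsIDExpression = regexp.MustCompile(`(` + legacyUUIDPattern + `|` + newIDPattern + `)\.ws`)

// parseWorkspaceIDFromHostname returns the workspace ID embedded in hostname,
// or "" if there is none.
func parseWorkspaceIDFromHostname(hostname string) string {
	m := wsIDExpression.FindStringSubmatch(hostname)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	for _, h := range []string{
		"b9b5e28a-9198-4f4d-8e85-07c864d88026.ws-eu03.gitpod.io",
		"blue-hamster-155799b3.ws-eu03.gitpod.io",
		"3000-blue-hamster-155799b3.ws-eu03.gitpod.io",
		"gitpod.io",
	} {
		fmt.Printf("%q -> %q\n", h, parseWorkspaceIDFromHostname(h))
	}
}
```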
package namegen

import (
	"math/rand"
uuid.NewRandom (the one we used before) uses crypto/rand. I suggest we use the same to make sure we have the same level of "randomness per bit".
The implementation seems straightforward:
var rander = rand.Reader // random function

func NewRandom() (UUID, error) {
	return NewRandomFromReader(rander)
}

// NewRandomFromReader returns a UUID based on bytes read from a given io.Reader.
func NewRandomFromReader(r io.Reader) (UUID, error) {
	var uuid UUID
	_, err := io.ReadFull(r, uuid[:])
	if err != nil {
		return Nil, err
	}
	uuid[6] = (uuid[6] & 0x0f) | 0x40 // Version 4
	uuid[8] = (uuid[8] & 0x3f) | 0x80 // Variant is 10
	return uuid, nil
}
(from here)
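The same shape carries over to the suffix generator. The sketch below reads bytes from an io.Reader (crypto/rand.Reader in production) and maps them onto [a-z0-9] with rejection sampling: bytes at or above 252 (7 × 36) are discarded so that b % 36 stays uniform. Taking a reader also makes the mapping deterministic in tests. newSuffixFromReader and newSuffix are hypothetical names, not the PR's namegen API.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
	"io"
)

const suffixAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// maxUnbiased is the largest multiple of len(suffixAlphabet) that fits in a
// byte (252). Bytes at or above it are rejected to avoid modulo bias.
const maxUnbiased = 256 - 256%len(suffixAlphabet)

// newSuffixFromReader reads random bytes from r until it has n accepted
// characters, mirroring the structure of uuid.NewRandomFromReader.
func newSuffixFromReader(r io.Reader, n int) (string, error) {
	out := make([]byte, 0, n)
	buf := make([]byte, n)
	for len(out) < n {
		if _, err := io.ReadFull(r, buf); err != nil {
			return "", err
		}
		for _, b := range buf {
			if int(b) < maxUnbiased && len(out) < n {
				out = append(out, suffixAlphabet[int(b)%len(suffixAlphabet)])
			}
		}
	}
	return string(out), nil
}

// newSuffix uses crypto/rand, the same source uuid.NewRandom relies on.
func newSuffix() (string, error) {
	return newSuffixFromReader(rand.Reader, 8)
}

func main() {
	s, err := newSuffix()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)

	// A fixed reader makes the output deterministic: 252 is rejected and
	// 36 wraps around to 'a'.
	fixed, err := newSuffixFromReader(bytes.NewReader([]byte{0, 1, 26, 35, 252, 36, 0, 0, 0, 0}), 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(fixed) // ab09a
}
```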
 * See License-AGPL.txt in the project root for license information.
 */

export function generateWorkspaceID() {
Same goes for the TypeScript version: uuid uses randomFillSync.
Branch updated from 16d46cf to 1ac8fbe.
Generates workspace names such as "pink-panda-15x7r9b3", i.e.:
[color]-[animal]-[0-9a-z]{8}
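A minimal Go sketch of that pattern, drawing every choice from crypto/rand in line with the review feedback. The word lists are short hypothetical stand-ins for the PR's real lists, and generateWorkspaceID here is illustrative rather than the namegen package's actual code. Because the ID always begins with a color word, it never starts with a digit and stays usable as a DNS label.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// Hypothetical, abbreviated word lists; the PR's lists are much longer.
var (
	colors  = []string{"blue", "pink", "amber", "coral", "olive"}
	animals = []string{"hamster", "panda", "otter", "heron", "lynx"}
)

// suffixAlphabet covers the [0-9a-z] character class of the ID pattern.
const suffixAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// randomIndex returns a uniformly distributed index in [0, n) from crypto/rand.
func randomIndex(n int) (int, error) {
	v, err := rand.Int(rand.Reader, big.NewInt(int64(n)))
	if err != nil {
		return 0, err
	}
	return int(v.Int64()), nil
}

// generateWorkspaceID builds an ID of the form [color]-[animal]-[0-9a-z]{8}.
func generateWorkspaceID() (string, error) {
	c, err := randomIndex(len(colors))
	if err != nil {
		return "", err
	}
	a, err := randomIndex(len(animals))
	if err != nil {
		return "", err
	}
	suffix := make([]byte, 8)
	for i := range suffix {
		j, err := randomIndex(len(suffixAlphabet))
		if err != nil {
			return "", err
		}
		suffix[i] = suffixAlphabet[j]
	}
	return fmt.Sprintf("%s-%s-%s", colors[c], animals[a], suffix), nil
}

func main() {
	id, err := generateWorkspaceID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```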