Skip to content

x/build/cmd/rundockerbuildlets: scaleway buildlets fail looping #22748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bradfitz opened this issue Nov 15, 2017 · 14 comments
Closed

x/build/cmd/rundockerbuildlets: scaleway buildlets fail looping #22748

bradfitz opened this issue Nov 15, 2017 · 14 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
Milestone

Comments

@bradfitz
Copy link
Contributor

The scaleway builders are all saying:

root@scw-e8738f:~# journalctl -f -u rundockerbuildlet.service
...
Nov 15 20:41:28 scw-e8738f rundockerbuildlet[4707]: 2017/11/15 20:41:28 Creating scaleway-prod-05 ...
Nov 15 20:41:28 scw-e8738f rundockerbuildlet[4707]: 2017/11/15 20:41:28 Error creating scaleway-prod-05: exit status 125, docker: Error response from daemon: Conflict. The name "/scaleway-prod-05" is already in use by container e1a1f4e89f3a073402f2ae513753db661c41a4727e808215758364c313850357. You have to remove (or rename) that container to be able to reuse that name..
Nov 15 20:41:28 scw-e8738f rundockerbuildlet[4707]: See 'docker run --help'.
Nov 15 20:41:29 scw-e8738f rundockerbuildlet[4707]: 2017/11/15 20:41:29 Creating scaleway-prod-05 ...
Nov 15 20:41:29 scw-e8738f rundockerbuildlet[4707]: 2017/11/15 20:41:29 Error creating scaleway-prod-05: exit status 125, docker: Error response from daemon: Conflict. The name "/scaleway-prod-05" is already in use by container e1a1f4e89f3a073402f2ae513753db661c41a4727e808215758364c313850357. You have to remove (or rename) that container to be able to reuse that name..
Nov 15 20:41:29 scw-e8738f rundockerbuildlet[4707]: See 'docker run --help'.

/cc @jessfraz @adams-sarah @andybons

@gopherbot gopherbot added this to the Unreleased milestone Nov 15, 2017
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Nov 15, 2017
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/78032 mentions this issue: dashboard: remove linux-arm as a trybot

gopherbot pushed a commit to golang/build that referenced this issue Nov 15, 2017
Fixes golang/go#22592
Updates golang/go#22749
Updates golang/go#22748

Change-Id: I6b58f21ececa212a12af8ef21cdef9c90515b837
Reviewed-on: https://go-review.googlesource.com/78032
Reviewed-by: Andrew Bonventre <[email protected]>
@bradfitz
Copy link
Contributor Author

I nuked all 50 instances and recreated and now I see them just deleting & restarting the containers:

Nov 16 00:08:02 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:02 Removed container 755cec905f30 (scaleway-prod-50)
Nov 16 00:08:02 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:02 Creating scaleway-prod-50 ...
Nov 16 00:08:03 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:03 Created scaleway-prod-50
Nov 16 00:08:17 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:17 Removed container 2c1569cd1186 (scaleway-prod-50)
Nov 16 00:08:17 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:17 Creating scaleway-prod-50 ...
Nov 16 00:08:19 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:19 Created scaleway-prod-50
Nov 16 00:08:32 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:32 Created scaleway-prod-50
Nov 16 00:08:47 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:47 Removed container b69851e6724a (scaleway-prod-50)
Nov 16 00:08:47 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:47 Creating scaleway-prod-50 ...
Nov 16 00:08:49 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:08:49 Created scaleway-prod-50
Nov 16 00:09:00 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:09:00 Removed container a278c75f4c14 (scaleway-prod-50)
Nov 16 00:09:00 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:09:00 Creating scaleway-prod-50 ...
Nov 16 00:09:02 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:09:02 Created scaleway-prod-50
Nov 16 00:09:07 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:09:07 Removed container 650d128f0243 (scaleway-prod-50)
Nov 16 00:09:07 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:09:07 Creating scaleway-prod-50 ...
Nov 16 00:09:08 scw-e8738f rundockerbuildlet[4815]: 2017/11/16 00:09:08 Created scaleway-prod-50

Why? Maybe because of #22749 is making the containers start up too slowly?

@bradfitz
Copy link
Contributor Author

Hm... archive/tar changes?

The docker logs are saying:

2017/11/16 01:05:03 Downloaded ./buildlet.exe (7178728 bytes)
2017/11/16 01:05:03 buildlet starting.
2017/11/16 01:05:04 Dialing coordinator farmer.golang.org:443 ...
2017/11/16 01:05:04 Doing TLS handshake with coordinator (verifying hostname "farmer.golang.org")...
2017/11/16 01:05:04 Registering reverse mode with coordinator...
2017/11/16 01:05:04 Connected to coordinator; reverse dialing active
2017/11/16 01:05:05 Removing .
2017/11/16 01:05:05 Ignoring fail of RemoveAll(.)
2017/11/16 01:05:06 writetgz: untarring Request.Body into /workdir/go
2017/11/16 01:05:06 extracted tarball into /workdir/go: 1 files, 1 dirs (1.338425ms)
2017/11/16 01:05:06 writetgz: untarring Request.Body into /workdir/go
2017/11/16 01:05:06 error extracting tarball into /workdir/go after 0 files, 0 dirs, 105.528154ms: tar file contained invalid name ""
2017/11/16 01:05:07 Halting in 1 second.
2017/11/16 01:05:07 buildlet reverse mode exiting.

Note the tar file contained invalid name "".

I probably built the linux-arm binary with Go tip when I addressed #21839, and now the new tip-built buildlet can't parse the coordinator's Go 1.8 tar files.

/cc @dsnet

@dsnet
Copy link
Member

dsnet commented Nov 16, 2017

What's the code that generates the tar file (or reads it)? The "0 files, 0 dirs" is also interesting.

@bradfitz
Copy link
Contributor Author

The The "0 files, 0 dirs" just means it's failing on the first entry and the counters for number of files and number of dirs haven't been incremented yet.

The tar.gz comes from x/build/cmd/coordinator which gets it either from gitmirror (which just does exec.CommandContext(ctx, "git", "archive", "--format=tgz", rev)) or it gets it directly from Gerrit's equivalent export-a-tarball handler.

There's only one use of tar.NewWriter that we're using in the coordinator (to generate a VERSION file), but that push works fine. You can see it in the log above (the first one, extracted tarball into /workdir/go: 1 files, 1 dirs (1.338425ms))

@bradfitz
Copy link
Contributor Author

Btw, I re-pushed a new buildlet binary built at Go 1.9 and they're all working again. So it does seem to be something at tip.

@dsnet
Copy link
Member

dsnet commented Nov 16, 2017

Is there any way to get access to the tar.gz file being passed in?

@bradfitz
Copy link
Contributor Author

With some work I suppose.

@dsnet
Copy link
Member

dsnet commented Nov 16, 2017

I have a theory.

I added support for parsing the PAX records for an esoteric feature of tar, which are global PAX records, denoted by a typeflag of TypeXGlobalHeader. The relevant new code may return a header without a filename (since they don't make sense for global headers).

Assuming this is the issue, you probably want to either push the validRelPath check to be within each case of the switch statement or check the Header.Typeflag up front .

@dsnet
Copy link
Member

dsnet commented Nov 16, 2017

Confirmed. I ran git archive --format=tgz ee6321b5405504f1d090d01a4703ec9ff6b218ea | gzip -d | head -c 512| hexdump -C and got:

00000000  70 61 78 5f 67 6c 6f 62  61 6c 5f 68 65 61 64 65  |pax_global_heade|
00000010  72 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |r...............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000060  00 00 00 00 30 30 30 30  36 36 36 00 30 30 30 30  |....0000666.0000|
00000070  30 30 30 00 30 30 30 30  30 30 30 00 30 30 30 30  |000.0000000.0000|
00000080  30 30 30 30 30 36 34 00  31 33 32 30 33 31 32 35  |0000064.13203125|
00000090  36 36 31 00 30 30 31 34  35 31 31 00 67 00 00 00  |661.0014511.g...|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

You can see a typeflag of 'g' at offset 0x9c.
I'm rather surprised that git outputs archives using this esoteric feature.

@bradfitz
Copy link
Contributor Author

Do you think the Go tar reader should expose these records by default, or do you think they should be opt-in?

@dsnet
Copy link
Member

dsnet commented Nov 16, 2017

They were actually always returned in Go1.9 and earlier, but returned a bogus file containing the raw contents of the PAX headers.

The buildlet logic actually happened to work by chance. There aren't any hard restrictions on what filename to use for global headers, so the one used by git just happened to work for validRelPath. So if you looked onto the filesystem of the builders, you would have found a bogus file called "pax_global_header".

I was actually surprised that the buildet worked at all in Go1.9 and earlier. The surprising behavior is that tar.Header{Typeflag: tar.TypeXGlobalHeader}.FileInfo().Mode().IsRegular() reports true, when I would have expected false. Had this been false, then it would hit the default case in the logic, making the problem obvious.

Unfortunately, I don't know how to fix tar.Header.FileInfo.Mode to report false here. The problem is that os.FileMode.IsRegular has no way to indicate that something is not a regular file without saying that it is something else since it only checks no mode bits are set.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/78355 mentions this issue: archive/tar: use placeholder name for global PAX records

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/78546 mentions this issue: cmd/buildlet: ignore pax global header when untarring

gopherbot pushed a commit that referenced this issue Nov 29, 2017
Several usages of tar (reasonably) just use the Header.FileInfo
to determine the type of the header. However, the os.FileMode type
is not expressive enough to represent "files" that are not files
at all, but some form of metadata.

Thus, Header{Typeflag: TypeXGlobalHeader}.FileInfo().Mode().IsRegular()
reports true, even though the expected result may have been false.

To reduce (not eliminate) the possibility of failure for such usages,
use the placeholder filename from the global PAX headers.
Thus, in the event the user did not handle special "meta" headers
specifically, they will just be written to disk as a regular file.

As an example use case, the "git archive --format=tgz" command produces
an archive where the first "file" is a global PAX header with the
name "global_pax_header". For users that do not explicitly check
the Header.Typeflag field to ignore such headers, they may end up
extracting a file named "global_pax_header". While it is a bogus file,
it at least does not stop the extraction process.

Updates #22748

Change-Id: I28448b528dcfacb4e92311824c33c71b482f49c9
Reviewed-on: https://go-review.googlesource.com/78355
Reviewed-by: Brad Fitzpatrick <[email protected]>
Run-TryBot: Brad Fitzpatrick <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
@golang golang locked and limited conversation to collaborators Nov 24, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
Projects
None yet
Development

No branches or pull requests

3 participants