
Commit c49cb1e

Merge pull request #1329 from abhinavdahiya/docs
docs/troubleshooting: add sections to cover all post-create phases
2 parents 2b65c9d + 709c241

File tree

1 file changed: +235 -0 lines changed

docs/user/troubleshooting.md

@@ -99,6 +99,237 @@ This is safe to ignore and merely indicates that the etcd bootstrapping is still
The easiest way to get more debugging information from the installer is to check the log file (`.openshift_install.log`) in the install directory. Regardless of the logging level specified, the installer will write its logs in case they need to be inspected retroactively.
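
For example, to pull just the higher-severity entries out of a long log (a minimal sketch, assuming the installer's default logrus-style `level=` fields; adjust the pattern if your log format differs):

```console
$ grep -E 'level=(warning|error|fatal)' ${INSTALL_DIR}/.openshift_install.log
```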
### Installer Fails to Initialize the Cluster

The installer uses the [cluster-version-operator] to create all the components of an OpenShift cluster. When the installer fails to initialize the cluster, the most important information can be fetched by looking at the [ClusterVersion][clusterversion] and [ClusterOperator][clusteroperator] objects:

1. Inspecting the `ClusterVersion` object.

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get clusterversion -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: 2019-02-27T22:24:21Z
  generation: 1
  name: version
  resourceVersion: "19927"
  selfLink: /apis/config.openshift.io/v1/clusterversions/version
  uid: 6e0f4cf8-3ade-11e9-9034-0a923b47ded4
spec:
  channel: stable-4.0
  clusterID: 5ec312f9-f729-429d-a454-61d4906896ca
  upstream: https://api.openshift.com/api/upgrades_info/v1/graph
status:
  availableUpdates: null
  conditions:
  - lastTransitionTime: 2019-02-27T22:50:30Z
    message: Done applying 4.0.0-0.alpha-2019-02-27-131049
    status: "True"
    type: Available
  - lastTransitionTime: 2019-02-27T22:50:30Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-02-27T22:50:30Z
    message: Cluster version is 4.0.0-0.alpha-2019-02-27-131049
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-02-27T22:24:31Z
    message: 'Unable to retrieve available updates: unknown version 4.0.0-0.alpha-2019-02-27-131049'
    reason: RemoteFailed
    status: "False"
    type: RetrievedUpdates
  desired:
    image: registry.svc.ci.openshift.org/openshift/origin-release@sha256:91e6f754975963e7db1a9958075eb609ad226968623939d262d1cf45e9dbc39a
    version: 4.0.0-0.alpha-2019-02-27-131049
  history:
  - completionTime: 2019-02-27T22:50:30Z
    image: registry.svc.ci.openshift.org/openshift/origin-release@sha256:91e6f754975963e7db1a9958075eb609ad226968623939d262d1cf45e9dbc39a
    startedTime: 2019-02-27T22:24:31Z
    state: Completed
    version: 4.0.0-0.alpha-2019-02-27-131049
  observedGeneration: 1
  versionHash: Wa7as_ik1qE=
```

Some of the most important [conditions][cluster-operator-conditions] to take note of are `Failing`, `Available`, and `Progressing`. You can look at the conditions using:

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get clusterversion version -o=jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{" "}{.message}{"\n"}{end}'
Available True Done applying 4.0.0-0.alpha-2019-02-26-194020
Failing False
Progressing False Cluster version is 4.0.0-0.alpha-2019-02-26-194020
RetrievedUpdates False Unable to retrieve available updates: unknown version 4.0.0-0.alpha-2019-02-26-194020
```

2. Inspecting the `ClusterOperator` object.

You can get the status of all the cluster operators:

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get clusteroperator
NAME                                  VERSION   AVAILABLE   PROGRESSING   FAILING   SINCE
cluster-autoscaler                              True        False         False     17m
cluster-storage-operator                        True        False         False     10m
console                                         True        False         False     7m21s
dns                                             True        False         False     31m
image-registry                                  True        False         False     9m58s
ingress                                         True        False         False     10m
kube-apiserver                                  True        False         False     28m
kube-controller-manager                         True        False         False     21m
kube-scheduler                                  True        False         False     25m
machine-api                                     True        False         False     17m
machine-config                                  True        False         False     17m
marketplace-operator                            True        False         False     10m
monitoring                                      True        False         False     8m23s
network                                         True        False         False     13m
node-tuning                                     True        False         False     11m
openshift-apiserver                             True        False         False     15m
openshift-authentication                        True        False         False     20m
openshift-cloud-credential-operator             True        False         False     18m
openshift-controller-manager                    True        False         False     10m
openshift-samples                               True        False         False     8m42s
operator-lifecycle-manager                      True        False         False     17m
service-ca                                      True        False         False     30m
```
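
While the cluster is still converging, it can be convenient to watch this table update in place instead of re-running the command; `oc get` supports a watch flag (a minimal sketch, assuming your `oc` client accepts `--watch`):

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get clusteroperator --watch
```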

To get detailed information on why an individual cluster operator is `Failing` or not yet `Available`, you can check the status of that individual operator, for example `monitoring`:

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get clusteroperator monitoring -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-02-27T22:47:04Z
  generation: 1
  name: monitoring
  resourceVersion: "24677"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/monitoring
  uid: 9a6a5ef9-3ae1-11e9-bad4-0a97b6ba9358
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-02-27T22:49:10Z
    message: Successfully rolled out the stack.
    status: "True"
    type: Available
  - lastTransitionTime: 2019-02-27T22:49:10Z
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-02-27T22:49:10Z
    status: "False"
    type: Failing
  extension: null
  relatedObjects: null
  version: ""
```

Again, the cluster operators also publish [conditions][cluster-operator-conditions] like `Failing`, `Available`, and `Progressing` that provide information on the current state of the operator:

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get clusteroperator monitoring -o=jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{" "}{.message}{"\n"}{end}'
Available True Successfully rolled out the stack
Progressing False
Failing False
```

Each `ClusterOperator` also publishes the list of objects it owns. To get that information:

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get clusteroperator kube-apiserver -o=jsonpath='{.status.relatedObjects}'
[map[resource:kubeapiservers group:operator.openshift.io name:cluster] map[group: name:openshift-config resource:namespaces] map[group: name:openshift-config-managed resource:namespaces] map[group: name:openshift-kube-apiserver-operator resource:namespaces] map[group: name:openshift-kube-apiserver resource:namespaces]]
```
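
The Go-map formatting above can be hard to read. If you have `jq` installed (it is not part of the installer or `oc`), the same list renders more readably as JSON:

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get clusteroperator kube-apiserver -o=json | jq '.status.relatedObjects'
```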

**NOTE:** Failing to initialize the cluster is usually not fatal to cluster creation: you can use the failures reported by the `ClusterOperator` objects to debug the failing operator and take action that allows the `cluster-version-operator` to make progress.
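
For example, after addressing an operator failure you can block until the `ClusterVersion` reports `Available` again (a sketch assuming your `oc` client includes the `wait` subcommand; pick a timeout that suits your environment):

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig wait clusterversion/version --for=condition=Available --timeout=30m
```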
### Installer Fails to Fetch Console URL

The installer fetches the URL for the OpenShift console using the [route][route-object] in the `openshift-console` namespace. If the installer fails to fetch the URL for the console:

1. Check if the console operator is `Available` or `Failing`:

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get clusteroperator console -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-02-27T22:46:57Z
  generation: 1
  name: console
  resourceVersion: "19682"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/console
  uid: 960364aa-3ae1-11e9-bad4-0a97b6ba9358
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-02-27T22:46:58Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-02-27T22:50:12Z
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-02-27T22:50:12Z
    status: "True"
    type: Available
  - lastTransitionTime: 2019-02-27T22:46:57Z
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: consoles
  - group: config.openshift.io
    name: cluster
    resource: consoles
  - group: oauth.openshift.io
    name: console
    resource: oauthclients
  - group: ""
    name: openshift-console-operator
    resource: namespaces
  - group: ""
    name: openshift-console
    resource: namespaces
  versions: null
```

2. Manually get the URL for `console`:

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get route console -n openshift-console -o=jsonpath='{.spec.host}'
console-openshift-console.apps.adahiya-1.devcluster.openshift.com
```
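
To confirm that the route actually serves the console, you can request it directly (a sketch using `curl`; `-k` skips TLS verification because the router CA may not be trusted locally yet):

```console
$ curl -sko /dev/null -w '%{http_code}\n' "https://$(oc --config=${INSTALL_DIR}/auth/kubeconfig get route console -n openshift-console -o=jsonpath='{.spec.host}')"
```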
### Installer Fails to Add Route CA to Kubeconfig

The installer adds the CA certificate for the router to the list of trusted client certificate authorities in `${INSTALL_DIR}/auth/kubeconfig`. If the installer fails to add the router CA to `kubeconfig`, you can fetch the router CA from the cluster using:

```console
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get configmaps router-ca -n openshift-config-managed -o=jsonpath='{.data.ca-bundle\.crt}'
-----BEGIN CERTIFICATE-----
MIIC/TCCAeWgAwIBAgIBATANBgkqhkiG9w0BAQsFADAuMSwwKgYDVQQDDCNjbHVz
dGVyLWluZ3Jlc3Mtb3BlcmF0b3JAMTU1MTMwNzU4OTAeFw0xOTAyMjcyMjQ2Mjha
Fw0yMTAyMjYyMjQ2MjlaMC4xLDAqBgNVBAMMI2NsdXN0ZXItaW5ncmVzcy1vcGVy
YXRvckAxNTUxMzA3NTg5MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
uCA4fQ+2YXoXSUL4h/mcvJfrgpBfKBW5hfB8NcgXeCYiQPnCKblH1sEQnI3VC5Pk
2OfNCF3PUlfm4i8CHC95a7nCkRjmJNg1gVrWCvS/ohLgnO0BvszSiRLxIpuo3C4S
EVqqvxValHcbdAXWgZLQoYZXV7RMz8yZjl5CfhDaaItyBFj3GtIJkXgUwp/5sUfI
LDXW8MM6AXfuG+kweLdLCMm3g8WLLfLBLvVBKB+4IhIH7ll0buOz04RKhnYN+Ebw
tcvFi55vwuUCWMnGhWHGEQ8sWm/wLnNlOwsUz7S1/sW8nj87GFHzgkaVM9EOnoNI
gKhMBK9ItNzjrP6dgiKBCQIDAQABoyYwJDAOBgNVHQ8BAf8EBAMCAqQwEgYDVR0T
AQH/BAgwBgEB/wIBADANBgkqhkiG9w0BAQsFAAOCAQEAq+vi0sFKudaZ9aUQMMha
CeWx9CZvZBblnAWT/61UdpZKpFi4eJ2d33lGcfKwHOi2NP/iSKQBebfG0iNLVVPz
vwLbSG1i9R9GLdAbnHpPT9UG6fLaDIoKpnKiBfGENfxeiq5vTln2bAgivxrVlyiq
+MdDXFAWb6V4u2xh6RChI7akNsS3oU9PZ9YOs5e8vJp2YAEphht05X0swA+X8V8T
C278FFifpo0h3Q0Dbv8Rfn4UpBEtN4KkLeS+JeT+0o2XOsFZp7Uhr9yFIodRsnNo
H/Uwmab28ocNrGNiEVaVH6eTTQeeZuOdoQzUbClElpVmkrNGY0M42K0PvOQ/e7+y
AQ==
-----END CERTIFICATE-----
```

You can then **prepend** that certificate to the `certificate-authority-data` field in your `${INSTALL_DIR}/auth/kubeconfig`.
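
A minimal sketch of that edit from the shell, assuming GNU coreutils `base64` and a kubeconfig with a single cluster entry (with multiple clusters, or on other platforms, edit the file by hand instead):

```console
$ # Save the router CA fetched above
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get configmaps router-ca -n openshift-config-managed -o=jsonpath='{.data.ca-bundle\.crt}' > /tmp/router-ca.crt
$ # Decode the CA bundle currently embedded in the kubeconfig
$ grep 'certificate-authority-data' ${INSTALL_DIR}/auth/kubeconfig | awk '{print $2}' | base64 -d > /tmp/current-ca.crt
$ # Prepend the router CA, re-encode, and paste the output back into the field
$ cat /tmp/router-ca.crt /tmp/current-ca.crt | base64 -w0
```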
## Generic Troubleshooting
Here are some ideas if none of the [common failures](#common-failures) match your symptoms.
@@ -263,3 +494,7 @@ If appropriate, file a [network operator](https://github.com/openshift/cluster-network-operator)
[aws-key-pairs]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html
[kubernetes-debug]: https://kubernetes.io/docs/tasks/debug-application-cluster/
[machine-config-daemon-ssh-keys]: https://github.com/openshift/machine-config-operator/blob/master/docs/Update-SSHKeys.md
[cluster-version-operator]: https://github.com/openshift/cluster-version-operator/blob/master/README.md
[clusterversion]: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusterversion.md
[clusteroperator]: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md
[cluster-operator-conditions]: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions
