Commit e0725eb

fixup: address review comments
1 parent f7e5226 commit e0725eb

1 file changed

+76
-8
lines changed

documentation/crash/crash_setup.md

Lines changed: 76 additions & 8 deletions
@@ -1,5 +1,5 @@
11

2-
## Node.js application crash diagnostics: Best Practices series #1
2+
# Node.js application crash diagnostics: Best Practices series #1
33

44
This is the first of a series of best practices and useful tips if you
55
are using Node.js in large scale production systems.
@@ -23,7 +23,31 @@ need to be manually `prepared` in advance to enable crash diagnostic
2323
data generation on the first failure itself, without losing vital data.
2424
The rest of the document illustrates these preparation steps.
2525

26-
## Available disk space
26+
The key artifacts for exploring Node.js application crashes are:
27+
- core dump (a.k.a. system dump, core file)
28+
- diagnostic report (originally known as node report)
29+
30+
Reference: [Diagnostic Report](https://nodejs.org/dist/latest-v12.x/docs/api/report.html)
31+
32+
## Common issues
33+
34+
While these key artifacts are expected to be generated on abnormal
35+
program conditions such as a crash (the diagnostic report is still
36+
experimental, so it requires explicit command line flags to switch it on),
37+
there are a number of issues that affect the automatic and complete
38+
generation of these artifacts. The most common such issues are:
39+
- Insufficient disk space for writing core dump data
40+
- Insufficient privilege to the core dump generator function
41+
- Insufficient resource limits set on the user
42+
- In the case of the diagnostic report, absence of report and symptom flags
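
As an illustration of the last item, a Node.js v12 invocation that switches the diagnostic report on might look like the sketch below. The flag names are the ones listed in the v12 Diagnostic Report documentation referenced above and may differ in other release lines; `app.js` is a placeholder for your own entry point.

```shell
# Assumption: Node.js v12.x, where the report feature is still experimental.
# app.js is a hypothetical application entry point.
node --experimental-report \
     --report-on-fatalerror \
     --report-uncaught-exception \
     app.js
```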
43+
44+
## Recommended Best Practice
45+
46+
This section provides specific recommendations for
47+
how to configure your systems in advance in order to be
48+
ready to investigate crashes.
49+
50+
### Available disk space
2751
Ensure that there is enough disk space available for the core file
2852
to be written:
2953

@@ -44,31 +68,38 @@ $ top -p 106916
4468
106916 user 20 0 600404 54500 15572 R 109.7 0.0 81098:54 node
4569
```
4670

47-
In Darwin, the flag is `-pid`
48-
In AIX, the command is `topas`
71+
In Darwin, the flag is `-pid`.
72+
73+
In AIX, the command is `topas`.
74+
4975
In freebsd, the command is `top`. In both AIX and freebsd, there is no
50-
flag to show per-process details. In Windows, you could use the task
76+
flag to show per-process details.
77+
78+
In Windows, you could use the task
5179
manager window and view the process attributes visually.
5280

5381
Insufficient file system space will result in truncated core files,
5482
and can severely hamper the ability to diagnose the problem.
5583

5684
Figure out how much free space is available in the file system:
85+
5786
`df -k` can be used invariably across UNIX platforms.
87+
5888
In Windows, Windows explorer when pointed to a disk partition,
5989
provides a view of the available space in that partition.
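
The two checks above can be combined into a small script. The following is a minimal sketch, assuming a POSIX shell and a `df` that accepts `-P` for single-line output; the function name `has_free_kb` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file system that holds DIR has
# at least NEED_KB kilobytes available.
has_free_kb() {
    dir=$1
    need_kb=$2
    # With -kP, line 2 of the df output describes the file system and
    # field 4 is the available space in kilobytes.
    avail_kb=$(df -kP "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, `has_free_kb . 600404 || echo "not enough space for a full core"` compares the free space in the current directory against the virtual size reported by `top` in the sample output above.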
6090

61-
## Core file location and name
62-
6391
By default, core file is generated on a crash event, and is
6492
written to the current working directory - the location from
6593
where the node process was started, in most of the UNIX variants.
94+
6695
In Darwin, it appears in the `/cores` location.
6796

6897
By default, core files from node processes on Linux are named as
6998
`core` or `core.<pid>`, where `<pid>` is the node process ID.
99+
70100
By default, core files from node processes on AIX and Darwin are
71101
named ‘core’.
102+
72103
By default, core files from node processes on freebsd are named
73104
‘%N.core’, where `%N` is the name of the crashed process.
74105

@@ -80,7 +111,20 @@ Modify pattern using `sysctl -w kernel.core_pattern=pattern` as root.
80111

81112
In AIX, `lscore` shows the current core file pattern.
82113

114+
A best practice is to remove old core files at regular intervals.
115+
116+
This makes sure that the space in the system is used efficiently,
117+
and no application specific data is persisted inadvertently.
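
One way to automate this cleanup is sketched below. It assumes core files are named `core` or `core.<something>` and live in a dedicated directory; the function name `prune_old_cores` is hypothetical, and the name pattern would need adjusting on systems such as freebsd where the default is `%N.core`.

```shell
#!/bin/sh
# Hypothetical helper: delete regular files named core* under DIR that were
# last modified more than DAYS days ago.
prune_old_cores() {
    dir=$1
    days=$2
    find "$dir" -type f -name 'core*' -mtime +"$days" -exec rm -f {} +
}
```

Point this only at a directory reserved for dumps, since the `core*` pattern would also match unrelated files with that prefix. It could be run from a scheduled job, for example `prune_old_cores /path/to/coredumps 7`.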
118+
119+
A best practice is to name the core file with the name, process ID and
120+
the creation timestamp of the failed process.
121+
122+
This makes it easy to relate the binary dump with crash-specific context.
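
On Linux, such a name can be expressed with the `%e` (executable name), `%p` (process ID) and `%t` (time of dump) specifiers of `kernel.core_pattern`. The sketch below only prints the command instead of running it, because changing the pattern requires root; the function name `core_pattern_cmd` and the target directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the sysctl command that would make Linux write
# core files to DIR as core.<executable>.<pid>.<timestamp>.
core_pattern_cmd() {
    dir=$1
    # %% in the printf format yields a literal % in the output.
    printf 'sysctl -w kernel.core_pattern=%s/core.%%e.%%p.%%t\n' "$dir"
}
```

Reviewing the printed command before running it as root keeps the change deliberate. Other platforms use different pattern syntax, so this applies to Linux only.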
123+
124+
### Configuring to ensure core generation
125+
83126
Enable full core dump generation using `chdev -l sys0 -a fullcore=true`
127+
84128
Modify the current pattern using `chcore -p on -n on -l /path/to/coredumps`
85129

86130
In Darwin and freebsd, `sysctl kern.corefile` shows the current core file pattern.
@@ -90,7 +134,9 @@ Modify the current pattern using `sysctl -w kern.corefile=newpattern` as root.
90134
To obtain full core files, set the following ulimit options, across UNIX variants:
91135

92136
`ulimit -c unlimited` - turn on core file generation capability with unlimited size
137+
93138
`ulimit -d unlimited` - set the user data limit to unlimited
139+
94140
`ulimit -f unlimited` - set the file limit to unlimited
95141

96142
The current ulimit settings can be displayed using:
@@ -104,15 +150,37 @@ system administrator. System administrators (with superuser privileges)
104150
may display, set or change the hard limits by adding the -H flag to
105151
the standard set of ulimit commands.
106152

107-
## Manual dump generation
153+
For example with:
154+
`ulimit -c -H`
155+
156+
104857600
157+
158+
we cannot increase the core file size to 200 MB. So
159+
160+
`ulimit -c 209715200`
161+
162+
will fail with reason:
163+
164+
`ulimit: core size: cannot modify limit: Invalid argument`
165+
166+
So if your hard limit settings are constraining your application's
167+
requirements, relax those specific settings through an administrator
168+
account.
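
A quick pre-flight check for this situation is sketched below, assuming a POSIX-style shell whose `ulimit` accepts `-H`. The function name `core_limit_allows` is hypothetical, and the value passed in must use the same units that `ulimit -c` reports in your shell.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a core size of WANT does not exceed the
# hard core-size limit of the current shell.
core_limit_allows() {
    want=$1
    hard=$(ulimit -H -c)
    # An "unlimited" hard limit permits any soft limit.
    [ "$hard" = "unlimited" ] || [ "$want" -le "$hard" ]
}
```

For example, `core_limit_allows 209715200 || echo "ask an administrator to raise the hard limit"` reports the problem before the `ulimit -c` call itself fails.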
169+
170+
## Additional information
171+
172+
### Manual dump generation
108173

109174
Under certain circumstances where you want to collect a core
110175
manually, follow these steps:
111176

112177
In Linux, use `gcore [-a] [-o filename] pid` where `-a`
113178
specifies to dump everything.
179+
114180
In AIX, use `gencore [pid] [filename]`
181+
115182
In freebsd and Darwin, use `gcore [-s] [executable] pid`
183+
116184
In Windows, you can use `Task manager` window, right click on the
117185
node process and select `create dump` option.
118186
