I'm getting 1450 errors in the Vault log, indicating Windows kernel resource exhaustion. How can I use poolmon to determine what is using up all the kernel space?
> ERROR 15115: begin read operation delay, code [1450]
> ERROR 15115: begin read operation delay, code [1450]
A 1450 error is caused when the kernel has insufficient resources to complete commands.
Under heavy load Vault will use a lot of system resources and is sensitive to errors like this.
This can be made significantly worse by bad drivers or things such as antivirus programs (even when inactive).
In theory, something like poolmon.exe can be used to diagnose what memory pool is causing resource exhaustion.
Using a 64-bit version of Windows Server can reduce 1450 problems because it has much higher kernel resource limits.
This error is quite dangerous, as it can cause random I/O failures system-wide, which can lead to Vault index corruption.
> ERROR 17832: attempt to insert index field will cause overflow, record size [14], adding [101], index [index\ncnd\invlink.dri]
Make sure you're running the current 5.3 patch level since there are index fixes for some older builds.
The kernel has a paged and a nonpaged memory pool that it uses to operate. These pools have limits, which can be tight on 32-bit versions of Windows.
Rapidly opening and closing files, as Vault does, puts pressure on these resource pools.
If something like a bad driver or antivirus program also takes up a lot of pool memory, you could hit the limit and operations would start failing with 1450 errors.
The operation delays can make Vault unresponsive, and if it ultimately cannot recover, it can cause corruption.
In general, the best way to avoid this is to use a 64-bit version of Windows.
A 1450 error under Windows normally indicates that *kernel* resources are exhausted trying to execute the request. It usually isn't referring to available virtual memory or disk space. Vault will try to back off (the "operation delay" part) in the hope that whatever is causing the system resource problem clears up over time. It won't wait forever and in this case it isn't clearing in time.
Things that can cause or increase the likelihood of 1450 errors include:
(1) bad drivers
(2) virus scanners (sometimes even when real time scanning is disabled)
(3) running Vault over redirected storage
(4) large amounts of physical memory (the page tables for the memory take up kernel memory)
(5) loading massive numbers of small files in Vault
(6) file fragmentation
Using a 64-bit version of Windows Server alleviates most of the 1450 issues. Are they running a 64-bit version?
(1) Did any 1450 errors cause corruption?
What you should do is search the process logs for any ERROR 15116 messages that did NOT say "code [0]".
e.g.
ERROR 15116: end read operation delay, code [1450]
If the end code is 0, the operation was retried and succeeded after the delay.
If it still has a non-zero error code, it could not recover, and that could easily cause corruption.
(2) The insertion errors.
This may be an older version that still has the insertion overflow bug but has insertion guard code that detects it.
This will cause keys or ranges of keys to become inaccessible as a result of the insertion failure.
In some cases this could be fixed with just a .index on recent jobs.
But if the failure occurred on split (which I think is the case), older data may also be affected.
If that's true, reindexing may be the only way to make sure all keys are properly restored.
This is pretty important to get fixed since it could cause a reindex to fail or corruption after reindexing.
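The log check described in (1) above can be sketched in a few lines of Python. This is only an illustration: the function name is made up here, the pattern is based on the single example line shown above, and it assumes other operations (not just "read") are logged with the same wording.

```python
import re
import sys

# Matches "end ... operation delay" lines such as:
#   ERROR 15116: end read operation delay, code [1450]
# Assumption: operations other than "read" use the same wording.
END_DELAY = re.compile(r"ERROR 15116: end (\w+) operation delay, code \[(\d+)\]")

def unrecovered_delays(lines):
    """Return (line_number, operation, code) for each 15116 line whose code is not 0."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = END_DELAY.search(line)
        if match and int(match.group(2)) != 0:
            hits.append((number, match.group(1), int(match.group(2))))
    return hits

if __name__ == "__main__":
    # Usage: python check_delays.py <process log> [<process log> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for number, operation, code in unrecovered_delays(log):
                print(f"{path}:{number}: {operation} delay ended with code [{code}]")
```

Any line it prints is a delay that ended without recovering, so the jobs running at that time are the ones to check for corruption first.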
I know Mike Wood recommends 64-bit Windows (e.g. Windows Server 2008 R2 64-bit) for 1450 issues.
Poolmon may give you some clues. You'll probably need to enable pool tagging with gflags in the older Windows OSes. Then run poolmon and hit 'b' to sort by bytes and see the top consumers. The code on the left is the tag, and you can determine what each tag is by looking it up in pooltags.txt or by searching the drivers (*.sys) in system32\drivers.
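If a tag is not listed in pooltags.txt, the driver search can be scripted. Here is a minimal Python sketch, assuming the tag appears as plain ASCII inside the driver file; the function name and the example path are illustrative only. A match is a hint about which driver owns the tag, not proof.

```python
from pathlib import Path

def drivers_containing_tag(tag, driver_dir):
    """Return the names of *.sys files in driver_dir that contain the tag as ASCII bytes."""
    needle = tag.encode("ascii")
    matches = []
    for path in sorted(Path(driver_dir).glob("*.sys")):
        try:
            data = path.read_bytes()
        except OSError:
            # Skip drivers that cannot be read (locked or access denied).
            continue
        if needle in data:
            matches.append(path.name)
    return matches

# Example (the path is an assumption; adjust it to the Windows directory on the server):
# print(drivers_containing_tag("MFEm", r"C:\Windows\System32\drivers"))
```

Short or common tags can match many files, so treat the result as a list of candidates to narrow down.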
If you can determine a time when the problem started you might be able to link it to some driver or software change on the machine.
A tool like poolmon.exe can display tagged memory use in the kernel and can sometimes help point out what in particular is taking up the kernel space.
(reference material)
How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks
http://support.microsoft.com/kb/177415
How to find pool tags that are used by third-party drivers
http://support.microsoft.com/kb/298102
Assuming poolmon is already installed and enabled on the Vault server, you can use these steps to collect the relevant information about what programs are using up the kernel memory and send it to us.
1. Click Start, point to Settings, click Control Panel, and then double-click Console.
2. Open a command prompt and type Poolmon.exe.
3. Press P until the second column ("Type") shows the value Paged.
4. Press B to sort the rows by bytes, from largest to smallest.
5. Select the whole screen contents, and then press ENTER to copy them.
6. Click Start, point to Programs, point to Accessories, and then click Notepad.
7. On the Edit menu, click Paste.
8. Repeat step 3 until the Type column shows the value Nonp (nonpaged).
9. Repeat steps 4 - 7 to paste the nonpaged output.
Some example values:
Memory: 3667864K Avail: 1439584K PageFlts: 24158 InRam Krnl: 2932K P:99172K
Commit:2785620K Limit:6640824K Peak:3313080K Pool N:31668K P:113116K
Tag Type Allocs Frees Diff Bytes Per Alloc
MmSt Paged 848745 ( 25) 840794 ( 38) 7951 23009248 (-17184) 2893
NV_x Paged 710 ( 0) 694 ( 0) 16 15134720 ( 0) 945920
Wmit Paged 1646 ( 0) 834 ( 0) 812 13615760 ( 0) 16768
Gh05 Paged 7347983 ( 92) 7347150 ( 92) 833 8887304 ( 0) 10669
Toke Paged 7588637 ( 77) 7587760 ( 78) 877 5078920 ( -592) 5791
CM35 Paged 320 ( 0) 231 ( 0) 89 4927488 ( 0) 55365
UlHT Paged 1 ( 0) 0 ( 0) 1 4198400 ( 0) 4198400
NtfF Paged 91318 (1150) 88206 (1127) 3112 2937728 ( 21712) 944
CM25 Paged 2631 ( 0) 2127 ( 0) 504 2826240 ( 0) 5607
MFE* Paged 21353 ( 0) 21336 ( 0) 17 2748752 ( 0) 161691
Ntff Paged 176683 ( 1) 173504 ( 27) 3179 2644928 (-21632) 832
NV Paged 5438003 ( 60) 5437351 ( 60) 652 1870008 ( 0) 2868
Ttfd Paged 352810 ( 0) 351976 ( 0) 834 1770488 ( 0) 2122
Gla1 Paged 34049 ( 8) 32959 ( 10) 1090 1744000 ( -3200) 1600
CMVa Paged 3809612 ( 2) 3796113 ( 10) 13499 879872 ( -360) 65
CMDa Paged 755440 ( 2) 748736 ( 5) 6704 864952 ( -296) 129
Gla: Paged 65892 ( 0) 64655 ( 0) 1237 811472 ( 0) 656
IoNm Paged 15894123 (12346) 15889098 (12351) 5025 799448 ( -576) 159
Gcac Paged 33621 ( 0) 33501 ( 0) 120 792136 ( 0) 6601
Obtb Paged 19924 ( 4) 19628 ( 6) 296 730496 ( -4176) 2467
FSim Paged 101910 ( 1) 96640 ( 1) 5270 674560 ( 0) 128
Gla5 Paged 128413 ( 2) 126908 ( 3) 1505 589960 ( -392) 392
NVMI Paged 377 ( 0) 365 ( 0) 12 525760 ( 0) 43813
Ntfo Paged 146390 (1377) 143291 (1354) 3099 465768 ( 2440) 150
NtFf Paged 3296 ( 4) 3287 ( 4) 9 393432 ( 0) 43714
NtFs Paged 397716 (1115) 390710 (1114) 7006 386328 ( -232) 55
CM16 Paged 81 ( 0) 5 ( 0) 76 356352 ( 0) 4688
Key Paged 62972298 (1736) 62968928 (1749) 3370 350440 ( -1352) 103
MmSm Paged 115667 ( 2) 110423 ( 4) 5244 335616 ( -128) 64
FSrm Paged 16396 ( 19) 16177 ( 20) 219 318936 ( 6040) 1456
Ggb Paged 17640 ( 0) 17538 ( 0) 102 315568 ( 0) 3093
CMAl Paged 1191 ( 0) 1118 ( 0) 73 299008 ( 0) 4096
NtFB Paged 98594 ( 2) 98579 ( 2) 15 279320 ( 0) 18621
CM29 Paged 77 ( 0) 43 ( 0) 34 278528 ( 0) 8192
Gla4 Paged 1098150 ( 0) 1096780 ( 0) 1370 241120 ( 0) 176
Ntf0 Paged 571482 (1390) 565267 (1389) 6215 222456 ( 24) 35
SYSA Paged 43376 ( 0) 42044 ( 0) 1332 202768 ( 0) 152
Ntfc Paged 105697 ( 12) 102886 ( 15) 2811 202392 ( -216) 72
Port Paged 108061 ( 8) 106988 ( 14) 1073 198712 ( -1104) 185
Memory: 3667864K Avail: 1462060K PageFlts: 1084 InRam Krnl: 2932K P:99172K
Commit:2775772K Limit:6640824K Peak:3313080K Pool N:31660K P:113116K
Tag Type Allocs Frees Diff Bytes Per Alloc
MFEm Nonp 693223 ( 0) 693190 ( 0) 33 5245248 ( 0) 158946
Devi Nonp 1679 ( 0) 1204 ( 0) 475 2055848 ( 0) 4328
MFE0 Nonp 1331601190 (7075) 1331588457 (7075) 12733 1962256 ( 0) 154
MmCm Nonp 546 ( 0) 498 ( 0) 48 1839648 ( 0) 38326
MfFi Nonp 47491 ( 0) 47284 ( 0) 207 1634224 ( 0) 7894
File Nonp 25722544 ( 295) 25712377 ( 295) 10167 1620904 ( 0) 159
Attv Nonp 5897788 ( 94) 5897293 ( 94) 495 1081040 ( 0) 2183
Thre Nonp 64568128 (1377) 64566767 (1375) 1361 871040 ( 1280) 640
MFEK Nonp 456763 ( 1) 456745 ( 1) 18 755832 ( 0) 41990
FiHk Nonp 1216 ( 0) 816 ( 0) 400 731152 ( 0) 1827
MmCa Nonp 2307534 ( 10) 2300744 ( 10) 6790 686752 ( 0) 101
Ntfr Nonp 198087 ( 2) 187886 ( 0) 10201 653320 ( 128) 64
NV Nonp 149373 ( 0) 148272 ( 0) 1101 642304 ( 0) 583
CcSc Nonp 1175813 ( 3) 1174051 ( 3) 1762 549744 ( 0) 312
Even Nonp 59465242 (1000) 59453932 ( 997) 11310 547072 ( 144) 48
Vad Nonp 4560916 ( 19) 4553835 ( 19) 7081 339888 ( 0) 48
Mm Nonp 13 ( 0) 0 ( 0) 13 337120 ( 0) 25932
usbp Nonp 6725752 ( 137) 6725638 ( 137) 114 317408 ( 0) 2784
NDpp Nonp 283 ( 0) 171 ( 0) 112 290840 ( 0) 2596
NtFs Nonp 378561 ( 2) 371857 ( 0) 6704 269984 ( 80) 40
Ntfn Nonp 268489 ( 1) 261788 ( 0) 6701 269600 ( 40) 40
Irp Nonp 23001431 ( 231) 23000700 ( 233) 731 262240 ( -568) 358
FSfm Nonp 104296 ( 0) 98439 ( 0) 5857 234280 ( 0) 40
Mdl Nonp 1190636 ( 2) 1188938 ( 3) 1698 217416 ( -128) 128
VadS Nonp 19115433 ( 288) 19108673 ( 286) 6760 216320 ( 64) 32
PgFO Nonp 66332 ( 0) 64794 ( 0) 1538 196864 ( 0) 128
AmlH Nonp 3 ( 0) 0 ( 0) 3 196608 ( 0) 65536
Ntf0 Nonp 3 ( 0) 0 ( 0) 3 196608 ( 0) 65536
Pool Nonp 5 ( 0) 2 ( 0) 3 184320 ( 0) 61440
MmCi Nonp 4173 ( 0) 3364 ( 0) 809 183528 ( 0) 226
AfdC Nonp 240846 ( 0) 239770 ( 0) 1076 172160 ( 0) 160
Vadl Nonp 11803841 ( 461) 11801562 ( 459) 2279 145856 ( 128) 64
Audx Nonp 2964 ( 0) 2954 ( 0) 10 122880 ( 0) 12288
TCPC Nonp 49173 ( 0) 47822 ( 0) 1351 117032 ( 0) 86
CMpa Nonp 121690 ( 0) 119684 ( 0) 2006 112336 ( 0) 56
MFEs Nonp 168670 ( 0) 168064 ( 0) 606 106656 ( 0) 176
Sema Nonp 644574 ( 3) 642688 ( 3) 1886 106512 ( 0) 56
MFEb Nonp 873504 ( 0) 870348 ( 0) 3156 105712 ( 0) 33
WmiG Nonp 20555 ( 0) 20073 ( 0) 482 104112 ( 0) 216
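Once the poolmon output has been pasted into a text file, the top consumers can be pulled out with a short script. The Python sketch below assumes the rows look like the example values above (tag, type, allocs, frees, diff, bytes, per alloc, with the per-interval changes in parentheses); the function names are illustrative.

```python
import re

# One poolmon data row, e.g.:
#   MmSt Paged 848745 ( 25) 840794 ( 38) 7951 23009248 (-17184) 2893
ROW = re.compile(
    r"^\s*(?P<tag>\S+)\s+(?P<type>Paged|Nonp)\s+"
    r"(?P<allocs>\d+)\s+\(\s*-?\d+\)\s+"
    r"(?P<frees>\d+)\s+\(\s*-?\d+\)\s+"
    r"(?P<diff>-?\d+)\s+(?P<bytes>\d+)\s+\(\s*-?\d+\)\s+(?P<per_alloc>\d+)\s*$"
)

def parse_poolmon(text):
    """Parse pasted poolmon output; header and summary lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "tag": match["tag"],
                "type": match["type"],
                "diff": int(match["diff"]),
                "bytes": int(match["bytes"]),
            })
    return rows

def top_consumers(rows, pool_type, count=5):
    """Return the largest tags by bytes for one pool type ("Paged" or "Nonp")."""
    selected = [row for row in rows if row["type"] == pool_type]
    return sorted(selected, key=lambda row: row["bytes"], reverse=True)[:count]
```

Running top_consumers on captures taken at different times and comparing the Bytes values for the same tag is one way to spot a tag that keeps growing.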
UPDATED:
September 18, 2017