How do you detect memory issues ?

Discussion:

kyle Hailey

2018-12-06 00:44:35 UTC

One of those questions that seems like it should have been nailed down 20
years ago, but it still seems to lack a clear answer.

How do you detect memory issues?

I always used "po" or "page outs". Now on Amazon Linux I don't see "po",
but there is "bo" (blocks written out). In the past, at least on OSF &
Ultrix, page outs were a sign that needed memory had been written out to
disk, and when I needed that memory it would take a big performance hit to
read it back in. Thus "po" was a good canary in the coal mine. Any
consistent values over, say, 10 were a sign.
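
For anyone who wants a "po"-like number on Linux: procps vmstat reports swap activity as "si"/"so" rather than "po", and the underlying cumulative counters are exposed in /proc/vmstat (pswpin, pswpout, pgmajfault). A minimal sketch, assuming two snapshots of that file taken a known number of seconds apart; the function names are made up for illustration:

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def page_rates(before, after, seconds,
               names=("pswpin", "pswpout", "pgmajfault")):
    """Return per-second deltas for the chosen cumulative counters.

    `before` and `after` are dicts from parse_vmstat(); `seconds` is the
    time between the two snapshots. The default counter names are the
    ones present in Linux /proc/vmstat.
    """
    return {n: (after[n] - before[n]) / seconds for n in names}
```

To use it, read /proc/vmstat twice with a sleep in between and pass both parsed snapshots to page_rates(). What rate counts as "too high" is still a judgment call; the sketch only produces the numbers.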

Some people use "scan rate", but I never found that as easy to interpret
as page outs. Again, what values would you use?

Some suggest using freeable memory as a yardstick where freeable is "free"
+ "cached" or MemFree + Cached + Inactive. Even in this case what would
you use for values to alert on?

I've always ignored swap stats, since if you are swapping it is too late.

What do you use to detect memory issues?

Kyle

kyle Hailey

2018-12-06 01:16:47 UTC

This is interesting:
https://github.com/torvalds/linux/commit/34e431b0ae398fc54ea69ff85ec700722c9da773

/proc/meminfo: provide estimated available memory

Many load balancing and workload placing programs check /proc/meminfo to
estimate how much free memory is available. They generally do this by
adding up "free" and "cached", which was fine ten years ago, but is
pretty much guaranteed to be wrong today.

It is wrong because Cached includes memory that is not freeable as page
cache, for example shared memory segments, tmpfs, and ramfs, and it does
not include reclaimable slab memory, which can take up a large fraction
of system memory on mostly idle systems with lots of files.

Currently, the amount of memory that is available for a new workload,
without pushing the system into swap, can be estimated from MemFree,
Active(file), Inactive(file), and SReclaimable, as well as the "low"
watermarks from /proc/zoneinfo.

However, this may change in the future, and user space really should not
be expected to know kernel internals to come up with an estimate for the
amount of free memory.

It is more convenient to provide such an estimate in /proc/meminfo. If
things change in the future, we only have to change it in one place.
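
The result of that commit is the MemAvailable field in /proc/meminfo. A minimal sketch of reading it, with a rough fallback for kernels that predate the field. The fallback simply sums the four fields named in the commit message and ignores the "low" watermarks, so it should be treated as an optimistic upper bound, not as the kernel's own calculation. Function names are illustrative only:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value}.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_kb(fields):
    """Prefer the kernel's own MemAvailable estimate when it exists."""
    if "MemAvailable" in fields:
        return fields["MemAvailable"]
    # Assumption: a crude upper bound that ignores the zone "low"
    # watermarks mentioned in the commit message.
    return (fields.get("MemFree", 0)
            + fields.get("Active(file)", 0)
            + fields.get("Inactive(file)", 0)
            + fields.get("SReclaimable", 0))
```

On a live system the input would be the contents of /proc/meminfo. Alerting on MemAvailable as a fraction of MemTotal is one option, but the right percentage depends on the workload.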

Mladen Gogala

2018-12-06 06:26:50 UTC

Hi Kyle,

You are talking about vmstat. I prefer sar. Here is the output of sar -B
3 3:

***@umajor:~$ sar -B 3 3
Linux 4.15.0-42-generic (umajor)    12/06/2018    _x86_64_    (8 CPU)

01:10:34 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
01:10:37 AM  23049.33   3421.33     16.00      0.00     81.33      0.00      0.00      0.00      0.00
01:10:40 AM  19186.67      1.33    102.67      0.00    116.00      0.00      0.00      0.00      0.00
01:10:43 AM     14.67   5064.00  32142.67      0.00  25249.00      0.00      0.00      0.00      0.00
Average:     14083.56   2828.89  10753.78      0.00   8482.11

The important stats are majflt/s, which means that pages had to be read
from disk, and pgsteal/s, which denotes the number of modified pages
backed up and reclaimed as "free". In this context "free" doesn't mean
empty; a page being free means that the page has a valid backup. Page
stealing and paging out (pgpgout/s) definitely mean that there is a
memory problem. On Red Hat systems, sar is available in the sysstat
package. Another good indication that something is wrong is a large
proportion of kernel mode CPU time, as shown by top. Also, "top" is a
good indicator because it shows the swap usage. If the swap usage keeps
growing, there is trouble with memory.
Regards
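
A minimal sketch of checking "sar -B" output for the two columns singled out above, assuming the output has been captured as text with each sample on one line. It deliberately does not flag pgpgout/s: as far as I know the sysstat man page defines that column as kilobytes written out to disk per second, so ordinary file writes raise it too, which would explain the non-zero pgpgout/s in the sample while the scan and steal columns stay at 0.00. Function names are illustrative only:

```python
def parse_sar_b(text):
    """Parse captured 'sar -B' text into [(timestamp, {column: value})]."""
    columns = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if "pgpgin/s" in tokens:
            # Header row: everything from pgpgin/s onward is a column name.
            columns = tokens[tokens.index("pgpgin/s"):]
            continue
        if columns is None or not tokens or tokens[0].startswith("Average"):
            continue
        if len(tokens) <= len(columns):
            continue  # not a complete data row
        try:
            numbers = [float(v) for v in tokens[-len(columns):]]
        except ValueError:
            continue
        stamp = " ".join(tokens[:-len(columns)])
        rows.append((stamp, dict(zip(columns, numbers))))
    return rows


def suspicious(rows, watch=("majflt/s", "pgsteal/s")):
    """Return timestamps of samples where any watched column is non-zero."""
    return [stamp for stamp, row in rows
            if any(row.get(col, 0.0) > 0 for col in watch)]
```

Fed the three samples shown above, suspicious() returns an empty list, since majflt/s and pgsteal/s are 0.00 throughout.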

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217

kyle Hailey

2018-12-06 19:36:21 UTC

Thanks Mladen
"sar -B" works on Amazon Linux
and it still amazes me how non-obvious monitoring memory pressure is to
this day

p***@gmail.com

2018-12-07 01:50:15 UTC

2 cents

I have always alerted on swapping using memstats values. I have stored vmstat output because sar was not always installed or available without a request, and that was not always possible given the nature of many Unix admins. If I did notice paging, I don't think it ever resulted in any issues until swapping started to be a threat. Swap is a hard threshold and a "soft" one (one that only alerts if over X for a certain time).
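
A minimal sketch of the hard versus "soft" threshold idea, assuming the metric is sampled at a fixed interval so that "a certain time" can be expressed as a number of consecutive samples. The class and parameter names are made up for illustration and do not come from any monitoring tool:

```python
def hard_threshold(value, limit):
    """Alert as soon as a single sample exceeds the limit."""
    return value > limit


class SoftThreshold:
    """Alert only when samples stay above `limit` for `hold` in a row."""

    def __init__(self, limit, hold):
        self.limit = limit
        self.hold = hold
        self.streak = 0  # consecutive samples seen above the limit

    def update(self, value):
        """Feed one sample; return True while the alert condition holds."""
        self.streak = self.streak + 1 if value > self.limit else 0
        return self.streak >= self.hold
```

For example, SoftThreshold(limit=10, hold=3) stays quiet on a brief spike and fires only on the third consecutive sample above 10. Any dip back under the limit resets the count.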

Mladen Gogala

2018-12-07 03:14:56 UTC

Another very decent monitoring tool is Nigel's monitor, also known as
"nmon". Those who have worked on AIX are probably well acquainted with
that tool. It also works on both Red Hat and Ubuntu. Here is a sample:

This utility probably works on Amazon Linux, too.

Regards

Jared Still

2018-12-07 16:02:41 UTC

nmon is indeed a great tool.

It is much better when used in conjunction with the analyzer:

http://www.ibm.com/developerworks/aix/library/au-nmon_analyser/index.html

--

Jared Still
Certifiable Oracle DBA and Part Time Perl Evangelist
Principal Consultant at Pythian
Pythian Blog http://www.pythian.com/blog/author/still/
Github: https://github.com/jkstill

Mladen Gogala

2018-12-08 06:53:13 UTC

Yeah, you're right. The analyzer was also ported to Linux:

http://nmon.sourceforge.net/pmwiki.php

--
http://www.freelists.org/webpage/oracle-l
