Chris Stephens
2018-07-30 14:44:15 UTC
Some time ago I noticed a slow, steady increase in dropped packets on the
private network interface on each node of our 3-node 12.2 RAC database
running on CentOS 7: roughly 25-30 dropped packets per minute.
I originally suspected a hardware issue. We've had issues with this
particular switch model in the past, but our network engineer has not been
able to find anything that would indicate the switch as the source of the
issue. I don't think it's an issue with network cards or cables because
it's happening on all 3 nodes.
We looked into some of the buffers involved in receiving packets on the card
and passing them up through to user space. I wasn't hopeful, since dropped
packet counts continue to increase when the system is idle. While I learned
a lot about Linux networking, we weren't able to resolve the issue.
"ethtool -S" shows dropped packets, which [I think] indicates problems at
the lowest levels of the stack, but I'm out of ideas on where to go from
here.
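
One way to narrow down which counter is actually moving is to take two
snapshots of the "ethtool -S" output and diff them. The following is a
minimal sketch, not a tested tool. The function names are made up for
illustration, counter names vary by NIC driver, and the regular expression
used to pick out drop-related counters is an assumption that may need
adjusting for a given card.

```python
import re
import subprocess
import time


def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` output into {counter_name: int}.

    Lines without a trailing integer (such as the "NIC statistics:"
    header) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\S.*?):\s*(-?\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def changed_counters(before, after, pattern=r"drop|discard|miss|err"):
    """Return {name: delta} for counters that changed between snapshots.

    Only counters whose name matches `pattern` are reported. The default
    pattern is a guess at common drop/error counter names, not a list
    taken from any particular driver.
    """
    deltas = {}
    for name, value in after.items():
        delta = value - before.get(name, 0)
        if delta != 0 and re.search(pattern, name, re.IGNORECASE):
            deltas[name] = delta
    return deltas


def watch(iface, interval=60):
    """Snapshot `ethtool -S iface` twice, `interval` seconds apart."""
    def snapshot():
        out = subprocess.run(
            ["ethtool", "-S", iface],
            check=True, capture_output=True, text=True,
        ).stdout
        return parse_ethtool_stats(out)

    before = snapshot()
    time.sleep(interval)
    return changed_counters(before, snapshot())
```

Calling watch() with the name of the private interconnect interface returns
only the drop-related counters that increased during the interval, which
shows whether the drops are counted per queue, as missed/no-buffer errors,
or under some other driver-specific name.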
When I shut down the only database in the cluster, dropped packets still
increase.
When I stop the whole cluster, dropped packets still increase. However, I
did notice orarootagent.bin and oraagent.bin are still running after
"crsctl stop cluster -all" which I haven't noticed in the past.
There were some EM13c agents running but after stopping those, dropped
packets still increase.
Does anyone have any suggestions that might lead to figuring out what
packets are being dropped?
Thanks, as always, for any insights.
Chris