
I have a number of servers, all running Ubuntu 20.04 and all configured identically; we use Chef to manage their configuration.

System details:

  • OS: Ubuntu 20.04 LTS with the latest patches
  • Hardware: Dell R730s, R740s and R750s, all with 256 GB of memory
  • Storage: All have 8 internal disks, with the root disk on RAID 1. Ten of the R740s also have 60 external 12 TB, 12 Gb/s SAS drives in multipathed MD1400 enclosures.
  • Network: 10 GbE with a failover bond.

This problem started after our last patch cycle. After about a week of uptime, SSH logins become very slow, taking more than 30 seconds.

For SSH logins we use government CAC smart cards. sshd is set up to use our organization's LDAP server to retrieve the required credentials.

Looking at auth.log I see this:

May 23 16:14:23 localhost sshd[2513183]: Connection from 192.168.13.3 port 41504 on 192.168.13.31 port 22 rdomain ""
May 23 16:14:26 localhost ldapsearch: DIGEST-MD5 common mech free
May 23 16:14:26 localhost sshd[2513183]: Accepted key RSA SHA256:p1nqEMbnyqddp2nEMCdIE4RCfY4ivW2wvKh8GSt0Uyk found at /usr/local/bin/openldap-wrapper.sh:1
May 23 16:14:26 localhost sshd[2513183]: Postponed publickey for ccjohns1 from 192.168.13.3 port 41504 ssh2 [preauth]
May 23 16:14:26 localhost ldapsearch: DIGEST-MD5 common mech free
May 23 16:14:26 localhost sshd[2513183]: Accepted key RSA SHA256:p1nqEMbnyqddp2nEMCdIE4RCfY4ivW2wvKh8GSt0Uyk found at /usr/local/bin/openldap-wrapper.sh:1
May 23 16:14:26 localhost sshd[2513183]: Accepted publickey for ccjohns1 from 192.168.13.3 port 41504 ssh2: RSA SHA256:p1nqEMbnyqddp2nEMCdIE4RCfY4ivW2wvKh8GSt0Uyk
May 23 16:14:26 localhost sshd[2513183]: pam_unix(sshd:session): session opened for user ccjohns1 by (uid=0)
May 23 16:14:51 localhost dbus-daemon[2053]: [system] Failed to activate service 'org.freedesktop.login1': timed out (service_start_timeout=25000ms)
May 23 16:14:51 localhost sshd[2513183]: pam_systemd(sshd:session): Failed to create session: Failed to activate service 'org.freedesktop.login1': timed out (service_start_timeout=25000ms)
May 23 16:14:51 localhost sshd[2513183]: User child is on pid 2513244
May 23 16:14:51 localhost sshd[2513244]: Starting session: shell on pts/1 for ccjohns1 from 192.168.13.3 port 41504 id 0

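According to the log, connection and authentication take about 3 seconds (16:14:23 to 16:14:26). PAM then opens the session, but pam_systemd cannot create a login session: D-Bus activation of org.freedesktop.login1 (systemd-logind) times out after 25 s (16:14:26 to 16:14:51).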
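To make the timing breakdown reproducible, here is a minimal Python sketch that pulls the syslog timestamps out of a subset of the auth.log lines quoted above and computes the gap between each login phase. It assumes the standard syslog timestamp format ("May 23 16:14:23"), which carries no year, so the code prepends an arbitrary placeholder year (2000) before parsing. The phase names and marker strings are hypothetical labels chosen for illustration, not anything sshd or PAM defines.

```python
from datetime import datetime

# Subset of the auth.log lines quoted above (syslog format, no year).
LOG = """\
May 23 16:14:23 localhost sshd[2513183]: Connection from 192.168.13.3 port 41504 on 192.168.13.31 port 22 rdomain ""
May 23 16:14:26 localhost sshd[2513183]: Accepted publickey for ccjohns1 from 192.168.13.3 port 41504 ssh2: RSA SHA256:p1nqEMbnyqddp2nEMCdIE4RCfY4ivW2wvKh8GSt0Uyk
May 23 16:14:26 localhost sshd[2513183]: pam_unix(sshd:session): session opened for user ccjohns1 by (uid=0)
May 23 16:14:51 localhost sshd[2513183]: pam_systemd(sshd:session): Failed to create session: Failed to activate service 'org.freedesktop.login1': timed out (service_start_timeout=25000ms)
May 23 16:14:51 localhost sshd[2513244]: Starting session: shell on pts/1 for ccjohns1 from 192.168.13.3 port 41504 id 0
"""

# Hypothetical phase labels mapped to a substring that identifies each phase's log line.
MARKERS = {
    "connect": "Connection from",
    "auth_ok": "Accepted publickey",
    "session_opened": "session opened",
    "logind_timeout": "Failed to create session",
    "shell": "Starting session",
}


def parse_time(line):
    # Syslog timestamps carry no year; prepend a placeholder year
    # (ASSUMPTION: all lines fall within the same year).
    return datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")


def phase_times(log_text):
    # Record the timestamp of the first line matching each marker.
    times = {}
    for line in log_text.splitlines():
        for phase, marker in MARKERS.items():
            if marker in line and phase not in times:
                times[phase] = parse_time(line)
    return times


def seconds_between(times, start, end):
    return (times[end] - times[start]).total_seconds()


t = phase_times(LOG)
print("connect -> auth accepted:", seconds_between(t, "connect", "auth_ok"), "s")  # 3.0 s
print("session opened -> logind timeout:", seconds_between(t, "session_opened", "logind_timeout"), "s")  # 25.0 s
print("connect -> shell:", seconds_between(t, "connect", "shell"), "s")  # 28.0 s
```

Almost all of the roughly 28 seconds is the single 25 s D-Bus activation timeout, which is why authentication itself looks fast.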

I have done extensive Google searches, and the usual suggestion is to restart systemd-logind.

I have tried that, and the restart fails.
Only a reboot seems to clear the problem, and even then only temporarily.

These are production servers, so rebooting is not really a solution.
Does anybody have any ideas?

  • @zx495, thank you for fixing the formatting; I'm kind of a newbie on this forum.
    – ccjohnson
    Commented May 28 at 18:28
  • Just to let everyone know, I have solved the problem. It turns out that snapd was not cleaning up its temporary mount points and the corresponding directories; the "problem" servers each had 7000+ mount points. Removing snapd and getting rid of the mount points and directories returned things to normal.
    – ccjohnson
    Commented May 28 at 18:31
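For anyone who wants to check whether a server has drifted into the same state, here is a rough Python sketch that parses /proc/self/mountinfo and counts mounts by filesystem type and by mount-point prefix, so that a runaway set of mounts stands out. The "snap-related" filter is an assumption: the comment does not say where snapd's leftover mount points lived, so the sketch simply looks for the substring "snap" in the mount point or mount source. It only counts active mounts; leftover directories that are no longer mounted will not show up.

```python
import re
from collections import Counter
from pathlib import Path


def unescape(field):
    # mountinfo encodes space, tab, newline and backslash as 3-digit octal escapes (e.g. "\040").
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo(text):
    # Fields: mount ID, parent ID, major:minor, root, mount point, options,
    # optional fields..., "-", filesystem type, mount source, super options.
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        sep = fields.index("-", 6)  # separator comes after the six fixed fields
        mounts.append({
            "mount_point": unescape(fields[4]),
            "fstype": fields[sep + 1],
            "source": unescape(fields[sep + 2]),
        })
    return mounts


def summarize(mounts, depth=2):
    by_fstype = Counter(m["fstype"] for m in mounts)
    # Group mount points by their first `depth` path components, e.g. "/snap/core20".
    by_prefix = Counter(
        "/" + "/".join(m["mount_point"].strip("/").split("/")[:depth]) for m in mounts
    )
    return by_fstype, by_prefix


def looks_snap_related(m):
    # ASSUMPTION: snapd-created mounts mention "snap" in their mount point or source.
    return "snap" in m["mount_point"] or "snap" in m["source"]


def main(path="/proc/self/mountinfo"):
    mounts = parse_mountinfo(Path(path).read_text())
    by_fstype, by_prefix = summarize(mounts)
    print("total mounts:", len(mounts))
    print("snap-related (heuristic):", sum(looks_snap_related(m) for m in mounts))
    print("top filesystem types:", by_fstype.most_common(5))
    print("top mount-point prefixes:", by_prefix.most_common(10))


if __name__ == "__main__" and Path("/proc/self/mountinfo").exists():
    main()
```

Reading /proc/self/mountinfo needs no special privileges. A total in the thousands, as reported above, would point to the same kind of mount-point buildup.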
