Troubleshoot and Solve Domain-Join Problems

Review the sections in this chapter to resolve domain-join problems.

Top 10 Reasons Domain Join Fails

Here are the top 10 reasons that an attempt to join a domain fails:

  1. Root was not used to run the domain-join command (or to run the domain-join graphical user interface).
  2. The user name or password of the account used to join the domain is incorrect.
  3. The name of the domain is mistyped.
  4. The name of the OU is mistyped.
  5. The local hostname is invalid.
  6. The domain controller is unreachable from the client because of a firewall or because the NTP service is not running on the domain controller.
     For more information, see Make Sure the Client Can Reach the Domain Controller and Diagnose NTP on Port 123.
  7. The client is running RHEL 2.1 and has an old version of SSH.
  8. On SUSE, GDM (dbus) must be restarted. This daemon cannot be restarted automatically if the user logged on with the graphical user interface.
  9. On Solaris, dtlogin must be restarted. This daemon cannot be restarted automatically if the user logged on with the Solaris graphical user interface. To restart dtlogin, run the following command:
     /sbin/init.d/dtlogin.rc start
  10. SELinux is set to either enforcing or permissive, likely on Fedora. SELinux must be set to disabled before the computer can be joined to the domain.

To turn off SELinux, see the SELinux man page.
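
Two of the causes listed above, not running as root and SELinux not being disabled, can be checked from the shell before you attempt the join. The following is a minimal sketch, not part of AD Bridge: the function names is_root_uid and selinux_allows_join are hypothetical, and the example use assumes that the id command and, where SELinux is installed, the getenforce command are available.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not AD Bridge commands.

# Succeeds only when the given numeric UID is 0 (root).
is_root_uid() {
    [ "$1" -eq 0 ]
}

# Succeeds only when the given SELinux mode is "Disabled", the mode this
# chapter requires before a join. Matching is case-insensitive.
selinux_allows_join() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        disabled) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use (commented out so that sourcing this file has no side effects):
# is_root_uid "$(id -u)" || echo "Run the domain-join command as root."
# if command -v getenforce >/dev/null 2>&1; then
#     selinux_allows_join "$(getenforce)" || echo "SELinux is not disabled."
# fi
```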

Solve Domain-Join Problems

To troubleshoot problems with joining a Linux computer to a domain, perform the following series of diagnostic tests sequentially on the Linux computer with a root account.

The tests can also be used to troubleshoot domain-join problems on a Unix computer; however, the syntax of the commands on Unix might be slightly different.

The procedures in this topic assume that you have already checked whether the problem falls under the Top 10 Reasons Domain Join Fails (see above). We also recommend that you generate a domain-join log.

For more information, see Generate a Domain-Join Log for AD Bridge.

Verify that the Name Server Can Find the Domain

Run the following command as root:

nslookup YourADrootDomain.com

Make Sure the Client Can Reach the Domain Controller

You can verify that your computer can reach the domain controller by pinging it:

ping YourDomainName

Check DNS Connectivity

The computer might be using the wrong DNS server or none at all. Make sure the nameserver entry in /etc/resolv.conf contains the IP address of a DNS server that can resolve the name of the domain you are trying to join. The IP address is likely to be that of one of your domain controllers.
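
To see at a glance which DNS servers the computer is configured to use, you can extract the nameserver entries from the file. This is a minimal sketch; the function name list_nameservers is hypothetical, and it assumes the usual resolv.conf layout of one "nameserver <address>" entry per line.

```shell
#!/bin/sh
# Print the address from every "nameserver" line in a resolv.conf-style file,
# one per line. Comment lines and other directives are skipped.
# With no argument, /etc/resolv.conf is read.
list_nameservers() {
    awk '$1 == "nameserver" && $2 != "" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Each address that the function prints can then be tested directly by passing it to nslookup as the second argument, for example nslookup YourADrootDomain.com 192.168.100.20.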

Make Sure nsswitch.conf Is Configured to Check DNS for Host Names

The /etc/nsswitch.conf file must contain the following line. (On AIX, the file is /etc/netsvc.conf.)

hosts: files dns

Computers running Solaris, in particular, may not contain this line in nsswitch.conf until you add it.
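
The check can be scripted. The sketch below is an illustration only: the function name hosts_uses_dns is hypothetical, it tests only whether dns appears on the hosts line (not the order of the sources), and it assumes nsswitch.conf syntax, so it does not apply to /etc/netsvc.conf on AIX.

```shell
#!/bin/sh
# Succeeds when the "hosts:" line of an nsswitch.conf-style file lists "dns"
# as one of its sources. Text after a "#" is treated as a comment.
# With no argument, /etc/nsswitch.conf is read.
hosts_uses_dns() {
    awk '
        { sub(/#.*/, "") }    # strip comments before inspecting the line
        $1 == "hosts:" {
            for (i = 2; i <= NF; i++) if ($i == "dns") found = 1
        }
        END { exit !found }
    ' "${1:-/etc/nsswitch.conf}"
}
```

For example, hosts_uses_dns || echo "Add dns to the hosts line" reports a missing entry.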

Ensure that DNS Queries Use the Correct Network Interface Card

If the computer is multi-homed, the DNS queries might be going out the wrong network interface card.

Temporarily disable all the NICs except for the card on the same subnet as your domain controller or DNS server and then test DNS lookups to the AD domain.

If this works, re-enable all the NICs and edit the local or network routing tables so that the AD domain controllers are accessible from the host.
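
On Linux computers that have the iproute2 tools, you can also ask the kernel which interface it would use to reach a given address, without disabling any NICs, by running ip route get followed by the IP address of the domain controller or DNS server. The sketch below extracts the interface name from that output. The function name route_interface is hypothetical, and the parsing assumes that the output contains the keyword dev followed by the interface name.

```shell
#!/bin/sh
# Read "ip route get <address>" output on standard input and print the name
# of the outgoing interface, which follows the "dev" keyword.
route_interface() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}
```

For example, ip route get 192.168.100.20 | route_interface prints the interface that would carry traffic to 192.168.100.20.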

Determine If DNS Server Is Configured to Return SRV Records

Your DNS server must be set to return SRV records so the domain controller can be located. It is common for non-Windows (BIND) DNS servers not to be configured to return SRV records.

To check whether SRV records are returned, run the following command:

nslookup -q=srv _ldap._tcp.ADdomainToJoin.com
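
If the DNS server returns SRV records, each answer names a domain controller and the port it serves. The sketch below reduces the answer to a list of host:port pairs. The function name srv_targets is hypothetical, and the parsing assumes the answer layout printed by the BIND version of nslookup ("name service = priority weight port target"); other nslookup versions format the answer differently, and this sketch prints nothing for them.

```shell
#!/bin/sh
# Read the output of an nslookup SRV query on standard input and print one
# "target:port" pair per SRV record.
# Assumed line layout: <name> service = <priority> <weight> <port> <target>.
srv_targets() {
    awk '$2 == "service" && $3 == "=" {
        sub(/\.$/, "", $7)    # drop the trailing dot of the target name
        print $7 ":" $6
    }'
}
```

Pipe the nslookup command into srv_targets to use it. An empty result means that no SRV records were found in the output.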

Make Sure that the Global Catalog Is Accessible

The global catalog for Active Directory must be accessible. A global catalog in a different zone might not show up in DNS. Diagnose it by executing the following command:

nslookup -q=srv _ldap._tcp.gc._msdcs.ADrootDomain.com

From the list of IP addresses in the results, choose one or more addresses and test whether they are accessible on Port 3268 using telnet.

telnet 192.168.100.20 3268
Trying 192.168.100.20...
Connected to sales-dc.example.com (192.168.100.20).
Escape character is '^]'.

Press the Enter key to close the connection:

Connection closed by foreign host.
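
If telnet is not installed, or if you want to test several addresses from a script, a non-interactive check is possible. The following is a minimal sketch rather than an AD Bridge tool: the function name check_tcp_port is hypothetical, and it assumes that bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the computer.

```shell
#!/bin/bash
# Hypothetical helper: test whether a TCP port accepts connections.
# Returns 0 if the connection opens within 5 seconds, 1 if it does not,
# and 2 if the port argument is not a number.
check_tcp_port() {
    local host=$1 port=$2
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 2 ;;
    esac
    # The child bash receives the host as $0 and the port as $1.
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

For example, check_tcp_port 192.168.100.20 3268 && echo "global catalog port is reachable".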

Verify that the Client Can Connect to the Domain on Port 123

The following test checks whether the client can connect to the domain controller on Port 123 and whether the Network Time Protocol (NTP) service is running on the domain controller. For the client to join the domain, the Windows time service, which provides NTP, must be running on the domain controller.

On a Linux computer, run the following command as root:

ntpdate -d -u DC_hostname

Example:

ntpdate -d -u sales-dc

For more information, see Diagnose NTP on Port 123.

In addition, check the logs on the domain controller for errors from the source named w32tm, which is the Windows time service.

FreeBSD: Run ldconfig If You Cannot Restart Computer

When installing AD Bridge on a new FreeBSD computer with nothing in /usr/local, run /etc/rc.d/ldconfig start after the installation if you cannot restart the computer. Otherwise, /usr/local/lib will not be in the library search path.

Ignore Inaccessible Trusts

An inaccessible trust can block you from successfully joining a domain. If you know that there are inaccessible trusts in your Active Directory network, you can set AD Bridge to ignore all the trusts before you try to join a domain. To do so, use the config tool to modify the values of the DomainManagerIgnoreAllTrusts setting.

  1. List the available trust settings:
/opt/pbis/bin/config --list | grep -i trust

The results will look something like this. The setting at issue is DomainManagerIgnoreAllTrusts.

DomainManagerIgnoreAllTrusts
DomainManagerIncludeTrustsList
DomainManagerExcludeTrustsList

  2. List the details of the DomainManagerIgnoreAllTrusts setting to see the values it accepts:
[root@rhel5d bin]# ./config --details DomainManagerIgnoreAllTrusts
Name: DomainManagerIgnoreAllTrusts
Description: When true, ignore all trusts during domain enumeration.
Type: boolean
Current Value: false
Accepted Values: true, false
Current Value is determined by local policy.
  3. Change the setting to true so that AD Bridge will ignore trusts when you try to join a domain.
[root@rhel5d bin]# ./config DomainManagerIgnoreAllTrusts true
  4. Check to make sure the change took effect:
[root@rhel5d bin]# ./config --show DomainManagerIgnoreAllTrusts
boolean
true
local policy

Now try to join the domain again. If the join succeeds, keep in mind that only users and groups who are in the local domain will be able to log on to the computer.

In the example output above that shows the setting's current values, local policy is listed, meaning that the setting is managed locally through config because an AD Bridge Group Policy setting is not managing the setting. Typically, with AD Bridge, you would manage the DomainManagerIgnoreAllTrusts setting by using the corresponding Group Policy setting, but you cannot apply Group Policy Objects (GPOs) to the computer until after it is added to the domain. The corresponding AD Bridge policy setting is named Lsass: Ignore all trusts during domain enumeration.
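
When you check the setting from a script, it can help to reduce the config --show output to the value and its source. The following is a minimal sketch; the function name show_value_and_source is hypothetical, and the parsing assumes the three-line layout shown in the example output above (type, current value, source).

```shell
#!/bin/sh
# Read "config --show <setting>" output on standard input and print
# "value (source)". Assumes three lines: type, current value, source.
show_value_and_source() {
    awk 'NR == 2 { value = $0 }
         NR == 3 { source = $0 }
         END     { print value " (" source ")" }'
}
```

For example, /opt/pbis/bin/config --show DomainManagerIgnoreAllTrusts | show_value_and_source would print true (local policy) for the output shown above.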

For information on the arguments of config, run the following command:

/opt/pbis/bin/config --help

Resolve Common Error Messages

This section lists solutions to common errors that can occur when you try to join a domain.

Configuration of krb5

Error Message:

Warning: A resumable error occurred while processing a module.
Even though the configuration of 'krb5' was executed, the configuration did not
fully complete. Please contact BeyondTrust support.

Solution:

Delete /etc/krb5.conf and try to join the domain again.
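
If you would rather keep a copy of the existing file, moving it aside has the same effect as deleting it, because /etc/krb5.conf no longer exists when you retry the join. This is a cautious variant suggested here as an assumption, not a step from this guide; the function name move_aside and the backup file name format are hypothetical.

```shell
#!/bin/sh
# Move a file aside to "<path>.bak.<timestamp>" and print the new path.
# Does nothing and returns 1 if the file does not exist.
move_aside() {
    [ -e "$1" ] || return 1
    backup="$1.bak.$(date +%Y%m%d%H%M%S)"
    mv -- "$1" "$backup" && printf '%s\n' "$backup"
}
```

Run move_aside /etc/krb5.conf as root, and then try to join the domain again.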

Chkconfig Failed

This error can occur when you try to join a domain, or when you run the domain-join command with an option, and the netlogond daemon is not already running.

Error Message:

Error: chkconfig failed [code 0x00080019]

Description: An error occurred while using chkconfig to process the netlogond daemon, which must be added to the list of processes to start when the computer is rebooted. The problem may be caused by startup scripts in the /etc/rc.d/ tree that are not LSB-compliant.

Verification: Running the following command as root can provide information about the error:

chkconfig --add netlogond

Solution:

Remove startup scripts that are not LSB-compliant from the /etc/rc.d/ tree.
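
To find candidates, you can list the startup scripts that have no LSB header. The following is a minimal sketch: the function name scripts_without_lsb_header is hypothetical, and a missing "### BEGIN INIT INFO" block is only one way in which a script can be non-compliant, so review each file before you remove it.

```shell
#!/bin/sh
# Print every regular file in the given directory that lacks the
# "### BEGIN INIT INFO" marker that opens an LSB init-script header.
scripts_without_lsb_header() {
    for script in "$1"/*; do
        [ -f "$script" ] || continue
        grep -q '^### BEGIN INIT INFO' "$script" || printf '%s\n' "$script"
    done
}
```

For example, run scripts_without_lsb_header /etc/rc.d/init.d, adjusting the directory to wherever the startup scripts are stored on your distribution.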

Replication Issues

The following error might occur if there are replication delays in your environment. A replication delay might occur when the client is in the same site as an RODC.

Error Message:

Error: LW_ERROR_KRB5KDC_ERR_C_PRINCIPAL_UNKNOWN [code 0x0000a309]
Client not found in Kerberos database
[root@rhel6-1 ~]# echo $?
1
[root@rhel6-1 ~]# /opt/pbis/bin/domainjoin-cli query
Error: LW_ERROR_KRB5KDC_ERR_C_PRINCIPAL_UNKNOWN [code 0x0000a309]
Client not found in Kerberos database

Solution:

After the error occurs, wait 15 minutes, and then run the following command to restart AD Bridge:

/opt/pbis/bin/lwsm restart lwreg

Diagnose NTP on Port 123

When you use the AD Bridge domain-join utility to join a Linux or Unix client to a domain, the utility might be unable to contact the domain controller on Port 123 with UDP. The AD Bridge agent requires that Port 123 be open on the client so that it can receive NTP data from the domain controller. In addition, the time service must be running on the domain controller.

You can diagnose NTP connectivity by executing the following command as root at the shell prompt of your Linux computer:

ntpdate -d -u DC_hostname

Example:

ntpdate -d -u sales-dc

If all is well, the result should look like this:

[root@rhel44id ~]# ntpdate -d -u sales-dc
2 May 14:19:20 ntpdate[20232]: ntpdate 4.2.0a@1.1190-r Thu Apr 20 11:28:37 EDT 2006 (1)
Looking for host sales-dc and service ntp
host found : sales-dc.example.com
transmit(192.168.100.20)
receive(192.168.100.20)
transmit(192.168.100.20)
receive(192.168.100.20)
transmit(192.168.100.20)
receive(192.168.100.20)
transmit(192.168.100.20)
receive(192.168.100.20)
transmit(192.168.100.20)
server 192.168.100.20, port 123
stratum 1, precision -6, leap 00, trust 000
refid [LOCL], delay 0.04173, dispersion 0.00182
transmitted 4, in filter 4
reference time:    cbc5d3b8.b7439581  Fri, May  2 2008 10:54:00.715
originate timestamp: cbc603d8.df333333  Fri, May  2 2008 14:19:20.871
transmit timestamp:  cbc603d8.dda43782  Fri, May  2 2008 14:19:20.865
filter delay:  0.04207  0.04173  0.04335  0.04178
 0.00000  0.00000  0.00000  0.00000
filter offset: 0.009522 0.008734 0.007347 0.005818
 0.000000 0.000000 0.000000 0.000000
delay 0.04173, dispersion 0.00182
offset 0.008734
2 May 14:19:20 ntpdate[20232]: adjust time server 192.168.100.20 offset 0.008734 sec

Output When There Is No NTP Service

If the domain controller is not running NTP on Port 123, the command returns a response such as "no server suitable for synchronization found", as in the following output:

5 May 16:00:41 ntpdate[8557]: ntpdate 4.2.0a@1.1190-r Thu Apr 20 11:28:37 EDT 2006 (1)
Looking for host RHEL44ID and service ntp
host found : rhel44id.example.com
transmit(127.0.0.1)
transmit(127.0.0.1)
transmit(127.0.0.1)
transmit(127.0.0.1)
transmit(127.0.0.1)
127.0.0.1: Server dropped: no data
server 127.0.0.1, port 123
stratum 0, precision 0, leap 00, trust 000
refid [127.0.0.1], delay 0.00000, dispersion 64.00000
transmitted 4, in filter 4
reference time:    00000000.00000000  Wed, Feb  6 2036 22:28:16.000
originate timestamp: 00000000.00000000  Wed, Feb  6 2036 22:28:16.000
transmit timestamp:  cbca101c.914a2b9d  Mon, May  5 2008 16:00:44.567
filter delay:  0.00000  0.00000  0.00000  0.00000
 0.00000  0.00000  0.00000  0.00000
filter offset: 0.000000 0.000000 0.000000 0.000000
 0.000000 0.000000 0.000000 0.000000
delay 0.00000, dispersion 64.00000
offset 0.000000
5 May 16:00:45 ntpdate[8557]: no server suitable for synchronization found
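
When you run the test on many clients, you can classify the result automatically by looking for the closing message. The following is a minimal sketch; the function name ntp_result and its three result labels are hypothetical, and it recognizes only the two closing messages shown in this section, so any other ntpdate output is reported as unknown.

```shell
#!/bin/sh
# Read "ntpdate -d -u <host>" output on standard input and print one of:
#   ok              the "adjust time server" message was found
#   no-ntp-service  "no server suitable for synchronization found" was found
#   unknown         neither message was found
ntp_result() {
    awk '
        /no server suitable for synchronization found/ { bad = 1 }
        /adjust time server/                           { ok = 1 }
        END {
            if (bad)     print "no-ntp-service"
            else if (ok) print "ok"
            else         print "unknown"
        }
    '
}
```

For example, ntpdate -d -u sales-dc 2>&1 | ntp_result prints ok for the first output shown in this section and no-ntp-service for the second.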

Turn Off Apache to Join a Domain

The Apache web server locks the keytab file, which can block an attempt to join a domain. If the computer is running Apache, stop Apache, join the domain, and then restart Apache.