- Kerberos provides secure authentication, ensuring that only legitimate users can access Hadoop resources.
- LDAP provides a directory service that maps users to their permissions, ensuring that access control is centralized and managed efficiently.
Together, Kerberos and LDAP protect sensitive data in HDFS by ensuring that only the right people have access to the right files at the right time.
Let’s break down a scenario where a Hadoop client accesses a file in a Hadoop Distributed File System (HDFS), using Kerberos for authentication and LDAP for directory-based user management.
You’re working for a company, "META," that uses Hadoop to manage its large-scale data. The company has set up Kerberos for security, ensuring only authenticated users can access sensitive data. LDAP is used to manage employee information and permissions, such as which employees can access which files.
- You, as a user, log into your Hadoop client (like a laptop or a desktop) using your username and password.
- When you log in, Kerberos checks your credentials against its central KDC (Key Distribution Center), which is like a trusted "security checkpoint."
- If your username and password match what's stored in KDC, Kerberos gives you a Ticket Granting Ticket (TGT).
- This TGT acts as proof that you are a verified user and contains information about your identity.
Now, as a client, you want to access a file stored in HDFS, for example, a large data file located at /user/data/file.txt.
- Kerberos Authentication Check:
- Before allowing access to any file, HDFS needs to check whether the client is authenticated.
- When you connect, your client uses the TGT it obtained from Kerberos to request a Service Ticket for HDFS from the KDC; the TGT must still be valid and active (not expired).
- The client presents this Service Ticket to the HDFS NameNode, which proves that the client is authenticated.
- LDAP Directory Check for Permissions:
- Now that the client’s identity is authenticated via Kerberos, the HDFS NameNode queries LDAP to check your permissions.
- LDAP is a directory service that stores who has access to which data in the Hadoop ecosystem; it also tracks roles and permissions for users.
- For example, LDAP checks if your user (e.g., JohnDoe) belongs to a certain group (e.g., "DataScience Team") that has read access to the file /user/data/file.txt.
- If you have permissions, LDAP confirms that you can proceed.
- Once Kerberos and LDAP both authorize the client, the client can access the file stored in HDFS.
- The HDFS DataNodes store chunks of the file, and the NameNode keeps track of where the data is located.
- The client communicates with the DataNodes directly to read the data, as the file is split into blocks and distributed across multiple nodes in HDFS (see the sketch after this list).
- Kerberos ensures that only authorized users with valid credentials can even begin the process of accessing Hadoop resources, thus preventing unauthorized access.
- LDAP plays a key role in access control, determining what specific files or directories a user has permission to access.
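A quick way to observe this block-to-DataNode mapping is the HDFS file-system checker; a minimal sketch, assuming a running cluster and that /user/data/file.txt exists:

```bash
# Ask the NameNode where each block of the file lives.
# -files/-blocks/-locations print per-file block IDs and the DataNodes holding them.
hdfs fsck /user/data/file.txt -files -blocks -locations
```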
Let’s say you want to access the file /user/data/large_data.csv:
- You log into your system, and Kerberos provides you with a TGT.
- You then try to read the file /user/data/large_data.csv from HDFS.
- The HDFS NameNode checks your Kerberos ticket for authentication.
- Once your identity is validated, the NameNode queries LDAP to verify if you have permission to access /user/data/large_data.csv (e.g., are you part of the "DataAnalysts" group with read access?).
- If Kerberos confirms your identity and LDAP confirms that you have the necessary permissions, you are granted access to the file, and you can proceed with reading the data from the HDFS DataNode.
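From the client's shell, this whole flow looks roughly like the sketch below, assuming a principal john@EXAMPLE.COM exists and the cluster is Kerberized:

```bash
kinit john@EXAMPLE.COM                     # authenticate; the KDC issues a TGT
klist                                      # confirm the TGT is cached and not expired
hadoop fs -cat /user/data/large_data.csv   # HDFS obtains a service ticket behind the scenes
kdestroy                                   # discard cached tickets when done
```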
Before we proceed with implementing Kerberos and LDAP integration for Hadoop, make sure you meet the following prerequisites:
- At least one Linux server (can be a virtual machine or physical machine) to act as the LDAP Server and Kerberos Server.
- At least one additional Linux server (if needed) to act as the Hadoop Cluster Node.
- Linux-based Operating System (Ubuntu, CentOS, or Debian-based distributions recommended).
- OpenLDAP server (for centralized directory services).
- Kerberos KDC (Key Distribution Center) server for authentication.
- Hadoop (HDFS, YARN) for testing authentication and authorization integration.
- Java installed (Hadoop and its Kerberos integration run on the JVM).
- sudo/root access to configure system-wide services and install packages.
- A private network (or virtual network between VMs) configured for communication between the LDAP server, Kerberos server, and Hadoop nodes.
- Static IP addresses for servers (recommended for consistent configuration and ease of access).
- The domain and DNS settings should be properly configured to resolve hostnames (e.g., kerberos.example.com, ldap.example.com).
- Reverse DNS lookup should be configured for Kerberos to work correctly.
- Sufficient disk space to store LDAP data and Kerberos database.
- Time Synchronization: All servers involved (Kerberos, LDAP, Hadoop) must be synchronized in time using NTP (Network Time Protocol), as Kerberos relies on time stamps to authenticate.
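A quick sanity check for the time requirement; a minimal sketch, assuming systemd-based hosts (by default Kerberos tolerates only about five minutes of clock skew):

```bash
# Shows whether the system clock is NTP-synchronized on each server.
timedatectl status
```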
- Kerberos Packages:
  - krb5-kdc: Kerberos Key Distribution Center.
  - krb5-admin-server: Kerberos Administration Server.
  - krb5-config: Kerberos configuration utilities.
  - krb5-user: Kerberos client utilities.
- LDAP Packages:
  - slapd: OpenLDAP server daemon.
  - ldap-utils: command-line tools to interact with LDAP.
  - /etc/nsswitch.conf: not a package, but must be configured for LDAP integration.
- Hadoop Setup:
  - Ensure Hadoop (HDFS, YARN, MapReduce) is already installed and functional before integrating Kerberos for authentication.
  - Hadoop Configuration Files:
    - core-site.xml
    - hdfs-site.xml
    - mapred-site.xml
    - yarn-site.xml
- Java:
  - Java Development Kit (JDK) installed (required for Hadoop and its Kerberos integration).
- Kerberos Realm: Ensure you have a domain/realm to be used in Kerberos (e.g., EXAMPLE.COM).
- Kerberos Keytab: Keytab files are required for service authentication and for storing service principal credentials securely.
- Kerberos Administrative Access: Access to create, manage, and administer Kerberos principals and keytab files.
- LDAP Base DN: You should know the directory structure and Base DN (e.g., dc=example,dc=com) for your LDAP configuration.
- LDAP User and Group Creation: Plan the structure of users and groups in your LDAP directory.
- LDAP Admin Access: You need the credentials (username and password) of the LDAP admin to create and manage user entries.
- Kerberos Principal: Define Kerberos principals for Hadoop services (e.g., hdfs/_HOST@EXAMPLE.COM, yarn/_HOST@EXAMPLE.COM).
- Hadoop Keytab: Create a keytab file for Hadoop services to authenticate using Kerberos tickets.
- Kerberos Authentication in Hadoop: Configure Hadoop to use Kerberos authentication in the core configuration files (core-site.xml, hdfs-site.xml).
- Sudo or Root Access: Required to install software, edit system configurations, and restart services.
- Basic Linux Administration Knowledge:
  - Knowledge of managing system services (systemctl for starting and stopping services).
  - Understanding of basic LDAP commands (ldapadd, ldapsearch).
  - Familiarity with Hadoop configuration files and adjusting settings.
- Kerberos Administration: Basic understanding of Kerberos principles, ticket generation, and keytab management.
  - Commands like kadmin.local, klist, and kinit are essential for managing Kerberos tickets and principals.
- LDAP Administration: Basic knowledge of managing users and groups in LDAP directories.
  - Commands like ldapadd, ldapmodify, and ldapdelete.
- Strong Passwords: Ensure all user and service accounts have strong passwords.
- TLS/SSL: For encrypted LDAP communication, use LDAPS (LDAP over SSL/TLS).
- Firewall Configuration: Open the necessary ports (389 for LDAP, 636 for LDAPS, 88 for Kerberos) between the servers and clients; see the sketch after this list.
- Backup: Ensure the LDAP and Kerberos databases are regularly backed up.
- Time Synchronization: Ensure all servers are synchronized with NTP, as Kerberos is sensitive to time drift.
- Install and configure NTP:
```bash
sudo apt install ntp
```
- Synchronize the system time:
```bash
sudo systemctl start ntp
```
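A minimal firewall sketch for the ports listed above, assuming the hosts use ufw (adjust accordingly for firewalld or raw iptables):

```bash
sudo ufw allow 88        # Kerberos KDC (TCP and UDP)
sudo ufw allow 389/tcp   # LDAP
sudo ufw allow 636/tcp   # LDAPS
sudo ufw reload
```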
Before starting, ensure you have backups of all critical configuration files (such as /etc/krb5.conf, /etc/ldap/slapd.conf, Hadoop configuration files) and have a rollback strategy in case the integration encounters any issues.
By ensuring that these prerequisites are met, you’ll be well-prepared to successfully implement Kerberos with LDAP for centralized authentication and integration with Hadoop.
This implementation guide provides a step-by-step process for integrating Kerberos with LDAP for centralized authentication in a Linux environment, allowing secure and manageable authentication for users accessing Hadoop or other services.
- Update Package Repositories:
```bash
sudo apt update
```
- Install OpenLDAP Server and Utilities:
```bash
sudo apt install slapd ldap-utils
```
- Configure LDAP Server: Reconfigure the LDAP server (slapd) by running:
```bash
sudo dpkg-reconfigure slapd
```
Follow the prompts:
- Omit OpenLDAP server configuration?: No
- DNS domain name: example.com
- Organization name: Example Inc.
- Admin password: Set a strong password for the LDAP admin user.
- Database backend: MDB (the default on current releases; older releases may offer HDB)
- Allow LDAPv2 protocol: No
- Start and Enable the LDAP Service:
```bash
sudo systemctl start slapd
sudo systemctl enable slapd
```
- Create LDIF File for Users and Groups: Create a file add_users.ldif (the parent ou=users and ou=groups entries are included so the adds do not fail on a fresh directory):
```ldif
# Organizational units that hold users and groups.
dn: ou=users,dc=example,dc=com
objectClass: organizationalUnit
ou: users

dn: ou=groups,dc=example,dc=com
objectClass: organizationalUnit
ou: groups

# A sample user entry.
dn: uid=john,ou=users,dc=example,dc=com
objectClass: inetOrgPerson
uid: john
cn: John Doe
sn: Doe
userPassword: johnspassword
mail: john.doe@example.com

# A sample group entry.
dn: cn=developers,ou=groups,dc=example,dc=com
objectClass: posixGroup
cn: developers
gidNumber: 10000
```
- Add Users and Groups:
```bash
sudo ldapadd -x -D "cn=admin,dc=example,dc=com" -W -f add_users.ldif
```
- Verify the Entries:
```bash
sudo ldapsearch -x -b "dc=example,dc=com" "(uid=john)"
```
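To confirm that the new user can actually bind (authenticate) against the directory, a minimal sketch using ldapwhoami:

```bash
# Bind as john with simple authentication; prints the bound DN on success.
ldapwhoami -x -D "uid=john,ou=users,dc=example,dc=com" -W
```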
- Install Kerberos Packages:
```bash
sudo apt install krb5-kdc krb5-admin-server krb5-config
```
- Configure Kerberos Server: Edit /etc/krb5.conf to specify the Kerberos realm (ensure the domain is in uppercase):
```ini
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_realm = false
    dns_lookup_kdc = false

[realms]
    EXAMPLE.COM = {
        kdc = kerberos.example.com
        admin_server = kerberos.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```
- Initialize the Kerberos Database:
```bash
sudo krb5_newrealm
```
- Start the Kerberos Services:
```bash
sudo systemctl start krb5-kdc
sudo systemctl enable krb5-kdc
sudo systemctl start krb5-admin-server
sudo systemctl enable krb5-admin-server
```
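At this point it is worth creating a test principal and confirming that ticket issuance works; a minimal sketch, assuming the EXAMPLE.COM realm configured above:

```bash
# Create a principal for the LDAP user "john" (prompts for a password).
sudo kadmin.local -q "addprinc john@EXAMPLE.COM"
# Request a TGT as john, then list the cached tickets.
kinit john@EXAMPLE.COM
klist
```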
- Edit the Kerberos Configuration: Modify the /etc/krb5.conf file to include the LDAP server for user authentication:
```ini
[realms]
    EXAMPLE.COM = {
        kdc = kerberos.example.com
        admin_server = kerberos.example.com
        default_domain = example.com
        ldap_servers = ldap://localhost
        ldap_base_dn = dc=example,dc=com
        ldap_bind_dn = cn=admin,dc=example,dc=com
        ldap_bind_password = <password>
    }
```
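After editing the configuration, restart the KDC and admin server so the changes take effect:

```bash
sudo systemctl restart krb5-kdc krb5-admin-server
```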
- Configure PAM for Kerberos Authentication: Edit /etc/pam.d/common-auth to enable PAM (Pluggable Authentication Modules) for Kerberos authentication:
```
auth required pam_krb5.so
```
- Configure NSS for LDAP: Edit /etc/nsswitch.conf to use LDAP for user and group lookup:
```
passwd: compat ldap
group: compat ldap
shadow: compat ldap
```
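If NSS is wired to LDAP correctly, LDAP users and groups resolve like local ones; a quick check with getent:

```bash
# Both commands should print the LDAP-backed entries created earlier.
getent passwd john
getent group developers
```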
- Edit core-site.xml: Enable Kerberos authentication for Hadoop in core-site.xml:
```xml
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.security.kerberos.principal</name>
    <value>hadoop/_HOST@EXAMPLE.COM</value>
  </property>
  <property>
    <name>hadoop.security.kerberos.keytab</name>
    <value>/etc/security/keytabs/hadoop.keytab</value>
  </property>
</configuration>
```
- Edit hdfs-site.xml: Configure HDFS to use Kerberos authentication:
```xml
<configuration>
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/_HOST@EXAMPLE.COM</value>
  </property>
  <property>
    <name>dfs.datanode.kerberos.principal</name>
    <value>hdfs/_HOST@EXAMPLE.COM</value>
  </property>
</configuration>
```
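These principals need matching keytabs; a minimal sketch for creating and exporting one, assuming a NameNode host of namenode.example.com (the _HOST placeholder resolves to each daemon's fully qualified hostname at runtime):

```bash
# Create a service principal with a random key and export it to a keytab.
sudo kadmin.local -q "addprinc -randkey hdfs/namenode.example.com@EXAMPLE.COM"
sudo kadmin.local -q "ktadd -k /etc/security/keytabs/hdfs.keytab hdfs/namenode.example.com@EXAMPLE.COM"
# Verify the exported keys; the keytab should be readable only by the hdfs user.
sudo klist -kt /etc/security/keytabs/hdfs.keytab
```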
- Configure Hadoop to Use LDAP for Authentication: Update the hadoop-env.sh file to include LDAP configurations:
```bash
export HADOOP_OPTS="$HADOOP_OPTS -Dldap.url=ldap://localhost:389"
export HADOOP_OPTS="$HADOOP_OPTS -Dldap.auth.method=simple"
export HADOOP_OPTS="$HADOOP_OPTS -Dldap.baseDN=dc=example,dc=com"
```
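Before restarting the daemons, it is worth confirming that the LDAP endpoint Hadoop was pointed at is reachable and serves the expected base DN; a minimal sketch:

```bash
# Query the same URL and base DN configured above; prints john's DN on success.
ldapsearch -x -H ldap://localhost:389 -b "dc=example,dc=com" "(uid=john)" dn
```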
- Authenticate User via Kerberos: Test the integration by logging into Hadoop with a Kerberos-authenticated user (run kinit first if you do not already hold a ticket):
```bash
hadoop fs -ls /
```
- Verify User Access: Ensure that the Kerberos-authenticated user can access Hadoop resources and services without entering credentials each time.
- Verify Kerberos Tickets: To verify the Kerberos tickets, use:
```bash
klist
```
- Logs and Error Messages:
  - If there are authentication errors, check the logs for LDAP (/var/log/slapd.log) and Kerberos (/var/log/krb5kdc.log).
  - Ensure that all time synchronization is correct (Kerberos is sensitive to time drift).
- Securing Kerberos and LDAP:
  - Use secure LDAP (LDAPS) instead of LDAP to encrypt communication between Kerberos and LDAP servers.
  - Regularly rotate the Kerberos keys and maintain a secure password policy.
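A quick way to confirm LDAPS once certificates are in place; a minimal sketch, assuming TLS has already been configured on ldap.example.com and the CA certificate is trusted by the client:

```bash
# Same search as before, but over LDAPS (port 636); fails if TLS is misconfigured.
ldapsearch -x -H ldaps://ldap.example.com:636 -b "dc=example,dc=com" "(uid=john)" dn
```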
In this implementation, we configured Kerberos and LDAP on Linux to enable centralized authentication, integrated them with Hadoop for secure access control, and ensured that users authenticate via Kerberos. This integration enhances security, scalability, and ease of management for large-scale systems.
👨‍💻 Crafted by: Suraj Kumar Choudhary | 📩 Feel free to DM for any help: csuraj982@gmail.com