Using Ansible to install Hive on a Spark cluster.

What is Hive? Apache Hive is a distributed, fault-tolerant data warehouse system built on top of Hadoop and designed to simplify the processing of large datasets. With Hive, a user can manage and analyze massive volumes of data by organizing it into tables, much like a traditional relational database. Hive uses the HiveQL (HQL) language, which is very similar to SQL. These SQL-like queries are translated into MapReduce jobs, so users can draw on Hadoop’s MapReduce engine without having to write MapReduce programs themselves. ...

September 4, 2023 · 11 min · Naveen Kannan

Using Ansible to remotely configure a cluster.

What is Ansible? Ansible is an open-source IT automation tool for managing remote systems. A basic Ansible environment has the following three components: Control node: The system on which Ansible is installed and from which Ansible commands such as ansible-inventory are issued. It is also where Ansible playbooks and configuration files are stored. Managed node: A remote system that Ansible manages and configures. ...

June 24, 2023 · 10 min · Naveen Kannan