This training course is for those who want to move into Big Data (Hadoop) as a career. It combines in-class training with a real-time project and consists of two separate modules.
The first module provides a Hadoop overview. It introduces big data strategy and explains why it is important to understand and use big data. It covers Big Data as a platform for managing and gaining insights from your data. You will also see how companies have aligned their offerings with the Open Data Platform (ODP) to better suit your needs, along with the three specialized value-add modules that sit on top of the ODP.
The second module provides an in-depth introduction to the main components of the ODP core, namely Apache Hadoop (including HDFS, YARN, and MapReduce), Apache Ambari, Apache Hive, and HBase. Students will have the opportunity to work with the programming languages used to load, query, and analyze data. The module also walks students through how the major Hadoop vendors (IBM, Hortonworks, and Cloudera) package their distributions and how those packages differ.
Key Topics
- Understand the purpose of big data and know why it is important
- List the sources of data (data-at-rest vs data-in-motion)
- Describe the major components of the open-source Apache Hadoop stack
- Manage and monitor Hadoop clusters with Apache Ambari and related components
- Explore the Hadoop Distributed File System (HDFS) by running Hadoop commands
- Understand the differences between Hadoop 1 (with MapReduce 1) and Hadoop 2 (with YARN and MapReduce 2)
- Create and run basic MapReduce jobs
- Explain the role of coordination, management, and governance in the Hadoop ecosystem using Apache ZooKeeper
- Explore common methods for performing data movement
- Configure Flume for data loading
- Move data into Hadoop from relational databases using Sqoop
- Understand when to use various data storage formats (flat files, CSV/delimited, Sequence files, etc.)
- Review the differences between the available open-source programming languages typically used with Hadoop (Pig, Hive) and for Data Science (Python, R)
- Query data from Hive
- Perform random access on data stored in HBase
- Explore advanced concepts, including Oozie
Additional:
- Describe the big data offerings from IBM (BigInsights, Streams, and SPSS)
- Use the various IBM BigInsights tools, including Big SQL and BigSheets, for your big data needs
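
To give a flavour of the "Create and run basic MapReduce jobs" topic listed under Key Topics, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It is an in-memory illustration in plain Python, not course material and not the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, the step a MapReduce
    framework performs between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data is big", "Hadoop stores big data"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run in parallel across nodes, with the framework handling the shuffle and reading input from HDFS; this sketch only shows how the three stages fit together.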