[Course] Course program (content of the lecture/seminar/retraining/study)
Goals
Through instructor-led discussion and interactive, hands-on exercises, participants will learn Apache Spark and how it integrates with the entire Hadoop ecosystem, covering:
- How data is distributed, stored, and processed in a Hadoop cluster
- How to use Sqoop and Flume to ingest data
- How to process distributed data with Apache Spark (a brief sketch follows this list)
- How to model structured data as tables in Impala and Hive
- How to choose the best data storage format for different data usage patterns
- Best practices for data storage
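To give a flavor of the Spark material, here is a minimal PySpark sketch of the kind of distributed processing the course covers: a word count over a text file, with the result written out as Parquet, a columnar format well suited to analytic access patterns. The file paths and application name are hypothetical and a working Spark installation is assumed; this is an illustrative sketch, not the course's own exercise code.

# Minimal PySpark word-count sketch; paths and app name are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, lower, split

spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()

# Read lines from HDFS (or a local path) into a DataFrame with one
# "value" column per line.
lines = spark.read.text("hdfs:///user/training/input.txt")

# Split each line into words and count occurrences across the cluster.
counts = (lines
          .select(explode(split(lower(col("value")), r"\s+")).alias("word"))
          .where(col("word") != "")
          .groupBy("word")
          .count())

# Columnar formats such as Parquet suit analytic usage patterns.
counts.write.mode("overwrite").parquet("hdfs:///user/training/word_counts")

spark.stop()

The same pipeline reads almost identically in Scala through the Dataset API, which is why the course accepts either language.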
Outline
Read the entire course outline for more details.

Prerequisites
- Apache Spark examples and hands-on exercises are presented in Scala and Python, so the ability to program in one of those languages is required
- Basic familiarity with the Linux command line is assumed
- Basic knowledge of SQL is helpful
- Prior knowledge of Hadoop is not required