wiki2:hadoop:ecosystem — revision 2020/05/09 09:25 (current); previous revision 2019/05/06 15:43 by alfred

Hive can be extended with User Defined Functions (UDFs). You can also load data from various applications and formats (Avro, XML, ...). It can also be used together with Spark (Spark can use Hive to read data).
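A UDF is packaged as a Java class in a jar and registered from a Hive session. A minimal sketch of the registration step (the jar path, the class name ''com.example.udf.Upper'', and the ''employees'' table are all hypothetical):

```shell
# Make the jar containing the UDF visible to the session,
# register it under a SQL name, and use it in a query
hive -e "
ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.udf.Upper';
SELECT my_upper(name) FROM employees LIMIT 10;
"
```

''CREATE TEMPORARY FUNCTION'' scopes the UDF to the current session; a permanent ''CREATE FUNCTION'' registers it in the metastore instead.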
**Avro** is a format optimized for loading data into clusters. Another format used in Hadoop is **Parquet**: Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
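To illustrate the two formats, Hive can store a table directly in Parquet (or Avro) via the ''STORED AS'' clause; the table and column names below are hypothetical:

```shell
# Create a Parquet-backed table and populate it from an existing table
hive -e "
CREATE TABLE ratings_parquet (user_id INT, movie_id INT, rating INT)
STORED AS PARQUET;   -- use STORED AS AVRO for the Avro format
INSERT INTO ratings_parquet
SELECT user_id, movie_id, rating FROM ratings;
"
```

Because Parquet is columnar, queries that touch only a few columns read far less data than with row-oriented formats such as Avro.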
===== Ways to read data in real time =====
===== Impala =====

Impala makes use of many familiar components within the Hadoop ecosystem. Impala can interchange data with other Hadoop components, as both a consumer and a producer, so it can fit in flexible ways into your ETL and ELT pipelines.
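Since Impala reads the same metastore tables as Hive, querying from the command line is a one-liner with ''impala-shell'' (the host name and the ''ratings'' table are hypothetical):

```shell
# -i selects the impalad daemon to connect to; -q runs one query and exits
impala-shell -i impala-host \
  -q "SELECT movie_id, COUNT(*) AS views
      FROM ratings
      GROUP BY movie_id
      ORDER BY views DESC
      LIMIT 10;"
```

Impala executes this directly on its daemons instead of compiling it to MapReduce, which is why it is typically much faster than Hive for interactive queries.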
===== Apache Ranger =====

**Apache Ranger** lets you assign permissions and manage authentication in Hadoop.

===== Apache Hue =====

**Hue** is a web-based interactive query editor that enables you to interact with data warehouses.

===== Apache Sqoop =====

Apache Sqoop is a tool that uses MapReduce to transfer data between Hadoop clusters and relational databases very efficiently. It works by spawning tasks on multiple data nodes to download portions of the data in parallel. When the transfer finishes, each piece of data is replicated for reliability and spread across the cluster so that you can process it in parallel.

The nice thing about Sqoop is that it can automatically load relational data from MySQL into HDFS while preserving its structure.

Hive and Impala also let you create a schema for HDFS files with ''CREATE EXTERNAL TABLE'' commands; Sqoop, however, can do this automatically (for example with its ''--hive-import'' option).
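The import described above can be sketched as a single Sqoop invocation; the connection string, credentials and table name are hypothetical:

```shell
# Pull a MySQL table into HDFS with 4 parallel map tasks,
# creating a matching Hive table automatically
sqoop import \
  --connect jdbc:mysql://db-host/movielens \
  --username student -P \
  --table ratings \
  -m 4 \
  --hive-import
```

''-m'' controls how many map tasks download slices of the table in parallel, and ''-P'' prompts for the database password instead of putting it on the command line.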
===== Notes =====
  * **Cloudbase** is a group of tools pre-installed on a Linux distribution to make it easier to use Hadoop technologies.
  * {{ :wiki2:hadoop:traditional_etl_vs_elt_on_hadoop.pdf |ETL and ELT}}