Connectors

Connectors provide integration with SQL databases and job execution engines. The existing connectors are listed below.

Connectors are pluggable and new engines can be added. Feel free to contact the community.

Querying

SQL

SqlAlchemy

SqlAlchemy is the preferred way if the HiveServer2 API is not supported by the database. The implementation is in sql_alchemy.py and depends on the respective SqlAlchemy dialects.

Kafka SQL

Kafka connector.

Solr SQL

Solr connector.

Custom

If the built-in HiveServer2 (Hive, Impala, Spark SQL), RDBMS (MySQL, PostgreSQL, Oracle, SQLite), and JDBC interfaces don’t meet your needs, you can implement your own connector to the notebook app:
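As a rough illustration of what such a connector could look like, here is a minimal sketch. The `Connector` interface, its method names, and the `SqliteConnector` class are hypothetical and are not the notebook app's actual base class; SQLite is used only so that the example runs without any external service.

```python
import sqlite3
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical minimal connector interface (not the app's real base class)."""

    @abstractmethod
    def execute(self, statement):
        """Submit a statement and return an opaque handle."""

    @abstractmethod
    def fetch_result(self, handle):
        """Return column names and rows for a previously executed statement."""

    @abstractmethod
    def close(self):
        """Release any resources held by the connector."""


class SqliteConnector(Connector):
    """Toy implementation backed by an in-memory SQLite database."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._cursors = {}
        self._next_id = 0

    def execute(self, statement):
        cursor = self._conn.execute(statement)
        handle = self._next_id
        self._next_id += 1
        self._cursors[handle] = cursor
        return handle

    def fetch_result(self, handle):
        cursor = self._cursors.pop(handle)
        # cursor.description is None for statements that return no rows.
        columns = [col[0] for col in cursor.description or []]
        return {"columns": columns, "rows": cursor.fetchall()}

    def close(self):
        self._conn.close()
```

A caller would submit a statement with `execute()`, keep the returned handle, and pass it to `fetch_result()` once it wants the rows. Separating the two steps leaves room for engines where execution is asynchronous.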

JDBC

With the JDBC proxy, the query editor can work with any JDBC-compatible database. View the JDBC connector.

Note: In the long term, SqlAlchemy is preferred as it is more “Python native”.

Jobs

Spark / Livy

Based on the Livy REST API.
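To give an idea of the kind of calls involved, the sketch below builds the two Livy REST requests typically needed to run a snippet: creating an interactive session and posting a statement to it. The helper names and the `LIVY_URL` value are assumptions made for illustration, and the requests are only constructed, not sent.

```python
import json
from urllib.request import Request

# Assumption: a Livy server on its default port on the local host.
LIVY_URL = "http://localhost:8998"


def build_create_session(kind="pyspark", base_url=LIVY_URL):
    """Build (but do not send) the POST /sessions request that starts a session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(
        base_url + "/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_run_statement(session_id, code, base_url=LIVY_URL):
    """Build the POST /sessions/{id}/statements request that runs a code snippet."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending either request with `urllib.request.urlopen()` requires a running Livy server; the statement result is then obtained by polling the statement resource until it has finished.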

Oozie

MapReduce, Pig, Java, Shell, Sqoop and DistCp jobs are supported via the Oozie connector.

Job Browser

The Job Browser is generic: it can list any type of job or query, provides bulk operations like kill, pause, delete… and gives access to logs and recommendations.
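The idea of one generic interface with bulk operations can be sketched as follows. This is an illustration under assumptions, not the Job Browser's real API: `JobApi`, `InMemoryJobApi` and `bulk_action` are hypothetical names, and jobs are kept in a plain dictionary.

```python
from abc import ABC, abstractmethod


class JobApi(ABC):
    """Hypothetical per-engine interface; names are illustrative only."""

    @abstractmethod
    def list_jobs(self):
        """Return a list of {"id": ..., "status": ...} dictionaries."""

    @abstractmethod
    def action(self, job_id, operation):
        """Apply a single operation (kill, pause, delete) to one job."""


class InMemoryJobApi(JobApi):
    """Toy engine that stores job statuses in a dictionary."""

    STATUS_AFTER = {"kill": "KILLED", "pause": "PAUSED"}

    def __init__(self, jobs):
        self._jobs = dict(jobs)  # job id -> status

    def list_jobs(self):
        return [{"id": j, "status": s} for j, s in sorted(self._jobs.items())]

    def action(self, job_id, operation):
        if job_id not in self._jobs:
            raise KeyError(job_id)
        if operation == "delete":
            del self._jobs[job_id]
        else:
            # Raises KeyError for operations this toy engine does not know.
            self._jobs[job_id] = self.STATUS_AFTER[operation]


def bulk_action(api, job_ids, operation):
    """Apply one operation to many jobs, collecting failures instead of aborting."""
    failed = {}
    for job_id in job_ids:
        try:
            api.action(job_id, operation)
        except KeyError as exc:
            failed[job_id] = f"unknown job or operation: {exc}"
    return failed
```

Because `bulk_action` only relies on the abstract interface, the same function would work for any engine that implements `action()`.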

Here is its API.

SQL Queries

The API currently supports:

Spark / Livy

Oozie

File Browser

Several storage systems are supported. fsmanager.py is the main router to each storage API.
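One common way to implement such a router is to dispatch on the URI scheme of the path. The sketch below shows that idea only; it is not the content of fsmanager.py, and the `FakeFs` class, the scheme names and the default scheme are assumptions chosen for illustration.

```python
from urllib.parse import urlparse


class FakeFs:
    """Stand-in for a real storage client; it only records its name."""

    def __init__(self, name):
        self.name = name

    def listdir(self, path):
        return f"{self.name} would list {path}"


# Hypothetical mapping from URI scheme to client.
CLIENTS = {
    "hdfs": FakeFs("HDFS"),
    "s3a": FakeFs("S3"),
    "adl": FakeFs("ADLS"),
}
DEFAULT_SCHEME = "hdfs"  # assumption: bare paths such as /user/demo go to HDFS


def get_filesystem(path):
    """Pick the client responsible for a path based on its URI scheme."""
    scheme = urlparse(path).scheme.lower() or DEFAULT_SCHEME
    try:
        return CLIENTS[scheme]
    except KeyError:
        raise ValueError(f"No filesystem registered for scheme {scheme!r}") from None
```

With this layout, supporting another storage system amounts to registering one more entry in the mapping.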

Note: Ceph can be used via the S3 browser.

Hadoop HDFS

AWS S3

Azure ADLS

HBase / Key Value Stores

With just a few changes in the Python API, the HBase browser could be made compatible with Apache Kudu or Google Bigtable.

Dashboard

Dashboards are generic and support Apache Solr and SQL:

The API was influenced by Solr but is now generic:

Dashboard API

SQL

SQL API

Implementations:

Apache Solr

Solr Dashboard API

A connector similar to the Solr or SqlAlchemy bindings would need to be developed (HUE-7828).

Data Catalog

The backend is pluggable by providing alternative client interfaces:

  • Cloudera Navigator (default)
  • Dummy (skeleton for integrating new catalogs)
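A pluggable backend of this kind can be sketched as a small client interface plus a registry. Everything below is hypothetical (`CatalogClient`, `search_entities`, `register`, `get_client`) and only illustrates how a dummy skeleton and an alternative client could sit behind the same interface.

```python
class CatalogClient:
    """Hypothetical base interface; the method name is illustrative."""

    def search_entities(self, query):
        raise NotImplementedError


class DummyClient(CatalogClient):
    """Skeleton backend: a starting point that returns no results."""

    def search_entities(self, query):
        return []


class StaticClient(CatalogClient):
    """Example alternative backend that searches a fixed list of entities."""

    def __init__(self, entities):
        self._entities = list(entities)

    def search_entities(self, query):
        needle = query.lower()
        return [e for e in self._entities if needle in e["name"].lower()]


# Registry of backend name -> factory returning a client instance.
REGISTRY = {"dummy": DummyClient}


def register(name, factory):
    REGISTRY[name] = factory


def get_client(name="dummy"):
    return REGISTRY[name]()
```

A new catalog would then be integrated by subclassing the interface and calling `register()` with a factory for it.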

Apache Atlas

Scheduling

Oozie

Currently only Apache Oozie is supported for scheduling in your data warehouse, but the API is becoming generic with HUE-3797, which brings Celery Beat integration.