Connectors

Connectors provide pluggable integration with any external data service so that an admin can easily allow end users to interact with it.

Databases

SqlAlchemy

SqlAlchemy is the preferred way if the Hive API is not supported by the database. The core implementation is in sql_alchemy.py and relies on each respective SqlAlchemy dialect.
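The dialect-based pattern can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module in place of a real SqlAlchemy engine; the class and method names are hypothetical, not the actual sql_alchemy.py API.

```python
import sqlite3

class SqlAlchemyLikeConnector:
    """Illustrative connector: one shared interface, with database
    differences deferred to a dialect/driver. A real implementation
    would call sqlalchemy.create_engine(url); sqlite3 stands in here."""

    def __init__(self, url=":memory:"):
        self.conn = sqlite3.connect(url)

    def execute(self, statement):
        cursor = self.conn.execute(statement)
        columns = [d[0] for d in cursor.description] if cursor.description else []
        return {"columns": columns, "rows": cursor.fetchall()}

connector = SqlAlchemyLikeConnector()
result = connector.execute("SELECT 1 AS one, 'a' AS letter")
```

Because each dialect translates the same calls for its database, adding support for a new engine is mostly a matter of installing its dialect and pointing a connection URL at it.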

Hive Interface

This asynchronous API, based on the Thrift API of Hive, is very mature and powers an excellent integration of Apache Hive and Apache Impala.
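The asynchronous flow follows a handle-based pattern: submit a statement, poll its state, then fetch results. The sketch below simulates that lifecycle in memory; it is not the real Thrift client, and all names and states are illustrative.

```python
class AsyncQueryClient:
    """Sketch of the submit / poll / fetch lifecycle used by
    handle-based query APIs such as HiveServer2's Thrift interface."""

    def __init__(self):
        self._operations = {}
        self._next_id = 0

    def execute_async(self, statement):
        # A real client would hand the statement to the server here
        # and get back an operation handle immediately.
        handle = self._next_id
        self._next_id += 1
        self._operations[handle] = {"state": "RUNNING", "polls": 0}
        return handle

    def get_state(self, handle):
        op = self._operations[handle]
        op["polls"] += 1
        if op["polls"] >= 2:  # pretend the query finishes after two polls
            op["state"] = "FINISHED"
        return op["state"]

    def fetch_results(self, handle):
        assert self._operations[handle]["state"] == "FINISHED"
        return [("row1",), ("row2",)]  # placeholder result set

client = AsyncQueryClient()
handle = client.execute_async("SELECT * FROM logs")
while client.get_state(handle) != "FINISHED":
    pass
rows = client.fetch_results(handle)
```

Because the submit call returns immediately, the UI can keep polling for progress and logs instead of blocking on long-running queries.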

Custom

If the built-in HiveServer2 (Hive, Impala, Spark SQL) and SqlAlchemy (MySQL, PostgreSQL, Oracle, Presto…) connectors don’t meet your needs, you can implement your own connector to the notebook app:

The JDBC API relies on a small JDBC proxy running next to the Hue API. By default it won't be built without setting the BUILD_DB_PROXY flag, e.g.:

export BUILD_DB_PROXY=true
make install

Note In the long term, SqlAlchemy is preferred as more “Python native”.
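A custom connector to the notebook app follows a session lifecycle: create a session, execute a statement, check its status, fetch results. The skeleton below mirrors that shape; the method names and return dictionaries are illustrative placeholders, not the exact base-class API.

```python
class CustomNotebookConnector:
    """Hypothetical skeleton of a notebook-app connector showing the
    session / execute / status / fetch lifecycle."""

    def create_session(self, lang="mydb"):
        # Open a connection or session against the external service.
        return {"type": lang, "id": 1}

    def execute(self, session, statement):
        # Submit the statement and return a handle for later polling.
        return {"id": 42, "has_result_set": True}

    def check_status(self, session, handle):
        return {"status": "available"}

    def fetch_result(self, session, handle, rows=100):
        return {
            "data": [[1, "x"]],
            "meta": [{"name": "id"}, {"name": "value"}],
            "has_more": False,
        }

connector = CustomNotebookConnector()
session = connector.create_session()
handle = connector.execute(session, "SELECT id, value FROM t")
status = connector.check_status(session, handle)
result = connector.fetch_result(session, handle)
```

Implementing these few entry points is usually enough for the editor to run statements, show progress, and page through results.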

Potential connectors

It is recommended to develop a SqlAlchemy connector if yours does not already exist.

Catalogs

The backend is pluggable by providing alternative client interfaces:

  • Apache Atlas
  • Cloudera Navigator
  • Dummy (skeleton for integrating new catalogs)
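The pluggable-backend idea can be sketched as a common interface plus a registry that picks a concrete client from configuration. The class and method names below are illustrative, modeled on the “Dummy skeleton” idea rather than copied from the real code.

```python
class CatalogApi:
    """Hypothetical common interface each catalog backend implements."""

    def search_entities(self, query):
        raise NotImplementedError

class DummyCatalogApi(CatalogApi):
    """Skeleton backend returning canned data, so a new catalog
    integration can be developed and tested incrementally."""

    def search_entities(self, query):
        return [{"name": "sample_table", "type": "TABLE", "matched": query}]

def get_catalog_client(interface="dummy"):
    # Real code would also register clients for Atlas, Navigator, ...
    clients = {"dummy": DummyCatalogApi}
    return clients[interface]()

entities = get_catalog_client().search_entities("sample")
```

Starting from such a skeleton, a new catalog only needs to implement the interface and register itself; the rest of the app stays unchanged.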

Apache Atlas

Potential connectors

Storages

Hue can interact with various storage systems; fsmanager.py is the main router to each API.

Note Apache Ozone as well as Ceph can be used via the S3 browser.
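The routing done by a filesystem manager boils down to mapping a URL scheme to a dedicated client class. The sketch below shows that idea; the client classes and the tuple they return are illustrative, not the real fsmanager.py implementation.

```python
class HdfsClient:
    """Illustrative client handling hdfs:// paths."""
    def open(self, path):
        return ("hdfs", path)

class S3Client:
    """Illustrative client handling s3a:// paths."""
    def open(self, path):
        return ("s3", path)

# Scheme -> client class: this lookup is the essence of the router.
CLIENTS = {"hdfs": HdfsClient, "s3a": S3Client}

def get_client(path):
    scheme = path.split("://", 1)[0]
    return CLIENTS[scheme]()

result = get_client("s3a://bucket/key").open("s3a://bucket/key")
```

This is also why S3-compatible stores like Apache Ozone or Ceph slot in without a dedicated client: they reuse the existing S3 scheme.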

Hadoop HDFS

AWS S3

Azure ADLS

HBase / Key Value Stores

With just a few changes in the Python API, the HBase browser could be made compatible with Apache Kudu or Google Big Table.
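The reason few changes would be needed is that the browser only depends on a narrow key-value surface: put, get, and scan. The sketch below shows such an abstraction with an in-memory dict standing in for the store; all names are illustrative.

```python
class KeyValueBrowser:
    """Minimal key-value abstraction (put / get / scan). If the browser
    only calls methods like these, swapping the backing client (HBase,
    Kudu, Bigtable) touches little else. In-memory stand-in only."""

    def __init__(self):
        self._store = {}

    def put(self, row, column, value):
        self._store.setdefault(row, {})[column] = value

    def get(self, row):
        return self._store.get(row, {})

    def scan(self, prefix):
        return {r: cols for r, cols in self._store.items() if r.startswith(prefix)}

kv = KeyValueBrowser()
kv.put("user:1", "info:name", "alice")
kv.put("user:2", "info:name", "bob")
```

A real backend would implement the same three methods against the store's client library, leaving the browser UI untouched.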

Potential connectors

  • Google Cloud Storage is currently a work in progress with HUE-8978

Jobs

Apache Spark / Livy

Based on the Livy REST API.
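The Livy integration rests on two REST calls: creating a session (POST /sessions) and submitting a statement to it (POST /sessions/{id}/statements). The sketch below only builds those request payloads; no HTTP is performed, and the helper functions are illustrative.

```python
import json

def create_session_request(kind="pyspark"):
    """Payload for POST /sessions: starts an interactive session
    of the given kind (pyspark, spark, sparkr, sql)."""
    return "/sessions", json.dumps({"kind": kind})

def submit_statement_request(session_id, code):
    """Payload for POST /sessions/{id}/statements: runs a snippet
    inside an existing session."""
    return "/sessions/%d/statements" % session_id, json.dumps({"code": code})

session_path, session_body = create_session_request()
stmt_path, stmt_body = submit_statement_request(0, "1 + 1")
```

The caller then polls GET /sessions/{id}/statements/{stmt} until the statement reaches the available state, mirroring the asynchronous pattern used elsewhere in the editor.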

Schedulers

Currently only Apache Oozie is supported for your Data Warehouse, but the API is becoming generic with HUE-3797.

Potential connectors