The File Import API provides endpoints for uploading, analyzing, and previewing files that can be imported into various SQL engines. This API simplifies the process of creating database tables from files like CSV, TSV, and Excel spreadsheets.
Note: All API endpoints require authentication. For endpoints that work with remote files (import_type=remote), the API uses the file system permissions associated with the authenticated user's session.
The File Import API allows you to:
- Upload local files to the Hue server
- Detect file metadata such as delimiters and Excel sheet names
- Detect whether a file has a header row
- Preview a file's content with SQL column type mapping
- Retrieve Polars-to-SQL type mappings for a given SQL dialect
A typical workflow for importing a file into a database table involves these steps:
1. Upload the file using the /api/importer/upload/file/ endpoint
2. Detect its metadata using the /api/importer/file/guess_metadata/ endpoint
3. Detect whether it has a header row using the /api/importer/file/guess_header/ endpoint
4. Preview its content and column types using the /api/importer/file/preview/ endpoint

Upload a file from your local system to the Hue server.
Endpoint: /api/importer/upload/file/
Method: POST
Content Type: multipart/form-data
Request Parameters:
Name | Type | Required | Description |
---|---|---|---|
file | File | Yes | The file to upload (csv, tsv, excel) |
Example using cURL:
curl -X POST \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
-F "file=@/path/to/sales_data.csv" \
https://demo.gethue.com/api/importer/upload/file/
Example using JavaScript:
// Using fetch API
const formData = new FormData();
formData.append('file', fileInputElement.files[0]);
fetch('https://demo.gethue.com/api/importer/upload/file/', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_JWT_TOKEN>'
},
body: formData
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error('Error:', error));
Response:
{
"file_path": "/tmp/username_abc123_sales_data.csv"
}
Status Codes:
201 Created - File was uploaded successfully
400 Bad Request - Invalid file format or size
500 Internal Server Error - Server-side error

Restrictions:
- The maximum upload size is controlled by the IMPORTER.MAX_LOCAL_FILE_SIZE_UPLOAD_LIMIT setting
- The allowed file extensions are controlled by the IMPORTER.RESTRICT_LOCAL_FILE_EXTENSIONS setting
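Since these limits are enforced server-side, a client can pre-validate files to fail fast before uploading. A minimal sketch; the 100 MiB limit and the extension list here are illustrative placeholders, not the server's actual IMPORTER.* configuration values:

```javascript
// Illustrative client-side checks mirroring the server-side restrictions.
// The limit and extension list below are examples, not the server's config.
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024; // e.g. 100 MiB
const ALLOWED_EXTENSIONS = ['csv', 'tsv', 'xls', 'xlsx'];

function validateUpload(fileName, fileSize) {
  const ext = fileName.includes('.') ? fileName.split('.').pop().toLowerCase() : '';
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return { ok: false, error: `Unsupported file extension: .${ext}` };
  }
  if (fileSize > MAX_UPLOAD_BYTES) {
    return { ok: false, error: 'File too large. Maximum file size is 100 MiB.' };
  }
  return { ok: true };
}
```

Running these checks before calling the upload endpoint avoids a round trip that would end in a 400 Bad Request.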
Analyze a file to determine its type and metadata properties such as delimiters for CSV files or sheet names for Excel files.
Endpoint: /api/importer/file/guess_metadata/
Method: GET
Request Parameters:
Name | Type | Required | Description |
---|---|---|---|
file_path | String | Yes | Full path to the file to analyze |
import_type | String | Yes | Type of import, either local or remote |
Example using cURL:
# For a local file uploaded previously
curl -X GET \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
"https://demo.gethue.com/api/importer/file/guess_metadata/?file_path=/tmp/username_abc123_sales_data.csv&import_type=local"
# For a remote file on HDFS
curl -X GET \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
"https://demo.gethue.com/api/importer/file/guess_metadata/?file_path=/user/hue/data/sales_data.csv&import_type=remote"
Example using JavaScript:
// Using fetch API for a local file
fetch('https://demo.gethue.com/api/importer/file/guess_metadata/?file_path=/tmp/username_abc123_sales_data.csv&import_type=local', {
method: 'GET',
headers: {
'Authorization': 'Bearer <YOUR_JWT_TOKEN>'
}
})
.then(response => response.json())
.then(metadata => console.log(metadata))
.catch(error => console.error('Error:', error));
Response Examples:
For CSV files:
{
"type": "csv",
"field_separator": ",",
"quote_char": "\"",
"record_separator": "\n"
}
For TSV files:
{
"type": "tsv",
"field_separator": "\t",
"quote_char": "\"",
"record_separator": "\n"
}
For Excel files:
{
"type": "excel",
"sheet_names": ["Sales 2024", "Sales 2025", "Analytics"]
}
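The metadata response can be fed straight into the later preview call. A sketch of that plumbing; the helper name and the hard-coded local import_type are assumptions for illustration:

```javascript
// Build the preview request parameters from a guess_metadata response.
// sheetName is only needed for Excel files; guess_metadata returns the
// available sheet names, and the caller picks one.
function previewParamsFromMetadata(filePath, metadata, sqlDialect, hasHeader, sheetName) {
  const params = {
    file_path: filePath,
    file_type: metadata.type,
    import_type: 'local', // assumption: a locally uploaded file
    sql_dialect: sqlDialect,
    has_header: String(hasHeader)
  };
  if (metadata.type === 'excel') {
    params.sheet_name = sheetName; // required for Excel files
  } else {
    // csv, tsv, and delimiter_format responses carry separator metadata
    params.field_separator = metadata.field_separator;
    params.quote_char = metadata.quote_char;
    params.record_separator = metadata.record_separator;
  }
  return new URLSearchParams(params);
}
```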
Status Codes:
200 OK - Metadata detected successfully
400 Bad Request - File not found or invalid parameters
500 Internal Server Error - Server-side error during detection

Analyze a file to determine whether it has a header row. This API uses heuristics to detect whether the first row appears to contain column names rather than data.
Endpoint: /api/importer/file/guess_header/
Method: GET
Request Parameters:
Name | Type | Required | Description |
---|---|---|---|
file_path | String | Yes | Full path to the file to analyze |
file_type | String | Yes | Type of file (csv, tsv, excel, delimiter_format) |
import_type | String | Yes | Type of import, either local or remote |
sheet_name | String | No | Sheet name (required for Excel files) |
Example using cURL:
# For a CSV file
curl -X GET \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
"https://demo.gethue.com/api/importer/file/guess_header/?file_path=/tmp/username_abc123_sales_data.csv&file_type=csv&import_type=local"
# For an Excel file
curl -X GET \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
"https://demo.gethue.com/api/importer/file/guess_header/?file_path=/tmp/username_abc123_financial_report.xlsx&file_type=excel&import_type=local&sheet_name=Q1_Results"
Example using JavaScript:
const params = new URLSearchParams({
file_path: '/tmp/username_abc123_sales_data.csv',
file_type: 'csv',
import_type: 'local'
});
fetch(`https://demo.gethue.com/api/importer/file/guess_header/?${params.toString()}`, {
method: 'GET',
headers: {
'Authorization': 'Bearer <YOUR_JWT_TOKEN>'
}
})
.then(response => response.json())
.then(result => {
console.log("Has header:", result.has_header);
// Use this information to configure preview or import
})
.catch(error => console.error('Error:', error));
Response:
{
"has_header": true
}
How Header Detection Works:
The header detection algorithm uses multiple factors to make an educated guess, so treat the result as a suggestion that users can review and override before import.

Status Codes:
200 OK - Header detection successful
400 Bad Request - Invalid parameters or file not found
500 Internal Server Error - Server-side error

Generate a preview of a file's content with column type mapping for creating SQL tables.
Endpoint: /api/importer/file/preview/
Method: GET
Request Parameters:
Name | Type | Required | Description |
---|---|---|---|
file_path | String | Yes | Full path to the file to preview |
file_type | String | Yes | Type of file (csv, tsv, excel, delimiter_format) |
import_type | String | Yes | Type of import (local or remote) |
sql_dialect | String | Yes | SQL dialect for type mapping (hive, impala, trino, phoenix, sparksql) |
has_header | Boolean | Yes | Whether the file has a header row |
sheet_name | String | No | Sheet name (required for Excel files) |
field_separator | String | No | Field separator character (defaults to , for CSV and \t for TSV; required for delimiter_format) |
quote_char | String | No | Quote character (defaults to ") |
record_separator | String | No | Record separator character (defaults to \n) |
Example using cURL:
# For a CSV file with header
curl -X GET \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
"https://demo.gethue.com/api/importer/file/preview/?file_path=/tmp/username_abc123_sales_data.csv&file_type=csv&import_type=local&sql_dialect=hive&has_header=true"
# For an Excel file with header
curl -X GET \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
"https://demo.gethue.com/api/importer/file/preview/?file_path=/tmp/username_abc123_financial_report.xlsx&file_type=excel&import_type=local&sql_dialect=impala&has_header=true&sheet_name=Q1_Results"
# For a custom pipe-delimited file using delimiter_format
curl -X GET \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
"https://demo.gethue.com/api/importer/file/preview/?file_path=/tmp/username_abc123_pipe_data.txt&file_type=delimiter_format&import_type=local&sql_dialect=hive&has_header=true&field_separator=|&quote_char=\"&record_separator=\n"
Example using JavaScript:
// Parameters for previewing a CSV file
const params = new URLSearchParams({
file_path: '/tmp/username_abc123_sales_data.csv',
file_type: 'csv',
import_type: 'local',
sql_dialect: 'hive',
has_header: 'true'
});
// Make the fetch request
fetch(`https://demo.gethue.com/api/importer/file/preview/?${params.toString()}`, {
method: 'GET',
headers: {
'Authorization': 'Bearer <YOUR_JWT_TOKEN>'
}
})
.then(response => response.json())
.then(previewData => {
// Use preview data to display to user or generate CREATE TABLE statement
console.log(previewData);
})
.catch(error => console.error('Error:', error));
Response:
For a sales data CSV file:
{
"type": "csv",
"columns": [
{
"name": "transaction_id",
"type": "INT"
},
{
"name": "product_name",
"type": "STRING"
},
{
"name": "category",
"type": "STRING"
},
{
"name": "price",
"type": "DOUBLE"
},
{
"name": "quantity",
"type": "INT"
},
{
"name": "purchase_date",
"type": "DATE"
}
],
"preview_data": [
["1001", "Laptop XPS 13", "Electronics", "1299.99", "1", "2025-01-15"],
["1002", "Wireless Headphones", "Electronics", "149.99", "2", "2025-01-15"],
["1003", "Office Chair", "Furniture", "249.50", "1", "2025-01-16"],
["1004", "Notebook Set", "Office Supplies", "24.95", "5", "2025-01-17"]
]
}
Using Preview Data to Generate a CREATE TABLE Statement:
You can use the preview data to generate SQL statements for table creation. Here's an example for Hive:
CREATE TABLE sales_data (
transaction_id INT,
product_name STRING,
category STRING,
price DOUBLE,
quantity INT,
purchase_date DATE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
About the delimiter_format File Type:
The delimiter_format file type allows you to process custom delimited files that don't follow standard CSV or TSV formats. When using this file type:
- field_separator is required and must be explicitly specified
- quote_char and record_separator should be provided for proper parsing
This is useful for pipe-delimited files (|), semicolon-delimited files (;), or files with custom record separators.

Parameter Validation and Best Practices:
- For Excel files, sheet_name is required and the API will return a 400 error if it is not provided
- For standard formats (csv, tsv), default values will be applied if optional parameters are not specified
- For delimiter_format files, you must explicitly provide at least the field_separator
- For the most reliable results, reuse the values returned by the guess_metadata endpoint:
  - field_separator from the guess_metadata response
  - quote_char from the guess_metadata response
  - record_separator from the guess_metadata response
- When import_type is set to remote, the API uses the file system associated with the user's request

Status Codes:
200 OK - Preview generated successfully
400 Bad Request - Invalid parameters or file not found
500 Internal Server Error - Server-side error

Get the mapping from Polars data types to SQL types for a specific SQL dialect. This helps in translating detected column types to appropriate SQL data types when creating tables.
Endpoint: /api/importer/sql_type_mapping/
Method: GET
Request Parameters:
Name | Type | Required | Description |
---|---|---|---|
sql_dialect | String | Yes | SQL dialect for type mapping (hive, impala, trino, phoenix, sparksql) |
Example using cURL:
curl -X GET \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
"https://demo.gethue.com/api/importer/sql_type_mapping/?sql_dialect=hive"
Example using JavaScript:
fetch('https://demo.gethue.com/api/importer/sql_type_mapping/?sql_dialect=hive', {
method: 'GET',
headers: {
'Authorization': 'Bearer <YOUR_JWT_TOKEN>'
}
})
.then(response => response.json())
.then(typeMappings => {
console.log(typeMappings);
// Use type mappings to generate SQL schema
})
.catch(error => console.error('Error:', error));
Response:
{
"Int8": "TINYINT",
"Int16": "SMALLINT",
"Int32": "INT",
"Int64": "BIGINT",
"UInt8": "TINYINT",
"UInt16": "SMALLINT",
"UInt32": "INT",
"UInt64": "BIGINT",
"Float32": "FLOAT",
"Float64": "DOUBLE",
"Decimal": "DECIMAL",
"Boolean": "BOOLEAN",
"Utf8": "STRING",
"String": "STRING",
"Binary": "BINARY",
"Date": "DATE",
"Time": "TIMESTAMP",
"Datetime": "TIMESTAMP",
"Duration": "INTERVAL DAY TO SECOND",
"Array": "ARRAY",
"List": "ARRAY",
"Struct": "STRUCT"
}
Dialect-Specific Type Mappings
Different SQL dialects have variations in type names. Here are some key differences:
Trino Types (compared to Hive):
{
"Int32": "INTEGER",
"UInt32": "INTEGER",
"Utf8": "VARCHAR",
"String": "VARCHAR",
"Binary": "VARBINARY",
"Float32": "REAL",
"Struct": "ROW",
"Object": "JSON"
}
Phoenix Types (compared to Hive):
{
"UInt8": "UNSIGNED_TINYINT",
"UInt16": "UNSIGNED_SMALLINT",
"UInt32": "UNSIGNED_INT",
"UInt64": "UNSIGNED_LONG",
"Utf8": "VARCHAR",
"String": "VARCHAR",
"Binary": "VARBINARY",
"Time": "TIME"
}
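A mapping response like the ones above can be applied to detected column types when generating a schema. A hedged sketch; falling back to STRING for unmapped types is this example's choice, not documented API behavior:

```javascript
// Apply a /api/importer/sql_type_mapping/ response to a detected column type.
// Unmapped types fall back to STRING here as a conservative default for
// text-file imports (this fallback is an assumption, not API behavior).
function toSqlType(polarsType, typeMapping) {
  return typeMapping[polarsType] || 'STRING';
}

// Produce "name TYPE" column definitions for a CREATE TABLE statement.
function columnsToSql(columns, typeMapping) {
  return columns.map(col => `${col.name} ${toSqlType(col.type, typeMapping)}`);
}
```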
Status Codes:
200 OK - Mapping retrieved successfully
400 Bad Request - Invalid SQL dialect
500 Internal Server Error - Server-side error

All endpoints follow a consistent error response format:
{
"error": "Error message describing what went wrong"
}
Example Error Responses:
File not found:
{
"error": "File /tmp/nonexistent_file.csv not found"
}
Invalid parameter:
{
"error": "Sheet name is required for Excel files"
}
File size limit exceeded:
{
"error": "File too large. Maximum file size is 100 MiB."
}
Authentication errors will return a standard HTTP 401 Unauthorized response.
Note: Authentication tokens can be obtained through Hue's authentication API. See the Authentication API documentation for details.
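Since every endpoint shares this error shape, a small response-unwrapping helper keeps client code uniform. A sketch; the helper name is illustrative:

```javascript
// Unwrap an importer API response: return the parsed JSON body on success,
// and raise the consistent { "error": ... } message on failure.
async function unwrapImporterResponse(response) {
  if (response.status === 401) {
    // 401 bodies may not be JSON, so handle them before parsing
    throw new Error('Unauthorized: obtain a fresh token via the authentication API');
  }
  const body = await response.json();
  if (!response.ok) {
    throw new Error(body.error || `Request failed with status ${response.status}`);
  }
  return body;
}
```

For example, `unwrapImporterResponse(await fetch(...))` can replace the bare `.then(response => response.json())` steps in the snippets above, which would otherwise silently accept error bodies.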
Here's an example workflow that combines all the APIs to import a CSV file into a Hive table:
// Upload the CSV file
const formData = new FormData();
formData.append('file', fileInputElement.files[0]);
const uploadResponse = await fetch('https://demo.gethue.com/api/importer/upload/file/', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_JWT_TOKEN>'
},
body: formData
});
const uploadResult = await uploadResponse.json();
const filePath = uploadResult.file_path;
// Get file metadata
const metadataParams = new URLSearchParams({
file_path: filePath,
import_type: 'local'
});
const metadataResponse = await fetch(`https://demo.gethue.com/api/importer/file/guess_metadata/?${metadataParams.toString()}`, {
method: 'GET',
headers: {
'Authorization': 'Bearer <YOUR_JWT_TOKEN>'
}
});
const metadata = await metadataResponse.json();
const fileType = metadata.type;
const fieldSeparator = metadata.field_separator;
const quoteChar = metadata.quote_char;
const recordSeparator = metadata.record_separator;
// Detect if file has a header
const headerParams = new URLSearchParams({
file_path: filePath,
file_type: fileType,
import_type: 'local'
});
const headerResponse = await fetch(`https://demo.gethue.com/api/importer/file/guess_header/?${headerParams.toString()}`, {
method: 'GET',
headers: {
'Authorization': 'Bearer <YOUR_JWT_TOKEN>'
}
});
const headerResult = await headerResponse.json();
const hasHeader = headerResult.has_header;
// Generate file preview with SQL types
const previewParams = new URLSearchParams({
file_path: filePath,
file_type: fileType,
import_type: 'local',
sql_dialect: 'hive',
has_header: hasHeader,
field_separator: fieldSeparator,
quote_char: quoteChar,
record_separator: recordSeparator
});
const previewResponse = await fetch(`https://demo.gethue.com/api/importer/file/preview/?${previewParams.toString()}`, {
method: 'GET',
headers: {
'Authorization': 'Bearer <YOUR_JWT_TOKEN>'
}
});
const preview = await previewResponse.json();
// Generate SQL CREATE TABLE statement
const tableName = 'imported_sales_data';
const columns = preview.columns.map(col => ` ${col.name} ${col.type}`).join(',\n');
const createTableSQL = `CREATE TABLE ${tableName} (\n${columns}\n)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '${fieldSeparator}'
STORED AS TEXTFILE;`;
console.log(createTableSQL);
// You can then execute this SQL using the SQL execution API
Error Handling: Always implement proper error handling for API responses, especially for file uploads which may fail due to size limitations or network issues.
Progress Feedback: For large file uploads, consider implementing progress indicators using XHR instead of fetch to track upload progress.
Type Checking: When parsing and validating file data, always check that the inferred types match what you expect for your business data.
Sequential Processing: Process API calls in the proper sequence as shown in the workflow example to ensure each step has the required data.
User Confirmation: Allow users to review and modify the automatic type detection before creating tables, especially for large datasets.
Access Control: Ensure file permissions are properly set for any uploaded files, especially in multi-user environments.
Cleanup: Consider implementing cleanup mechanisms for temporary uploaded files that are no longer needed. Files uploaded via the /api/importer/upload/file/ endpoint are stored in temporary locations (typically /tmp/) and should be cleaned up after processing.
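As the Progress Feedback practice above notes, fetch does not expose upload progress. A sketch of an XMLHttpRequest-based upload for the upload endpoint; the function name and the 0-100 callback contract are illustrative:

```javascript
// Progress percentage for an upload event; factored out so it is easy to test.
function progressPercent(loaded, total) {
  return Math.round((loaded / total) * 100);
}

// Upload via XMLHttpRequest to get upload progress events, which fetch does
// not expose. onProgress receives an integer percentage from 0 to 100.
function uploadWithProgress(url, token, file, onProgress) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('POST', url);
    xhr.setRequestHeader('Authorization', `Bearer ${token}`);
    xhr.upload.addEventListener('progress', (event) => {
      if (event.lengthComputable) {
        onProgress(progressPercent(event.loaded, event.total));
      }
    });
    xhr.onload = () => {
      if (xhr.status >= 200 && xhr.status < 300) {
        resolve(JSON.parse(xhr.responseText));
      } else {
        reject(new Error(`Upload failed with status ${xhr.status}`));
      }
    };
    xhr.onerror = () => reject(new Error('Network error during upload'));
    const formData = new FormData();
    formData.append('file', file);
    xhr.send(formData);
  });
}
```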