Skip to content

aliyun/aliyun-odps-jdbc

Repository files navigation

ODPS JDBC

build Maven Central

Chinese Docs

MaxCompute JDBC介绍

Installation

Generally, there are two ways to use ODPS JDBC driver in your project.

1.The first one is to use the standalone library:

2.The second is to rely on maven to resolve the dependencies for you:

<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-jdbc</artifactId>
    <version>VERSION</version>
</dependency>

Getting Started

Using ODPS JDBC driver is just as using other JDBC drivers. It contains the following few steps:

1. Explictly load the ODPS JDBC driver using Class.forName():

Class.forName("com.aliyun.odps.jdbc.OdpsDriver");

2. Connect to the ODPS by creating a Connection object with the JDBC driver:

Connection conn=DriverManager.getConnection(url,accessId,accessKey);

The ODPS server works with RESTful API, so the url looks like:

String url="jdbc:odps:ENDPOINT?project=PROJECT_NAME&charset=UTF-8";

The connection properties can also be passed through Properties. For example:

Properties config=new Properties();
    config.put("access_id","...");
    config.put("access_key","...");
    config.put("project_name","...");
    config.put("charset","...");
    Connection conn=DriverManager.getConnection("jdbc:odps:<endpoint>",config);

3. Submit SQL to ODPS by creating Statement object and using its executeQuery() method:

Statement stmt=conn.createStatement();
    ResultSet rs=stmt.executeQuery("SELECT foo FROM bar");

4. Process the result set.

For example:

while(rs.next()){
    ...
    }

Connection String Parameters

It is recommended that the key and value in URL should be encoded by using java.net.URLEncoder#encode(java.lang.String).

Basic

URL key Property Key Required Default value Description
endpoint end_point True The endpoint of your MaxCompute service
project project_name True The name of your MaxCompute project
accessId access_id True Your Alibaba Cloud access key ID
accessKey access_key True Your Alibaba Cloud access key secret
interactiveMode interactive_mode False false Enable MCQA (interactive mode: maxqa/mcqa/offline, or true/false for simple boolean)
logview logview_host False Provided by MC The endpoint of MaxCompute Logview
tunnelEndpoint tunnel_endpoint False Provided by MC The endpoint of the MaxCompute Tunnel service
enableOdpsLogger enable_odps_logger False false Enable MaxCompute JDBC logger

Advanced

URL key Property Key Required Default value Description
stsToken sts_token False The Alibaba Cloud STS token
logConfFile log_conf_file False The configuration path for SLF4J
charset charset False UTF-8 The charset of the inputs and outputs
executeProject execute_project_name False For MCQA, the name of the MaxCompute project in which actually execute the queries
alwaysFallback always_fallback False false For MCQA, fall back to regular mode if any exception happened
instanceTunnelMaxRecord instance_tunnel_max_record False -1 (unlimited) For MCQA, max number of records within a result set, enableLimit option should set to false
instanceTunnelMaxSize instance_tunnel_max_size False -1 (unlimited) For MCQA, max size of a result set in byte
enableLimit enable_limit False true(limited) Download permission won't be checked if enableLimit is set true, but your result record count will be limited to 10000
autoLimitFallback auto_limit_fallback False False(no auto fallback) For non-MCQA mode, result record count will be limited to 10000 when no download permission exception happened and autoLimitFallback is set to true

MCQA

URL key Property Key Required Default value Description
interactiveMode interactive_mode False false Enable MCQA (interactive mode: maxqa/mcqa/offline, or true/false for simple boolean)
interactiveServiceName interactive_service_name False public.default The interactive service name to use (for MCQA)
majorVersion major_version False default The major version to use for execution
autoSelectLimit auto_select_limit False -1(limited) Set limit for number of records returned in selects without explicit limit conditions
fallbackForUnknownError fallback_for_unknownerror False true Whether to fall back for unknown errors
fallbackForResourceNotEnough fallback_for_resourcenotenough False true Whether to fall back for resource not enough errors
fallbackForUpgrading fallback_for_upgrading False true Whether to fall back for service upgrading errors
fallbackForRunningTimeout fallback_for_runningtimeout False true Whether to fall back for running timeout errors
fallbackForUnsupportedFeature fallback_for_unsupportedfeature False true Whether to fall back for unsupported feature errors
disableFallback disable_fallback False false Disable all fallback behavior
fallbackQuota fallback_quota False The fallback quota to use
attachTimeout attach_timeout False -1 The timeout for attaching to running jobs, unit depends on specific timeout implementation

Tunnel and Network Settings

URL key Property Key Required Default value Description
tunnelRetryTime tunnel_retry_time False 6 The number of times to retry tunnel operations upon failure
useInstanceTunnel use_instance_tunnel False true Whether to use instance tunnel for reading result sets, not recommend set it to false
tunnelReadTimeout tunnel_read_timeout False -1 Tunnel read timeout in milliseconds, -1 means no timeout
tunnelConnectTimeout tunnel_connect_timeout False -1 Tunnel connection timeout in milliseconds, -1 means no timeout
tunnelDownloadUseSingleReader tunnel_download_use_single_reader False true Whether to use a single reader for tunnel downloads
fetchResultSplitSize fetch_result_split_size False 10000 The size of result splits for fetching
fetchResultPreloadSplitNum fetch_result_preload_split_num False 5 Number of splits to preload when fetching results
fetchResultThreadNum fetch_result_thread_num False 5 Number of threads to use for result fetching

Security and Connection Settings

URL key Property Key Required Default value Description
stsToken sts_token False STS token for temporary security credentials
disableConnectionSetting disable_connection_setting False false Disable connection setting updates
enableCommandApi enable_command_api False false Enable command API functionality
httpsCheck https_check False false Enable HTTPS check for connection security
skipSqlRewrite skip_sql_rewrite False false Skip SQL rewriting and optimization during query processing
skipSqlInjectCheck skip_sql_inject_check False false Skip SQL injection check during query processing
skipCheckIfSelect skipCheckIfSelect False true Skip use antlr4 to check if query is select
quotaName quota_name False Not set The name of the query quota class to use for execution

Timeouts and Networking

URL key Property Key Required Default value Description
readTimeout read_timeout False -1 Read timeout in milliseconds, -1 means no timeout
connectTimeout connect_timeout False -1 Connect timeout in milliseconds, -1 means no timeout
retryTime retry_time False -1 Number of retry attempts for failed operations, -1 means no retries

Schema and Catalog Settings

URL key Property Key Required Default value Description
odpsNamespaceSchema odps_namespace_schema False Judge by project settings Whether to use ODPS namespace schema feature (true/false)
schema schema False default Default schema to use for connection (used with namespace schema mode)
useProjectTimeZone use_project_time_zone False false Use project time zone instead of client time zone
timeZone timezone False JVM default Explicit time zone setting for the connection

Logging and Debugging

URL key Property Key Required Default value Description
enableOdpsLogger enable_odps_logger False false Enable MaxCompute JDBC logger
logLevel log_level False INFO Set the logging level for the driver
verbose verbose False false Enable verbose output for debugging

SQL Settings and Feature Control

URL key Property Key Required Default value Description
enableLimit enable_limit False true Enable result limits for MCQA mode
autoLimitFallback auto_limit_fallback False false Automatically fallback when auto limit is reached
settings settings False Not set A JSON string with additional settings to apply on connection
tableList table_list False Not set CSV list of table names to preload: project.table[,project.table]
logviewVersion logview_version False 1 Set the LogView version to use, 1 or 2
async async False false Enable async execution mode for queries
longJobWarningThreshold long_job_warning_threshold False -1 Threshold in milliseconds for long-running job warnings, -1 disables warning

ODPS Configuration File

URL key Description
odps_config Path to an external ODPS configuration file to load from

Example

JDBC Client Sample Code

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class OdpsJdbcClient {

  private static String driverName = "com.aliyun.odps.jdbc.OdpsDriver";

  /**
   * @param args
   * @throws SQLException
   */
  public static void main(String[] args) throws SQLException {
    try {
      Class.forName(driverName);
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
      System.exit(1);
    }

    // fill in the information here
    String accessId = "your_access_id";
    String accessKey = "your_access_key";
    Connection
        conn =
        DriverManager.getConnection(
            "jdbc:odps:https://service.odps.aliyun.com/api?project=<your_project_name>", accessId,
            accessKey);
    Statement stmt = conn.createStatement();
    String tableName = "testOdpsDriverTable";
    stmt.execute("drop table if exists " + tableName);
    stmt.execute("create table " + tableName + " (key int, value string)");

    String sql;
    ResultSet rs;

    // insert a record
    sql =
        String.format(
            "insert into table %s select 24 key, 'hours' value from (select count(1) from %s) a",
            tableName, tableName);
    System.out.println("Running: " + sql);
    int count = stmt.executeUpdate(sql);
    System.out.println("updated records: " + count);

    // select * query
    sql = "select * from " + tableName;
    System.out.println("Running: " + sql);
    rs = stmt.executeQuery(sql);
    while (rs.next()) {
      System.out.println(String.valueOf(rs.getInt(1)) + "\t" + rs.getString(2));
    }

    // regular query
    sql = "select count(1) from " + tableName;
    System.out.println("Running: " + sql);
    rs = stmt.executeQuery(sql);
    while (rs.next()) {
      System.out.println(rs.getString(1));
    }

    // do not forget to close
    stmt.close();
    conn.close();
  }
}

Setting SQL task properties

stmt.execute("set biz_id=xxxxxx;");
stmt.execute("set odps.sql.mapper.split.size=512;");

Third-party Integration

It is also recommended to use ODPS by using other third-party BI tools or DB visualizer that supports JDBC.

For example:

Getting Involved

The project is under construction (and not fully JDBC-compliant). If you dicover any good features which have not been implemented, please fire me an Email or just pull a request.

Architecture

Build and run unitest

1.Build from source locally:

git clone ....
cd odps-jdbc
mvn package -DskipTests

2.Copy out a configuration file:

cp ./src/test/resources/conf.properties.example ./src/test/resources/conf.properties

3.Fill in your connection strings:

access_id=...
access_key=...
end_point=...
project_name=...
logview_host=...
charset=UTF-8

4.Run maven test command (or just test it in IntelliJ IDEA):

mvn test

Data Type Mapping

Currently, 16 ODPS data types are supported. Please see the following table for supported ODPS data types and corresponding JDBC interfaces.

ODPS Type JDBC Interface JDBC Type
TINYINT java.sql.ResultSet.getByte TINYINT
SMALLINT java.sql.ResultSet.getShort SMALLINT
INT java.sql.ResultSet.getInt INTEGER
BIGINT java.sql.ResultSet.getLong BIGINT
FLOAT java.sql.ResultSet.getFloat FLOAT
DOUBLE java.sql.ResultSet.getDouble DOUBLE
DECIMAL java.sql.ResultSet.getBigDecimal DECIMAL
VARCHAR java.sql.ResultSet.getString VARCHAR
CHAR java.sql.ResultSet.getString CHAR
STRING java.sql.ResultSet.getString VARCHAR
BOOLEAN java.sql.ResultSet.getBoolean BOOLEAN
DATE java.sql.ResultSet.getDate DATE
DATETIME java.sql.ResultSet.getTimestamp TIMESTAMP
TIMESTAMP java.sql.ResultSet.getTimestamp TIMESTAMP
TIMESTAMP_NTZ java.sql.ResultSet.getTimestamp TIMESTAMP
BINARY java.sql.ResultSet.getBytes BINARY
ARRAY java.sql.ResultSet.getArray ARRAY

When the getObject() method is called, what is obtained is the Java type directly corresponding to each ODPS type without any conversion operation. Please see the following table for he correspondence between ODPS types and Java types.

ODPS Type Java Type
TINYINT java.lang.Byte
SMALLINT java.lang.Short
INT java.lang.Integer
BIGINT java.lang.Long
FLOAT java.lang.Float
DOUBLE java.lang.Double
DECIMAL java.math.BigDecimal
VARCHAR com.aliyun.odps.data.Varchar
CHAR com.aliyun.odps.data.Char
STRING byte[]
BOOLEAN java.sql.ResultSet.getBoolean
DATE java.time.LocalDate
DATETIME java.time.ZonedDateTime
TIMESTAMP java.time.Instant
TIMESTAMP_NTZ java.time.LocalDateTime
BINARY com.aliyun.odps.data.Binary
ARRAY java.util.List

NOTE: Possible timezone issue

DATETIME in MaxCompute is actually defined as EPOCH in milliseconds, which is UTC, and so is TIMESTAMP in JDBC. This driver fill the DATETIME value directly into JDBC TIMESTAMP and do no parse or format action. When application that using JDBC display a DATETIME as a human-readable string format, it is the application itself did the format using application defined or OS defined timezone. It is suggested to keep your application/OS timezone setting same to MaxCompute to avoid inconsistent datetime parse/format.

Type Conversion

Implicit type conversion happens when accessing a ODPS data type with JDBC interfaces other than the recommended one. Please see the following table for supported implicit conversions.

JAVA\ODPS TINYINT SMALLINT INT BIGINT FLOAT DOUBLE DECIMAL CHAR VARCHAR STRING DATE DATETIME TIMESTAMP TIMESTAMP_NTZ BOOLEAN BINARY
byte Y Y Y Y Y Y Y Y Y Y Y Y
short Y Y Y Y Y Y Y Y Y Y Y Y
int Y Y Y Y Y Y Y Y Y Y Y Y
long Y Y Y Y Y Y Y Y Y Y Y Y
float Y Y Y Y Y Y Y Y Y Y Y Y
double Y Y Y Y Y Y Y Y Y Y Y Y
BigDecimal Y Y Y Y Y Y Y Y Y Y Y Y Y
String Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
byte[] Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Date Y Y Y Y Y Y
Time Y Y Y Y Y
Timestamp Y Y Y Y Y
boolean Y Y Y Y Y Y Y Y Y

MaxCompute Service Compatibility and Recommended JDBC version

Since Sprint27, MaxCompute tunnel service supported a feature named instance tunnel that allowing client read query result set through tunnel endpoint, to release client from creating temporary table. And this JDBC driver began adopt using instance tunnel since version 2.0.

However, for users using MaxCompute deploy that is earlier than Sprint27 (especially Private Cloud cases), please stick to the latest version before 2.0.

MaxCompute JDBC
Public Service latest
Non PRC Public Service latest
<= Sprint27 1.9.2

License

licensed under the Apache License 2.0

About

JDBC Driver for ODPS

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 11

Languages