Since Vertica can be deployed on Google Cloud Platform (GCP), the Spark Connector can use Google Cloud Storage (GCS) as its intermediary storage.
- Running on DataProc clusters: If your Spark cluster is deployed on GCP, you will need to obtain an HMAC interoperability key, then configure the connector options `gcs_hmac_key_id` and `gcs_hmac_key_secret`. Instructions for obtaining the key can be found here.
- Running outside of DataProc clusters: In addition to configuring the HMAC key as above, you will need to obtain a GCS service account key in the form of a JSON service keyfile. Instructions for obtaining one can be found here. Then, specify the connector option `gcs_service_keyfile` with the path to your keyfile JSON. Alternatively, the connector can pick up the keyfile path from the environment variable `GOOGLE_APPLICATION_CREDENTIALS` or from the Spark configuration option `fs.gs.auth.service.account.json.keyfile`.

Finally, ensure that you include the Google Hadoop Connector dependency in your project. Make sure you select the appropriate connector distribution for your Hadoop version.
With the credentials specified, you can now set the connector option `staging_fs_url` to a GCS path of the form `gs://<bucket-id>/path/to/data`.
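
The options described above can be collected in one place before they are handed to Spark. The following is a minimal sketch, not part of the connector: the helper name `build_gcs_options` is hypothetical, and the bucket name, key ID, secret, and keyfile path are placeholder values. Only the option names come from this section.

```python
def build_gcs_options(bucket, path, hmac_key_id, hmac_key_secret, keyfile=None):
    """Collect the GCS-related connector options into a dict (hypothetical helper)."""
    options = {
        # GCS staging location, in the gs://<bucket-id>/path/to/data form
        "staging_fs_url": f"gs://{bucket}/{path.lstrip('/')}",
        # HMAC interoperability key
        "gcs_hmac_key_id": hmac_key_id,
        "gcs_hmac_key_secret": hmac_key_secret,
    }
    # Only set the keyfile option when a path is given; if it is left out,
    # the keyfile can instead be supplied through GOOGLE_APPLICATION_CREDENTIALS
    # or fs.gs.auth.service.account.json.keyfile.
    if keyfile is not None:
        options["gcs_service_keyfile"] = keyfile
    return options


# Placeholder values for illustration only
opts = build_gcs_options(
    bucket="my-bucket",
    path="path/to/data",
    hmac_key_id="EXAMPLE_HMAC_ID",
    hmac_key_secret="EXAMPLE_HMAC_SECRET",
    keyfile="/path/to/keyfile.json",
)
```

In PySpark, a dictionary like `opts` can be passed to a reader or writer with `.options(**opts)`, alongside your usual Vertica connection options.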
As an alternative to specifying the keyfile path, you can set the following connector options:
gcs_service_key_id = < field private_key_id in your keyfile json >
gcs_service_key = < field private_key in your keyfile json >
gcs_service_email = < field client_email in your keyfile json >
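
If you prefer not to copy these fields by hand, they can be read from the keyfile programmatically. The sketch below assumes a standard service account keyfile containing the three fields named above; the function name `service_key_options` is hypothetical and not part of the connector.

```python
import json


def service_key_options(keyfile_path):
    """Map fields of a service account keyfile to connector options (hypothetical helper)."""
    with open(keyfile_path, encoding="utf-8") as f:
        key = json.load(f)
    # A KeyError here means the keyfile lacks one of the expected fields
    return {
        "gcs_service_key_id": key["private_key_id"],
        "gcs_service_key": key["private_key"],
        "gcs_service_email": key["client_email"],
    }
```

Calling `service_key_options("/path/to/keyfile.json")` returns a dictionary that can be merged with the rest of your connector options. Note that this places the private key in memory as a plain option value, so treat the resulting options with the same care as the keyfile itself.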