Tuesday, January 7, 2025

Test to see if you can join two managed Iceberg tables in different S3 table buckets and how you should configure the Spark session

test-tble-bucket-joins

This notebook tests whether two managed Iceberg tables that live in different S3 table buckets can be joined in one query, and how the Spark session has to be configured to make that work.

Creating Table Buckets

In [ ]:
%%bash
aws s3tables create-table-bucket \
    --region us-east-1 \
    --name demo-bucket1


aws s3tables create-table-bucket \
    --region us-east-1 \
    --name demo-bucket2
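
Each table bucket is later referenced by its ARN in the Spark `warehouse` setting. The snippet below is a minimal sketch of how that ARN is put together, assuming the `arn:aws:s3tables:<region>:<account>:bucket/<name>` layout that appears in the cells further down. `table_bucket_arn` is a hypothetical helper, not part of any AWS library, and `<ACCOUNT>` is a placeholder for your own account ID.

```python
def table_bucket_arn(region: str, account_id: str, bucket_name: str) -> str:
    """Build an S3 table bucket ARN in the layout used by this notebook."""
    return f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}"


# Placeholder account ID; substitute your own before using the ARN.
for name in ("demo-bucket1", "demo-bucket2"):
    print(table_bucket_arn("us-east-1", "<ACCOUNT>", name))
```
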
In [2]:
from pyspark.sql import SparkSession
import os
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"


packages = [
    "com.amazonaws:aws-java-sdk-bundle:1.12.661",
    "org.apache.hadoop:hadoop-aws:3.3.4",
    "software.amazon.awssdk:bundle:2.29.38",
    "com.github.ben-manes.caffeine:caffeine:3.1.8",
    "org.apache.commons:commons-configuration2:2.11.0",
    "software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.3",
    "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.1"
]
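
Every S3 Tables catalog used in this notebook is registered with the same four Spark properties, keyed by the catalog name. As a rough sketch of that pattern, the hypothetical helper `s3tables_catalog_conf` below collects those properties in a dictionary; the property names and class names are the ones used in the cells that follow, while the helper itself is only an illustration.

```python
def s3tables_catalog_conf(catalog_name: str, warehouse_arn: str, region: str) -> dict:
    """Return the Spark properties that register one S3 Tables catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Iceberg's Spark catalog wrapper.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Catalog implementation shipped in s3-tables-catalog-for-iceberg.
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        # The table bucket ARN acts as the warehouse location.
        f"{prefix}.warehouse": warehouse_arn,
        f"{prefix}.client.region": region,
    }


conf = s3tables_catalog_conf(
    "catalog1",
    "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1",  # placeholder account
    "us-east-1",
)
for key, value in conf.items():
    print(f"{key} = {value}")
```

Each key and value pair can then be passed to `SparkSession.builder.config(key, value)`, once per catalog.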

Creating a Table in Each Bucket

In [21]:
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
WAREHOUSE_PATH = "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"

spark = SparkSession.builder \
    .appName("iceberg_lab") \
    .config("spark.jars.packages", ",".join(packages)) \
    .config("spark.sql.catalog.catalog1", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.catalog1.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .config("spark.sql.catalog.catalog1.warehouse", WAREHOUSE_PATH) \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "catalog1") \
    .config("spark.sql.catalog.catalog1.client.region", "us-east-1") \
    .getOrCreate()


spark.sql("SHOW NAMESPACES IN catalog1").show()
spark.sql("CREATE NAMESPACE IF NOT EXISTS catalog1.demo_poc_bucket1")


spark.sql("""
CREATE TABLE IF NOT EXISTS catalog1.demo_poc_bucket1.customers (
  customer_id INT,
  name STRING,
  email STRING
) USING iceberg
""")
spark.sql("""
INSERT INTO catalog1.demo_poc_bucket1.customers VALUES
  (1, 'John Doe', 'john@example.com'),
  (2, 'Jane Smith', 'jane@example.com'),
  (3, 'Bob Johnson', 'bob@example.com'),
  (4, 'Alice Brown', 'alice@example.com'),
  (5, 'Charlie Davis', 'charlie@example.com')
""")
spark.stop()
:: loading settings :: url = jar:file:/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/sshah/.ivy2/cache
The jars for the packages stored in: /Users/sshah/.ivy2/jars
com.amazonaws#aws-java-sdk-bundle added as a dependency
org.apache.hadoop#hadoop-aws added as a dependency
software.amazon.awssdk#bundle added as a dependency
com.github.ben-manes.caffeine#caffeine added as a dependency
org.apache.commons#commons-configuration2 added as a dependency
software.amazon.s3tables#s3-tables-catalog-for-iceberg added as a dependency
org.apache.iceberg#iceberg-spark-runtime-3.4_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-37ce9917-d6a4-4f76-ab92-33d7fee7545b;1.0
	confs: [default]
	found com.amazonaws#aws-java-sdk-bundle;1.12.661 in central
	found org.apache.hadoop#hadoop-aws;3.3.4 in central
	found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
	found software.amazon.awssdk#bundle;2.29.38 in central
	found com.github.ben-manes.caffeine#caffeine;3.1.8 in central
	found org.checkerframework#checker-qual;3.37.0 in central
	found com.google.errorprone#error_prone_annotations;2.21.1 in central
	found org.apache.commons#commons-configuration2;2.11.0 in central
	found org.apache.commons#commons-lang3;3.14.0 in central
	found org.apache.commons#commons-text;1.12.0 in central
	found commons-logging#commons-logging;1.3.2 in central
	found software.amazon.s3tables#s3-tables-catalog-for-iceberg;0.1.3 in central
	found org.apache.iceberg#iceberg-api;1.6.1 in central
	found org.slf4j#slf4j-api;1.7.36 in central
	found org.apache.iceberg#iceberg-bundled-guava;1.6.1 in central
	found org.apache.iceberg#iceberg-aws;1.6.1 in central
	found org.apache.iceberg#iceberg-common;1.6.1 in central
	found org.apache.iceberg#iceberg-core;1.6.1 in central
	found org.apache.avro#avro;1.11.3 in central
	found com.fasterxml.jackson.core#jackson-core;2.14.2 in central
	found com.fasterxml.jackson.core#jackson-databind;2.14.2 in central
	found com.fasterxml.jackson.core#jackson-annotations;2.14.2 in central
	found org.apache.commons#commons-compress;1.22 in central
	found io.airlift#aircompressor;0.27 in central
	found org.apache.httpcomponents.client5#httpclient5;5.3.1 in central
	found org.apache.httpcomponents.core5#httpcore5;5.2.4 in central
	found org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 in central
	found org.roaringbitmap#RoaringBitmap;1.2.0 in central
	found software.amazon.awssdk#apache-client;2.29.26 in central
	found software.amazon.awssdk#http-client-spi;2.29.26 in central
	found software.amazon.awssdk#annotations;2.29.26 in central
	found software.amazon.awssdk#utils;2.29.26 in central
	found org.reactivestreams#reactive-streams;1.0.4 in central
	found software.amazon.awssdk#metrics-spi;2.29.26 in central
	found org.apache.httpcomponents#httpclient;4.5.13 in central
	found org.apache.httpcomponents#httpcore;4.4.16 in central
	found commons-codec#commons-codec;1.17.1 in central
	found software.amazon.awssdk#aws-core;2.29.26 in central
	found software.amazon.awssdk#regions;2.29.26 in central
	found software.amazon.awssdk#sdk-core;2.29.26 in central
	found software.amazon.awssdk#endpoints-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-spi;2.29.26 in central
	found software.amazon.awssdk#identity-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws;2.29.26 in central
	found software.amazon.awssdk#checksums-spi;2.29.26 in central
	found software.amazon.awssdk#checksums;2.29.26 in central
	found software.amazon.awssdk#profiles;2.29.26 in central
	found software.amazon.awssdk#retries-spi;2.29.26 in central
	found software.amazon.awssdk#retries;2.29.26 in central
	found software.amazon.awssdk#json-utils;2.29.26 in central
	found software.amazon.awssdk#third-party-jackson-core;2.29.26 in central
	found software.amazon.awssdk#auth;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 in central
	found software.amazon.eventstream#eventstream;1.0.1 in central
	found software.amazon.awssdk#http-auth;2.29.26 in central
	found software.amazon.awssdk#dynamodb;2.29.26 in central
	found software.amazon.awssdk#aws-json-protocol;2.29.26 in central
	found software.amazon.awssdk#protocol-core;2.29.26 in central
	found software.amazon.awssdk#netty-nio-client;2.29.26 in central
	found io.netty#netty-codec-http;4.1.115.Final in central
	found io.netty#netty-common;4.1.115.Final in central
	found io.netty#netty-buffer;4.1.115.Final in central
	found io.netty#netty-transport;4.1.115.Final in central
	found io.netty#netty-resolver;4.1.115.Final in central
	found io.netty#netty-codec;4.1.115.Final in central
	found io.netty#netty-handler;4.1.115.Final in central
	found io.netty#netty-transport-native-unix-common;4.1.115.Final in central
	found io.netty#netty-codec-http2;4.1.115.Final in central
	found io.netty#netty-transport-classes-epoll;4.1.115.Final in central
	found software.amazon.awssdk#glue;2.29.26 in central
	found software.amazon.awssdk#kms;2.29.26 in central
	found software.amazon.awssdk#s3;2.29.26 in central
	found software.amazon.awssdk#aws-xml-protocol;2.29.26 in central
	found software.amazon.awssdk#aws-query-protocol;2.29.26 in central
	found software.amazon.awssdk#arns;2.29.26 in central
	found software.amazon.awssdk#crt-core;2.29.26 in central
	found software.amazon.awssdk#sts;2.29.26 in central
	found software.amazon.awssdk#url-connection-client;2.29.26 in central
	found software.amazon.awssdk#s3tables;2.29.26 in central
	found org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 in central
:: resolution report :: resolve 622ms :: artifacts dl 16ms
	:: modules in use:
	com.amazonaws#aws-java-sdk-bundle;1.12.661 from central in [default]
	com.fasterxml.jackson.core#jackson-annotations;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-core;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-databind;2.14.2 from central in [default]
	com.github.ben-manes.caffeine#caffeine;3.1.8 from central in [default]
	com.google.errorprone#error_prone_annotations;2.21.1 from central in [default]
	commons-codec#commons-codec;1.17.1 from central in [default]
	commons-logging#commons-logging;1.3.2 from central in [default]
	io.airlift#aircompressor;0.27 from central in [default]
	io.netty#netty-buffer;4.1.115.Final from central in [default]
	io.netty#netty-codec;4.1.115.Final from central in [default]
	io.netty#netty-codec-http;4.1.115.Final from central in [default]
	io.netty#netty-codec-http2;4.1.115.Final from central in [default]
	io.netty#netty-common;4.1.115.Final from central in [default]
	io.netty#netty-handler;4.1.115.Final from central in [default]
	io.netty#netty-resolver;4.1.115.Final from central in [default]
	io.netty#netty-transport;4.1.115.Final from central in [default]
	io.netty#netty-transport-classes-epoll;4.1.115.Final from central in [default]
	io.netty#netty-transport-native-unix-common;4.1.115.Final from central in [default]
	org.apache.avro#avro;1.11.3 from central in [default]
	org.apache.commons#commons-compress;1.22 from central in [default]
	org.apache.commons#commons-configuration2;2.11.0 from central in [default]
	org.apache.commons#commons-lang3;3.14.0 from central in [default]
	org.apache.commons#commons-text;1.12.0 from central in [default]
	org.apache.hadoop#hadoop-aws;3.3.4 from central in [default]
	org.apache.httpcomponents#httpclient;4.5.13 from central in [default]
	org.apache.httpcomponents#httpcore;4.4.16 from central in [default]
	org.apache.httpcomponents.client5#httpclient5;5.3.1 from central in [default]
	org.apache.httpcomponents.core5#httpcore5;5.2.4 from central in [default]
	org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 from central in [default]
	org.apache.iceberg#iceberg-api;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-aws;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-bundled-guava;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-common;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-core;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 from central in [default]
	org.checkerframework#checker-qual;3.37.0 from central in [default]
	org.reactivestreams#reactive-streams;1.0.4 from central in [default]
	org.roaringbitmap#RoaringBitmap;1.2.0 from central in [default]
	org.slf4j#slf4j-api;1.7.36 from central in [default]
	org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in [default]
	software.amazon.awssdk#annotations;2.29.26 from central in [default]
	software.amazon.awssdk#apache-client;2.29.26 from central in [default]
	software.amazon.awssdk#arns;2.29.26 from central in [default]
	software.amazon.awssdk#auth;2.29.26 from central in [default]
	software.amazon.awssdk#aws-core;2.29.26 from central in [default]
	software.amazon.awssdk#aws-json-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-query-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-xml-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#bundle;2.29.38 from central in [default]
	software.amazon.awssdk#checksums;2.29.26 from central in [default]
	software.amazon.awssdk#checksums-spi;2.29.26 from central in [default]
	software.amazon.awssdk#crt-core;2.29.26 from central in [default]
	software.amazon.awssdk#dynamodb;2.29.26 from central in [default]
	software.amazon.awssdk#endpoints-spi;2.29.26 from central in [default]
	software.amazon.awssdk#glue;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-spi;2.29.26 from central in [default]
	software.amazon.awssdk#http-client-spi;2.29.26 from central in [default]
	software.amazon.awssdk#identity-spi;2.29.26 from central in [default]
	software.amazon.awssdk#json-utils;2.29.26 from central in [default]
	software.amazon.awssdk#kms;2.29.26 from central in [default]
	software.amazon.awssdk#metrics-spi;2.29.26 from central in [default]
	software.amazon.awssdk#netty-nio-client;2.29.26 from central in [default]
	software.amazon.awssdk#profiles;2.29.26 from central in [default]
	software.amazon.awssdk#protocol-core;2.29.26 from central in [default]
	software.amazon.awssdk#regions;2.29.26 from central in [default]
	software.amazon.awssdk#retries;2.29.26 from central in [default]
	software.amazon.awssdk#retries-spi;2.29.26 from central in [default]
	software.amazon.awssdk#s3;2.29.26 from central in [default]
	software.amazon.awssdk#s3tables;2.29.26 from central in [default]
	software.amazon.awssdk#sdk-core;2.29.26 from central in [default]
	software.amazon.awssdk#sts;2.29.26 from central in [default]
	software.amazon.awssdk#third-party-jackson-core;2.29.26 from central in [default]
	software.amazon.awssdk#url-connection-client;2.29.26 from central in [default]
	software.amazon.awssdk#utils;2.29.26 from central in [default]
	software.amazon.eventstream#eventstream;1.0.1 from central in [default]
	software.amazon.s3tables#s3-tables-catalog-for-iceberg;0.1.3 from central in [default]
	:: evicted modules:
	com.amazonaws#aws-java-sdk-bundle;1.12.262 by [com.amazonaws#aws-java-sdk-bundle;1.12.661] in [default]
	com.github.ben-manes.caffeine#caffeine;2.9.3 by [com.github.ben-manes.caffeine#caffeine;3.1.8] in [default]
	commons-logging#commons-logging;1.2 by [commons-logging#commons-logging;1.3.2] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   83  |   0   |   0   |   3   ||   80  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-37ce9917-d6a4-4f76-ab92-33d7fee7545b
	confs: [default]
	0 artifacts copied, 80 already retrieved (0kB/9ms)
25/01/07 19:50:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+---------+
|namespace|
+---------+
+---------+

                                                                                
In [22]:
print("pk")
pk

Table Bucket 2

In [3]:
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
WAREHOUSE_PATH = "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket2"

spark = SparkSession.builder \
    .appName("iceberg_lab") \
    .config("spark.jars.packages", ",".join(packages)) \
    .config("spark.sql.catalog.catalog2", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.catalog2.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .config("spark.sql.catalog.catalog2.warehouse", WAREHOUSE_PATH) \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "catalog2") \
    .config("spark.sql.catalog.catalog2.client.region", "us-east-1") \
    .getOrCreate()


spark.sql("SHOW NAMESPACES IN catalog2").show()
spark.sql("CREATE NAMESPACE IF NOT EXISTS catalog2.demo_poc_bucket2")

spark.sql("""
CREATE TABLE IF NOT EXISTS catalog2.demo_poc_bucket2.orders (
  order_id INT,
  customer_id INT,
  order_date DATE,
  total_amount DECIMAL(10, 2)
) USING iceberg
""")

spark.sql("""
INSERT INTO catalog2.demo_poc_bucket2.orders VALUES
  (101, 1, DATE '2023-01-15', 150.50),
  (102, 2, DATE '2023-01-16', 200.75),
  (103, 3, DATE '2023-01-17', 75.25),
  (104, 1, DATE '2023-01-18', 300.00),
  (105, 4, DATE '2023-01-19', 125.00),
  (106, 2, DATE '2023-01-20', 180.50),
  (107, 5, DATE '2023-01-21', 95.75)
""")
+---------+
|namespace|
+---------+
+---------+

                                                                                
Out[3]:
DataFrame[]
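
With `customers` in the first bucket and `orders` in the second, a cross-bucket join has to address each table by its full three-part name (catalog, namespace, table). The snippet below is a sketch of the query shape this test is aiming at, not output from the notebook. `cross_catalog_join_sql` is a hypothetical helper, and the query can only run in a session where both `catalog1` and `catalog2` are registered.

```python
def cross_catalog_join_sql(customers_table: str, orders_table: str) -> str:
    """Build a join between two fully qualified Iceberg tables."""
    return f"""
SELECT c.customer_id, c.name, o.order_id, o.order_date, o.total_amount
FROM {customers_table} AS c
JOIN {orders_table} AS o
  ON c.customer_id = o.customer_id
ORDER BY o.order_id
""".strip()


join_sql = cross_catalog_join_sql(
    "catalog1.demo_poc_bucket1.customers",
    "catalog2.demo_poc_bucket2.orders",
)
print(join_sql)
# In a session that registers both catalogs: spark.sql(join_sql).show()
```
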

Join Test Across Both Catalogs

In [1]:
from pyspark.sql import SparkSession
import os

def create_spark_session(catalogs):
    os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"

    packages = [
        "com.amazonaws:aws-java-sdk-bundle:1.12.661",
        "org.apache.hadoop:hadoop-aws:3.3.4",
        "software.amazon.awssdk:bundle:2.29.38",
        "com.github.ben-manes.caffeine:caffeine:3.1.8",
        "org.apache.commons:commons-configuration2:2.11.0",
        "software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.3",
        "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.1"
    ]

    builder = SparkSession.builder \
        .appName("iceberg_lab") \
        .config("spark.jars.packages", ",".join(packages)) \
        .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")

    for catalog in catalogs:
        catalog_name = catalog['catalog_name']
        builder = builder \
            .config(f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkCatalog") \
            .config(f"spark.sql.catalog.{catalog_name}.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
            .config(f"spark.sql.catalog.{catalog_name}.warehouse", catalog['arn']) \
            .config(f"spark.sql.catalog.{catalog_name}.client.region", "us-east-1")

    builder = builder \
        .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog") \
        .config("spark.sql.catalog.spark_catalog.type", "hive") \
        .config("spark.sql.defaultCatalog", "spark_catalog")

    spark = builder.getOrCreate()

    for catalog in catalogs:
        spark.sql(f"SHOW NAMESPACES IN {catalog['catalog_name']}").show()

    return spark

# Example usage:
catalogs = [
    {
        "catalog_name": "catalog1",
        "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"
    },
    {
        "catalog_name": "catalog2",
        "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket2"
    }
]

spark = create_spark_session(catalogs)
:: loading settings :: url = jar:file:/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/sshah/.ivy2/cache
The jars for the packages stored in: /Users/sshah/.ivy2/jars
com.amazonaws#aws-java-sdk-bundle added as a dependency
org.apache.hadoop#hadoop-aws added as a dependency
software.amazon.awssdk#bundle added as a dependency
com.github.ben-manes.caffeine#caffeine added as a dependency
org.apache.commons#commons-configuration2 added as a dependency
software.amazon.s3tables#s3-tables-catalog-for-iceberg added as a dependency
org.apache.iceberg#iceberg-spark-runtime-3.4_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-fef7c7e3-8e33-416c-8563-c92457676656;1.0
	confs: [default]
	found com.amazonaws#aws-java-sdk-bundle;1.12.661 in central
	found org.apache.hadoop#hadoop-aws;3.3.4 in central
	found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
	found software.amazon.awssdk#bundle;2.29.38 in central
	found com.github.ben-manes.caffeine#caffeine;3.1.8 in central
	found org.checkerframework#checker-qual;3.37.0 in central
	found com.google.errorprone#error_prone_annotations;2.21.1 in central
	found org.apache.commons#commons-configuration2;2.11.0 in central
	found org.apache.commons#commons-lang3;3.14.0 in central
	found org.apache.commons#commons-text;1.12.0 in central
	found commons-logging#commons-logging;1.3.2 in central
	found software.amazon.s3tables#s3-tables-catalog-for-iceberg;0.1.3 in central
	found org.apache.iceberg#iceberg-api;1.6.1 in central
	found org.slf4j#slf4j-api;1.7.36 in central
	found org.apache.iceberg#iceberg-bundled-guava;1.6.1 in central
	found org.apache.iceberg#iceberg-aws;1.6.1 in central
	found org.apache.iceberg#iceberg-common;1.6.1 in central
	found org.apache.iceberg#iceberg-core;1.6.1 in central
	found org.apache.avro#avro;1.11.3 in central
	found com.fasterxml.jackson.core#jackson-core;2.14.2 in central
	found com.fasterxml.jackson.core#jackson-databind;2.14.2 in central
	found com.fasterxml.jackson.core#jackson-annotations;2.14.2 in central
	found org.apache.commons#commons-compress;1.22 in central
	found io.airlift#aircompressor;0.27 in central
	found org.apache.httpcomponents.client5#httpclient5;5.3.1 in central
	found org.apache.httpcomponents.core5#httpcore5;5.2.4 in central
	found org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 in central
	found org.roaringbitmap#RoaringBitmap;1.2.0 in central
	found software.amazon.awssdk#apache-client;2.29.26 in central
	found software.amazon.awssdk#http-client-spi;2.29.26 in central
	found software.amazon.awssdk#annotations;2.29.26 in central
	found software.amazon.awssdk#utils;2.29.26 in central
	found org.reactivestreams#reactive-streams;1.0.4 in central
	found software.amazon.awssdk#metrics-spi;2.29.26 in central
	found org.apache.httpcomponents#httpclient;4.5.13 in central
	found org.apache.httpcomponents#httpcore;4.4.16 in central
	found commons-codec#commons-codec;1.17.1 in central
	found software.amazon.awssdk#aws-core;2.29.26 in central
	found software.amazon.awssdk#regions;2.29.26 in central
	found software.amazon.awssdk#sdk-core;2.29.26 in central
	found software.amazon.awssdk#endpoints-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-spi;2.29.26 in central
	found software.amazon.awssdk#identity-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws;2.29.26 in central
	found software.amazon.awssdk#checksums-spi;2.29.26 in central
	found software.amazon.awssdk#checksums;2.29.26 in central
	found software.amazon.awssdk#profiles;2.29.26 in central
	found software.amazon.awssdk#retries-spi;2.29.26 in central
	found software.amazon.awssdk#retries;2.29.26 in central
	found software.amazon.awssdk#json-utils;2.29.26 in central
	found software.amazon.awssdk#third-party-jackson-core;2.29.26 in central
	found software.amazon.awssdk#auth;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 in central
	found software.amazon.eventstream#eventstream;1.0.1 in central
	found software.amazon.awssdk#http-auth;2.29.26 in central
	found software.amazon.awssdk#dynamodb;2.29.26 in central
	found software.amazon.awssdk#aws-json-protocol;2.29.26 in central
	found software.amazon.awssdk#protocol-core;2.29.26 in central
	found software.amazon.awssdk#netty-nio-client;2.29.26 in central
	found io.netty#netty-codec-http;4.1.115.Final in central
	found io.netty#netty-common;4.1.115.Final in central
	found io.netty#netty-buffer;4.1.115.Final in central
	found io.netty#netty-transport;4.1.115.Final in central
	found io.netty#netty-resolver;4.1.115.Final in central
	found io.netty#netty-codec;4.1.115.Final in central
	found io.netty#netty-handler;4.1.115.Final in central
	found io.netty#netty-transport-native-unix-common;4.1.115.Final in central
	found io.netty#netty-codec-http2;4.1.115.Final in central
	found io.netty#netty-transport-classes-epoll;4.1.115.Final in central
	found software.amazon.awssdk#glue;2.29.26 in central
	found software.amazon.awssdk#kms;2.29.26 in central
	found software.amazon.awssdk#s3;2.29.26 in central
	found software.amazon.awssdk#aws-xml-protocol;2.29.26 in central
	found software.amazon.awssdk#aws-query-protocol;2.29.26 in central
	found software.amazon.awssdk#arns;2.29.26 in central
	found software.amazon.awssdk#crt-core;2.29.26 in central
	found software.amazon.awssdk#sts;2.29.26 in central
	found software.amazon.awssdk#url-connection-client;2.29.26 in central
	found software.amazon.awssdk#s3tables;2.29.26 in central
	found org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 in central
:: resolution report :: resolve 581ms :: artifacts dl 12ms
	:: modules in use:
	com.amazonaws#aws-java-sdk-bundle;1.12.661 from central in [default]
	com.fasterxml.jackson.core#jackson-annotations;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-core;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-databind;2.14.2 from central in [default]
	com.github.ben-manes.caffeine#caffeine;3.1.8 from central in [default]
	com.google.errorprone#error_prone_annotations;2.21.1 from central in [default]
	commons-codec#commons-codec;1.17.1 from central in [default]
	commons-logging#commons-logging;1.3.2 from central in [default]
	io.airlift#aircompressor;0.27 from central in [default]
	io.netty#netty-buffer;4.1.115.Final from central in [default]
	io.netty#netty-codec;4.1.115.Final from central in [default]
	io.netty#netty-codec-http;4.1.115.Final from central in [default]
	io.netty#netty-codec-http2;4.1.115.Final from central in [default]
	io.netty#netty-common;4.1.115.Final from central in [default]
	io.netty#netty-handler;4.1.115.Final from central in [default]
	io.netty#netty-resolver;4.1.115.Final from central in [default]
	io.netty#netty-transport;4.1.115.Final from central in [default]
	io.netty#netty-transport-classes-epoll;4.1.115.Final from central in [default]
	io.netty#netty-transport-native-unix-common;4.1.115.Final from central in [default]
	org.apache.avro#avro;1.11.3 from central in [default]
	org.apache.commons#commons-compress;1.22 from central in [default]
	org.apache.commons#commons-configuration2;2.11.0 from central in [default]
	org.apache.commons#commons-lang3;3.14.0 from central in [default]
	org.apache.commons#commons-text;1.12.0 from central in [default]
	org.apache.hadoop#hadoop-aws;3.3.4 from central in [default]
	org.apache.httpcomponents#httpclient;4.5.13 from central in [default]
	org.apache.httpcomponents#httpcore;4.4.16 from central in [default]
	org.apache.httpcomponents.client5#httpclient5;5.3.1 from central in [default]
	org.apache.httpcomponents.core5#httpcore5;5.2.4 from central in [default]
	org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 from central in [default]
	org.apache.iceberg#iceberg-api;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-aws;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-bundled-guava;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-common;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-core;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 from central in [default]
	org.checkerframework#checker-qual;3.37.0 from central in [default]
	org.reactivestreams#reactive-streams;1.0.4 from central in [default]
	org.roaringbitmap#RoaringBitmap;1.2.0 from central in [default]
	org.slf4j#slf4j-api;1.7.36 from central in [default]
	org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in [default]
	software.amazon.awssdk#annotations;2.29.26 from central in [default]
	software.amazon.awssdk#apache-client;2.29.26 from central in [default]
	software.amazon.awssdk#arns;2.29.26 from central in [default]
	software.amazon.awssdk#auth;2.29.26 from central in [default]
	software.amazon.awssdk#aws-core;2.29.26 from central in [default]
	software.amazon.awssdk#aws-json-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-query-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-xml-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#bundle;2.29.38 from central in [default]
	software.amazon.awssdk#checksums;2.29.26 from central in [default]
	software.amazon.awssdk#checksums-spi;2.29.26 from central in [default]
	software.amazon.awssdk#crt-core;2.29.26 from central in [default]
	software.amazon.awssdk#dynamodb;2.29.26 from central in [default]
	software.amazon.awssdk#endpoints-spi;2.29.26 from central in [default]
	software.amazon.awssdk#glue;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-spi;2.29.26 from central in [default]
	software.amazon.awssdk#http-client-spi;2.29.26 from central in [default]
	software.amazon.awssdk#identity-spi;2.29.26 from central in [default]
	software.amazon.awssdk#json-utils;2.29.26 from central in [default]
	software.amazon.awssdk#kms;2.29.26 from central in [default]
	software.amazon.awssdk#metrics-spi;2.29.26 from central in [default]
	software.amazon.awssdk#netty-nio-client;2.29.26 from central in [default]
	software.amazon.awssdk#profiles;2.29.26 from central in [default]
	software.amazon.awssdk#protocol-core;2.29.26 from central in [default]
	software.amazon.awssdk#regions;2.29.26 from central in [default]
	software.amazon.awssdk#retries;2.29.26 from central in [default]
	software.amazon.awssdk#retries-spi;2.29.26 from central in [default]
	software.amazon.awssdk#s3;2.29.26 from central in [default]
	software.amazon.awssdk#s3tables;2.29.26 from central in [default]
	software.amazon.awssdk#sdk-core;2.29.26 from central in [default]
	software.amazon.awssdk#sts;2.29.26 from central in [default]
	software.amazon.awssdk#third-party-jackson-core;2.29.26 from central in [default]
	software.amazon.awssdk#url-connection-client;2.29.26 from central in [default]
	software.amazon.awssdk#utils;2.29.26 from central in [default]
	software.amazon.eventstream#eventstream;1.0.1 from central in [default]
	software.amazon.s3tables#s3-tables-catalog-for-iceberg;0.1.3 from central in [default]
	:: evicted modules:
	com.amazonaws#aws-java-sdk-bundle;1.12.262 by [com.amazonaws#aws-java-sdk-bundle;1.12.661] in [default]
	com.github.ben-manes.caffeine#caffeine;2.9.3 by [com.github.ben-manes.caffeine#caffeine;3.1.8] in [default]
	commons-logging#commons-logging;1.2 by [commons-logging#commons-logging;1.3.2] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   83  |   0   |   0   |   3   ||   80  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-fef7c7e3-8e33-416c-8563-c92457676656
	confs: [default]
	0 artifacts copied, 80 already retrieved (0kB/6ms)
25/01/07 19:55:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+----------------+
|       namespace|
+----------------+
|demo_poc_bucket1|
+----------------+

+----------------+
|       namespace|
+----------------+
|demo_poc_bucket2|
+----------------+
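
To see exactly which Spark properties each table bucket contributes, the per-catalog settings can be built as a plain dictionary before they are passed to the builder. This is only a minimal sketch: `s3tables_catalog_conf` and its `region` parameter are hypothetical names that do not appear in the notebook, and the property keys are copied from the session configuration above.

```python
def s3tables_catalog_conf(catalog_name, arn, region="us-east-1"):
    """Return the Spark properties that register one S3 table bucket as an Iceberg catalog.

    Hypothetical helper for illustration; the keys mirror the ones set in the loop above.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        f"{prefix}.warehouse": arn,          # the table bucket ARN acts as the warehouse
        f"{prefix}.client.region": region,   # assumed default; change it for other regions
    }


# Print the properties for one catalog without starting Spark.
conf = s3tables_catalog_conf(
    "catalog1", "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"
)
for key, value in conf.items():
    print(f"{key} = {value}")
```

Each key is prefixed with the catalog name, so two catalogs never overwrite each other's settings. That is what lets one session talk to both table buckets.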

In [ ]:
spark
In [3]:
spark.sql("SHOW NAMESPACES IN catalog1").show()
spark.sql("SHOW NAMESPACES IN catalog2").show()
+----------------+
|       namespace|
+----------------+
|demo_poc_bucket1|
+----------------+

+----------------+
|       namespace|
+----------------+
|demo_poc_bucket2|
+----------------+

In [2]:
result = spark.sql("""
SELECT 
    c.customer_id,
    c.name,
    c.email,
    o.order_id,
    o.order_date,
    o.total_amount
FROM catalog1.demo_poc_bucket1.customers c
JOIN catalog2.demo_poc_bucket2.orders o
ON c.customer_id = o.customer_id
ORDER BY c.customer_id, o.order_date
""")

result.show()
+-----------+-------------+-------------------+--------+----------+------------+
|customer_id|         name|              email|order_id|order_date|total_amount|
+-----------+-------------+-------------------+--------+----------+------------+
|          1|     John Doe|   john@example.com|     101|2023-01-15|      150.50|
|          1|     John Doe|   john@example.com|     104|2023-01-18|      300.00|
|          2|   Jane Smith|   jane@example.com|     102|2023-01-16|      200.75|
|          2|   Jane Smith|   jane@example.com|     106|2023-01-20|      180.50|
|          3|  Bob Johnson|    bob@example.com|     103|2023-01-17|       75.25|
|          4|  Alice Brown|  alice@example.com|     105|2023-01-19|      125.00|
|          5|Charlie Davis|charlie@example.com|     107|2023-01-21|       95.75|
+-----------+-------------+-------------------+--------+----------+------------+
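
The same cross-bucket join can probably also be written with the DataFrame API instead of SQL. The sketch below assumes the `spark` session and the two tables from the cells above; `join_customers_orders` is a hypothetical helper name and was not part of the original test.

```python
def join_customers_orders(spark):
    """Join customers (catalog1) with orders (catalog2) using the DataFrame API.

    `spark` is an existing SparkSession that has both catalogs registered.
    """
    # Fully qualified names: <catalog>.<namespace>.<table>
    customers = spark.table("catalog1.demo_poc_bucket1.customers")
    orders = spark.table("catalog2.demo_poc_bucket2.orders")

    # Joining on the column name keeps a single customer_id column in the result.
    return (
        customers.join(orders, on="customer_id", how="inner")
        .select("customer_id", "name", "email", "order_id", "order_date", "total_amount")
        .orderBy("customer_id", "order_date")
    )
```

Calling `join_customers_orders(spark).show()` should print the same rows as the SQL version, since both express the same inner join and ordering.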


Clean Up

In [3]:
%%bash

# Delete in dependency order: tables first, then namespaces, then the table buckets.
aws s3tables delete-table \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1" \
    --namespace demo_poc_bucket1 \
    --name customers

aws s3tables delete-table \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2" \
    --namespace demo_poc_bucket2 \
    --name orders

aws s3tables delete-namespace \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1" \
    --namespace demo_poc_bucket1

aws s3tables delete-namespace \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2" \
    --namespace demo_poc_bucket2

aws s3tables delete-table-bucket \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1"

aws s3tables delete-table-bucket \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2"
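
The order of the clean-up matters: a table bucket can only be deleted once its namespaces are gone, and a namespace only once its tables are gone. As a rough sketch, the commands can be generated in that order from a small list of buckets. `cleanup_commands` and the dictionary keys are hypothetical names chosen for illustration, and the script only prints the commands (a dry run) rather than executing them.

```python
import shlex


def cleanup_commands(buckets, region="us-east-1"):
    """Build the aws s3tables clean-up commands: tables, then namespaces, then buckets."""
    tables, namespaces, bucket_cmds = [], [], []
    for b in buckets:
        base = ["aws", "s3tables"]
        tables.append(base + [
            "delete-table",
            "--table-bucket-arn", b["arn"],
            "--namespace", b["namespace"],
            "--name", b["table"],
        ])
        namespaces.append(base + [
            "delete-namespace",
            "--table-bucket-arn", b["arn"],
            "--namespace", b["namespace"],
        ])
        bucket_cmds.append(base + [
            "delete-table-bucket",
            "--region", region,
            "--table-bucket-arn", b["arn"],
        ])
    # Concatenate the groups so every table goes before any namespace or bucket.
    return tables + namespaces + bucket_cmds


# <ACCOUNT_ID> is a placeholder, exactly as in the commands above.
buckets = [
    {"arn": "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1",
     "namespace": "demo_poc_bucket1", "table": "customers"},
    {"arn": "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2",
     "namespace": "demo_poc_bucket2", "table": "orders"},
]

# Dry run: print each command instead of running it.
for cmd in cleanup_commands(buckets):
    print(shlex.join(cmd))
```

To actually run the clean-up, each argument list could be passed to `subprocess.run(cmd, check=True)` once the placeholder ARN has been replaced with a real one.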
