Tuesday, January 7, 2025

Test: can you join two managed Iceberg tables that live in different S3 table buckets, and how should you configure the Spark session?

test-tble-bucket-joins

Creating Table Buckets

In [ ]:
%%bash
aws s3tables create-table-bucket \
    --region us-east-1 \
    --name demo-bucket1


aws s3tables create-table-bucket \
    --region us-east-1 \
    --name demo-bucket2
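
The Spark cells that follow pass each bucket's ARN as the catalog warehouse. A minimal sketch of how that ARN is put together, assuming the `arn:aws:s3tables:<region>:<account>:bucket/<name>` layout used in this notebook; the helper name `table_bucket_arn` is hypothetical and `<ACCOUNT>` is left as a placeholder.

```python
def table_bucket_arn(region: str, account_id: str, bucket_name: str) -> str:
    """Build the ARN of an S3 table bucket (hypothetical helper, not an AWS API)."""
    return f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}"


# The two buckets created above; replace <ACCOUNT> with your AWS account ID.
arn1 = table_bucket_arn("us-east-1", "<ACCOUNT>", "demo-bucket1")
arn2 = table_bucket_arn("us-east-1", "<ACCOUNT>", "demo-bucket2")
print(arn1)
print(arn2)
```
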
In [2]:
from pyspark.sql import SparkSession
import os

# Machine-specific: point Spark at a local JDK 11 (Homebrew path on macOS).
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"


# Maven coordinates that Spark resolves when the session starts.
packages = [
    "com.amazonaws:aws-java-sdk-bundle:1.12.661",
    "org.apache.hadoop:hadoop-aws:3.3.4",
    "software.amazon.awssdk:bundle:2.29.38",
    "com.github.ben-manes.caffeine:caffeine:3.1.8",
    "org.apache.commons:commons-configuration2:2.11.0",
    "software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.3",
    "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.1"
]
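
The Iceberg runtime artifact name encodes the Spark and Scala versions it was built for (`3.4` and `2.12` here), so it has to match the installed PySpark. A small sketch that pulls those versions out of the coordinate so they can be compared by hand; the helper name `runtime_versions` is hypothetical.

```python
import re


def runtime_versions(coordinate: str):
    """Return (spark_version, scala_version, iceberg_version) from a Maven coordinate."""
    group, artifact, version = coordinate.split(":")
    match = re.fullmatch(r"iceberg-spark-runtime-(\d+\.\d+)_(\d+\.\d+)", artifact)
    if match is None:
        raise ValueError(f"not an iceberg-spark-runtime coordinate: {coordinate}")
    return match.group(1), match.group(2), version


spark_version, scala_version, iceberg_version = runtime_versions(
    "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.1"
)
# Compare spark_version with the first two parts of pyspark.__version__.
print(spark_version, scala_version, iceberg_version)
```
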

Creating a Table in Each Bucket

In [21]:
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
WAREHOUSE_PATH = "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"

spark = SparkSession.builder \
    .appName("iceberg_lab") \
    .config("spark.jars.packages", ",".join(packages)) \
    .config("spark.sql.catalog.catalog1", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.catalog1.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .config("spark.sql.catalog.catalog1.warehouse", WAREHOUSE_PATH) \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "catalog1") \
    .config("spark.sql.catalog.catalog1.client.region", "us-east-1") \
    .getOrCreate()


spark.sql("SHOW NAMESPACES IN catalog1").show()
spark.sql("CREATE NAMESPACE IF NOT EXISTS catalog1.demo_poc_bucket1")


spark.sql("""
CREATE TABLE IF NOT EXISTS catalog1.demo_poc_bucket1.customers (
  customer_id INT,
  name STRING,
  email STRING
) USING iceberg
""")
spark.sql("""
INSERT INTO catalog1.demo_poc_bucket1.customers VALUES
  (1, 'John Doe', 'john@example.com'),
  (2, 'Jane Smith', 'jane@example.com'),
  (3, 'Bob Johnson', 'bob@example.com'),
  (4, 'Alice Brown', 'alice@example.com'),
  (5, 'Charlie Davis', 'charlie@example.com')
""")
spark.stop()
:: loading settings :: url = jar:file:/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/sshah/.ivy2/cache
The jars for the packages stored in: /Users/sshah/.ivy2/jars
com.amazonaws#aws-java-sdk-bundle added as a dependency
org.apache.hadoop#hadoop-aws added as a dependency
software.amazon.awssdk#bundle added as a dependency
com.github.ben-manes.caffeine#caffeine added as a dependency
org.apache.commons#commons-configuration2 added as a dependency
software.amazon.s3tables#s3-tables-catalog-for-iceberg added as a dependency
org.apache.iceberg#iceberg-spark-runtime-3.4_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-37ce9917-d6a4-4f76-ab92-33d7fee7545b;1.0
	confs: [default]
	found com.amazonaws#aws-java-sdk-bundle;1.12.661 in central
	found org.apache.hadoop#hadoop-aws;3.3.4 in central
	found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
	found software.amazon.awssdk#bundle;2.29.38 in central
	found com.github.ben-manes.caffeine#caffeine;3.1.8 in central
	found org.checkerframework#checker-qual;3.37.0 in central
	found com.google.errorprone#error_prone_annotations;2.21.1 in central
	found org.apache.commons#commons-configuration2;2.11.0 in central
	found org.apache.commons#commons-lang3;3.14.0 in central
	found org.apache.commons#commons-text;1.12.0 in central
	found commons-logging#commons-logging;1.3.2 in central
	found software.amazon.s3tables#s3-tables-catalog-for-iceberg;0.1.3 in central
	found org.apache.iceberg#iceberg-api;1.6.1 in central
	found org.slf4j#slf4j-api;1.7.36 in central
	found org.apache.iceberg#iceberg-bundled-guava;1.6.1 in central
	found org.apache.iceberg#iceberg-aws;1.6.1 in central
	found org.apache.iceberg#iceberg-common;1.6.1 in central
	found org.apache.iceberg#iceberg-core;1.6.1 in central
	found org.apache.avro#avro;1.11.3 in central
	found com.fasterxml.jackson.core#jackson-core;2.14.2 in central
	found com.fasterxml.jackson.core#jackson-databind;2.14.2 in central
	found com.fasterxml.jackson.core#jackson-annotations;2.14.2 in central
	found org.apache.commons#commons-compress;1.22 in central
	found io.airlift#aircompressor;0.27 in central
	found org.apache.httpcomponents.client5#httpclient5;5.3.1 in central
	found org.apache.httpcomponents.core5#httpcore5;5.2.4 in central
	found org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 in central
	found org.roaringbitmap#RoaringBitmap;1.2.0 in central
	found software.amazon.awssdk#apache-client;2.29.26 in central
	found software.amazon.awssdk#http-client-spi;2.29.26 in central
	found software.amazon.awssdk#annotations;2.29.26 in central
	found software.amazon.awssdk#utils;2.29.26 in central
	found org.reactivestreams#reactive-streams;1.0.4 in central
	found software.amazon.awssdk#metrics-spi;2.29.26 in central
	found org.apache.httpcomponents#httpclient;4.5.13 in central
	found org.apache.httpcomponents#httpcore;4.4.16 in central
	found commons-codec#commons-codec;1.17.1 in central
	found software.amazon.awssdk#aws-core;2.29.26 in central
	found software.amazon.awssdk#regions;2.29.26 in central
	found software.amazon.awssdk#sdk-core;2.29.26 in central
	found software.amazon.awssdk#endpoints-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-spi;2.29.26 in central
	found software.amazon.awssdk#identity-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws;2.29.26 in central
	found software.amazon.awssdk#checksums-spi;2.29.26 in central
	found software.amazon.awssdk#checksums;2.29.26 in central
	found software.amazon.awssdk#profiles;2.29.26 in central
	found software.amazon.awssdk#retries-spi;2.29.26 in central
	found software.amazon.awssdk#retries;2.29.26 in central
	found software.amazon.awssdk#json-utils;2.29.26 in central
	found software.amazon.awssdk#third-party-jackson-core;2.29.26 in central
	found software.amazon.awssdk#auth;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 in central
	found software.amazon.eventstream#eventstream;1.0.1 in central
	found software.amazon.awssdk#http-auth;2.29.26 in central
	found software.amazon.awssdk#dynamodb;2.29.26 in central
	found software.amazon.awssdk#aws-json-protocol;2.29.26 in central
	found software.amazon.awssdk#protocol-core;2.29.26 in central
	found software.amazon.awssdk#netty-nio-client;2.29.26 in central
	found io.netty#netty-codec-http;4.1.115.Final in central
	found io.netty#netty-common;4.1.115.Final in central
	found io.netty#netty-buffer;4.1.115.Final in central
	found io.netty#netty-transport;4.1.115.Final in central
	found io.netty#netty-resolver;4.1.115.Final in central
	found io.netty#netty-codec;4.1.115.Final in central
	found io.netty#netty-handler;4.1.115.Final in central
	found io.netty#netty-transport-native-unix-common;4.1.115.Final in central
	found io.netty#netty-codec-http2;4.1.115.Final in central
	found io.netty#netty-transport-classes-epoll;4.1.115.Final in central
	found software.amazon.awssdk#glue;2.29.26 in central
	found software.amazon.awssdk#kms;2.29.26 in central
	found software.amazon.awssdk#s3;2.29.26 in central
	found software.amazon.awssdk#aws-xml-protocol;2.29.26 in central
	found software.amazon.awssdk#aws-query-protocol;2.29.26 in central
	found software.amazon.awssdk#arns;2.29.26 in central
	found software.amazon.awssdk#crt-core;2.29.26 in central
	found software.amazon.awssdk#sts;2.29.26 in central
	found software.amazon.awssdk#url-connection-client;2.29.26 in central
	found software.amazon.awssdk#s3tables;2.29.26 in central
	found org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 in central
:: resolution report :: resolve 622ms :: artifacts dl 16ms
	:: modules in use:
	com.amazonaws#aws-java-sdk-bundle;1.12.661 from central in [default]
	com.fasterxml.jackson.core#jackson-annotations;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-core;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-databind;2.14.2 from central in [default]
	com.github.ben-manes.caffeine#caffeine;3.1.8 from central in [default]
	com.google.errorprone#error_prone_annotations;2.21.1 from central in [default]
	commons-codec#commons-codec;1.17.1 from central in [default]
	commons-logging#commons-logging;1.3.2 from central in [default]
	io.airlift#aircompressor;0.27 from central in [default]
	io.netty#netty-buffer;4.1.115.Final from central in [default]
	io.netty#netty-codec;4.1.115.Final from central in [default]
	io.netty#netty-codec-http;4.1.115.Final from central in [default]
	io.netty#netty-codec-http2;4.1.115.Final from central in [default]
	io.netty#netty-common;4.1.115.Final from central in [default]
	io.netty#netty-handler;4.1.115.Final from central in [default]
	io.netty#netty-resolver;4.1.115.Final from central in [default]
	io.netty#netty-transport;4.1.115.Final from central in [default]
	io.netty#netty-transport-classes-epoll;4.1.115.Final from central in [default]
	io.netty#netty-transport-native-unix-common;4.1.115.Final from central in [default]
	org.apache.avro#avro;1.11.3 from central in [default]
	org.apache.commons#commons-compress;1.22 from central in [default]
	org.apache.commons#commons-configuration2;2.11.0 from central in [default]
	org.apache.commons#commons-lang3;3.14.0 from central in [default]
	org.apache.commons#commons-text;1.12.0 from central in [default]
	org.apache.hadoop#hadoop-aws;3.3.4 from central in [default]
	org.apache.httpcomponents#httpclient;4.5.13 from central in [default]
	org.apache.httpcomponents#httpcore;4.4.16 from central in [default]
	org.apache.httpcomponents.client5#httpclient5;5.3.1 from central in [default]
	org.apache.httpcomponents.core5#httpcore5;5.2.4 from central in [default]
	org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 from central in [default]
	org.apache.iceberg#iceberg-api;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-aws;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-bundled-guava;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-common;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-core;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 from central in [default]
	org.checkerframework#checker-qual;3.37.0 from central in [default]
	org.reactivestreams#reactive-streams;1.0.4 from central in [default]
	org.roaringbitmap#RoaringBitmap;1.2.0 from central in [default]
	org.slf4j#slf4j-api;1.7.36 from central in [default]
	org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in [default]
	software.amazon.awssdk#annotations;2.29.26 from central in [default]
	software.amazon.awssdk#apache-client;2.29.26 from central in [default]
	software.amazon.awssdk#arns;2.29.26 from central in [default]
	software.amazon.awssdk#auth;2.29.26 from central in [default]
	software.amazon.awssdk#aws-core;2.29.26 from central in [default]
	software.amazon.awssdk#aws-json-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-query-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-xml-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#bundle;2.29.38 from central in [default]
	software.amazon.awssdk#checksums;2.29.26 from central in [default]
	software.amazon.awssdk#checksums-spi;2.29.26 from central in [default]
	software.amazon.awssdk#crt-core;2.29.26 from central in [default]
	software.amazon.awssdk#dynamodb;2.29.26 from central in [default]
	software.amazon.awssdk#endpoints-spi;2.29.26 from central in [default]
	software.amazon.awssdk#glue;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-spi;2.29.26 from central in [default]
	software.amazon.awssdk#http-client-spi;2.29.26 from central in [default]
	software.amazon.awssdk#identity-spi;2.29.26 from central in [default]
	software.amazon.awssdk#json-utils;2.29.26 from central in [default]
	software.amazon.awssdk#kms;2.29.26 from central in [default]
	software.amazon.awssdk#metrics-spi;2.29.26 from central in [default]
	software.amazon.awssdk#netty-nio-client;2.29.26 from central in [default]
	software.amazon.awssdk#profiles;2.29.26 from central in [default]
	software.amazon.awssdk#protocol-core;2.29.26 from central in [default]
	software.amazon.awssdk#regions;2.29.26 from central in [default]
	software.amazon.awssdk#retries;2.29.26 from central in [default]
	software.amazon.awssdk#retries-spi;2.29.26 from central in [default]
	software.amazon.awssdk#s3;2.29.26 from central in [default]
	software.amazon.awssdk#s3tables;2.29.26 from central in [default]
	software.amazon.awssdk#sdk-core;2.29.26 from central in [default]
	software.amazon.awssdk#sts;2.29.26 from central in [default]
	software.amazon.awssdk#third-party-jackson-core;2.29.26 from central in [default]
	software.amazon.awssdk#url-connection-client;2.29.26 from central in [default]
	software.amazon.awssdk#utils;2.29.26 from central in [default]
	software.amazon.eventstream#eventstream;1.0.1 from central in [default]
	software.amazon.s3tables#s3-tables-catalog-for-iceberg;0.1.3 from central in [default]
	:: evicted modules:
	com.amazonaws#aws-java-sdk-bundle;1.12.262 by [com.amazonaws#aws-java-sdk-bundle;1.12.661] in [default]
	com.github.ben-manes.caffeine#caffeine;2.9.3 by [com.github.ben-manes.caffeine#caffeine;3.1.8] in [default]
	commons-logging#commons-logging;1.2 by [commons-logging#commons-logging;1.3.2] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   83  |   0   |   0   |   3   ||   80  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-37ce9917-d6a4-4f76-ab92-33d7fee7545b
	confs: [default]
	0 artifacts copied, 80 already retrieved (0kB/9ms)
25/01/07 19:50:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+---------+
|namespace|
+---------+
+---------+

                                                                                
In [22]:
print("pk")
pk

Table Bucket 2

In [3]:
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
WAREHOUSE_PATH = "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket2"

spark = SparkSession.builder \
    .appName("iceberg_lab") \
    .config("spark.jars.packages", ",".join(packages)) \
    .config("spark.sql.catalog.catalog2", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.catalog2.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .config("spark.sql.catalog.catalog2.warehouse", WAREHOUSE_PATH) \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "catalog2") \
    .config("spark.sql.catalog.catalog2.client.region", "us-east-1") \
    .getOrCreate()


spark.sql("SHOW NAMESPACES IN catalog2").show()
spark.sql("CREATE NAMESPACE IF NOT EXISTS catalog2.demo_poc_bucket2")

spark.sql("""
CREATE TABLE IF NOT EXISTS catalog2.demo_poc_bucket2.orders (
  order_id INT,
  customer_id INT,
  order_date DATE,
  total_amount DECIMAL(10, 2)
) USING iceberg
""")

spark.sql("""
INSERT INTO catalog2.demo_poc_bucket2.orders VALUES
  (101, 1, DATE '2023-01-15', 150.50),
  (102, 2, DATE '2023-01-16', 200.75),
  (103, 3, DATE '2023-01-17', 75.25),
  (104, 1, DATE '2023-01-18', 300.00),
  (105, 4, DATE '2023-01-19', 125.00),
  (106, 2, DATE '2023-01-20', 180.50),
  (107, 5, DATE '2023-01-21', 95.75)
""")
+---------+
|namespace|
+---------+
+---------+

                                                                                
Out[3]:
DataFrame[]
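
Each of the two sessions above registers a single catalog. A session that needs both buckets would carry one set of `spark.sql.catalog.<name>.*` keys per bucket. A minimal sketch of those keys as a plain dict, so they can be inspected without starting Spark; the helper name `catalog_confs` is hypothetical, and the region default is an assumption taken from the cells above.

```python
from typing import Dict, List


def catalog_confs(catalogs: List[Dict[str, str]], region: str = "us-east-1") -> Dict[str, str]:
    """Return the Spark conf entries that register one Iceberg catalog per table bucket."""
    confs: Dict[str, str] = {}
    for catalog in catalogs:
        prefix = f"spark.sql.catalog.{catalog['catalog_name']}"
        confs[prefix] = "org.apache.iceberg.spark.SparkCatalog"
        confs[f"{prefix}.catalog-impl"] = "software.amazon.s3tables.iceberg.S3TablesCatalog"
        confs[f"{prefix}.warehouse"] = catalog["arn"]
        confs[f"{prefix}.client.region"] = region
    return confs


confs = catalog_confs([
    {"catalog_name": "catalog1", "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"},
    {"catalog_name": "catalog2", "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket2"},
])
for key, value in sorted(confs.items()):
    print(key, "=", value)
```

Every entry could then be applied with `builder.config(key, value)` before calling `getOrCreate()`.
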

Join Test Across Both Catalogs

In [1]:
from pyspark.sql import SparkSession
import os

def create_spark_session(catalogs):
    """Build one Spark session with an Iceberg catalog per S3 table bucket.

    `catalogs` is a list of dicts with the keys 'catalog_name' and 'arn'.
    """
    # Machine-specific: point Spark at a local JDK 11 (Homebrew path on macOS).
    os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"

    # Maven coordinates that Spark resolves when the session starts.
    packages = [
        "com.amazonaws:aws-java-sdk-bundle:1.12.661",
        "org.apache.hadoop:hadoop-aws:3.3.4",
        "software.amazon.awssdk:bundle:2.29.38",
        "com.github.ben-manes.caffeine:caffeine:3.1.8",
        "org.apache.commons:commons-configuration2:2.11.0",
        "software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.3",
        "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.1"
    ]

    builder = SparkSession.builder \
        .appName("iceberg_lab") \
        .config("spark.jars.packages", ",".join(packages)) \
        .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")

    # Register one named catalog per table bucket; the bucket ARN is the warehouse.
    for catalog in catalogs:
        catalog_name = catalog['catalog_name']
        builder = builder \
            .config(f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkCatalog") \
            .config(f"spark.sql.catalog.{catalog_name}.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
            .config(f"spark.sql.catalog.{catalog_name}.warehouse", catalog['arn']) \
            .config(f"spark.sql.catalog.{catalog_name}.client.region", "us-east-1")

    # Keep the built-in session catalog as the default, so tables in the
    # bucket catalogs are always referenced by their fully qualified names.
    builder = builder \
        .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog") \
        .config("spark.sql.catalog.spark_catalog.type", "hive") \
        .config("spark.sql.defaultCatalog", "spark_catalog")

    spark = builder.getOrCreate()

    # Sanity check: list the namespaces visible in each registered catalog.
    for catalog in catalogs:
        spark.sql(f"SHOW NAMESPACES IN {catalog['catalog_name']}").show()

    return spark

# Example usage:
catalogs = [
    {
        "catalog_name": "catalog1",
        "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"
    },
    {
        "catalog_name": "catalog2",
        "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket2"
    }
]

spark = create_spark_session(catalogs)
	found io.airlift#aircompressor;0.27 in central
	found org.apache.httpcomponents.client5#httpclient5;5.3.1 in central
	found org.apache.httpcomponents.core5#httpcore5;5.2.4 in central
	found org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 in central
	found org.roaringbitmap#RoaringBitmap;1.2.0 in central
	found software.amazon.awssdk#apache-client;2.29.26 in central
	found software.amazon.awssdk#http-client-spi;2.29.26 in central
	found software.amazon.awssdk#annotations;2.29.26 in central
	found software.amazon.awssdk#utils;2.29.26 in central
	found org.reactivestreams#reactive-streams;1.0.4 in central
	found software.amazon.awssdk#metrics-spi;2.29.26 in central
	found org.apache.httpcomponents#httpclient;4.5.13 in central
	found org.apache.httpcomponents#httpcore;4.4.16 in central
	found commons-codec#commons-codec;1.17.1 in central
	found software.amazon.awssdk#aws-core;2.29.26 in central
	found software.amazon.awssdk#regions;2.29.26 in central
	found software.amazon.awssdk#sdk-core;2.29.26 in central
	found software.amazon.awssdk#endpoints-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-spi;2.29.26 in central
	found software.amazon.awssdk#identity-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws;2.29.26 in central
	found software.amazon.awssdk#checksums-spi;2.29.26 in central
	found software.amazon.awssdk#checksums;2.29.26 in central
	found software.amazon.awssdk#profiles;2.29.26 in central
	found software.amazon.awssdk#retries-spi;2.29.26 in central
	found software.amazon.awssdk#retries;2.29.26 in central
	found software.amazon.awssdk#json-utils;2.29.26 in central
	found software.amazon.awssdk#third-party-jackson-core;2.29.26 in central
	found software.amazon.awssdk#auth;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 in central
	found software.amazon.eventstream#eventstream;1.0.1 in central
	found software.amazon.awssdk#http-auth;2.29.26 in central
	found software.amazon.awssdk#dynamodb;2.29.26 in central
	found software.amazon.awssdk#aws-json-protocol;2.29.26 in central
	found software.amazon.awssdk#protocol-core;2.29.26 in central
	found software.amazon.awssdk#netty-nio-client;2.29.26 in central
	found io.netty#netty-codec-http;4.1.115.Final in central
	found io.netty#netty-common;4.1.115.Final in central
	found io.netty#netty-buffer;4.1.115.Final in central
	found io.netty#netty-transport;4.1.115.Final in central
	found io.netty#netty-resolver;4.1.115.Final in central
	found io.netty#netty-codec;4.1.115.Final in central
	found io.netty#netty-handler;4.1.115.Final in central
	found io.netty#netty-transport-native-unix-common;4.1.115.Final in central
	found io.netty#netty-codec-http2;4.1.115.Final in central
	found io.netty#netty-transport-classes-epoll;4.1.115.Final in central
	found software.amazon.awssdk#glue;2.29.26 in central
	found software.amazon.awssdk#kms;2.29.26 in central
	found software.amazon.awssdk#s3;2.29.26 in central
	found software.amazon.awssdk#aws-xml-protocol;2.29.26 in central
	found software.amazon.awssdk#aws-query-protocol;2.29.26 in central
	found software.amazon.awssdk#arns;2.29.26 in central
	found software.amazon.awssdk#crt-core;2.29.26 in central
	found software.amazon.awssdk#sts;2.29.26 in central
	found software.amazon.awssdk#url-connection-client;2.29.26 in central
	found software.amazon.awssdk#s3tables;2.29.26 in central
	found org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 in central
:: resolution report :: resolve 581ms :: artifacts dl 12ms
	:: modules in use:
	com.amazonaws#aws-java-sdk-bundle;1.12.661 from central in [default]
	com.fasterxml.jackson.core#jackson-annotations;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-core;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-databind;2.14.2 from central in [default]
	com.github.ben-manes.caffeine#caffeine;3.1.8 from central in [default]
	com.google.errorprone#error_prone_annotations;2.21.1 from central in [default]
	commons-codec#commons-codec;1.17.1 from central in [default]
	commons-logging#commons-logging;1.3.2 from central in [default]
	io.airlift#aircompressor;0.27 from central in [default]
	io.netty#netty-buffer;4.1.115.Final from central in [default]
	io.netty#netty-codec;4.1.115.Final from central in [default]
	io.netty#netty-codec-http;4.1.115.Final from central in [default]
	io.netty#netty-codec-http2;4.1.115.Final from central in [default]
	io.netty#netty-common;4.1.115.Final from central in [default]
	io.netty#netty-handler;4.1.115.Final from central in [default]
	io.netty#netty-resolver;4.1.115.Final from central in [default]
	io.netty#netty-transport;4.1.115.Final from central in [default]
	io.netty#netty-transport-classes-epoll;4.1.115.Final from central in [default]
	io.netty#netty-transport-native-unix-common;4.1.115.Final from central in [default]
	org.apache.avro#avro;1.11.3 from central in [default]
	org.apache.commons#commons-compress;1.22 from central in [default]
	org.apache.commons#commons-configuration2;2.11.0 from central in [default]
	org.apache.commons#commons-lang3;3.14.0 from central in [default]
	org.apache.commons#commons-text;1.12.0 from central in [default]
	org.apache.hadoop#hadoop-aws;3.3.4 from central in [default]
	org.apache.httpcomponents#httpclient;4.5.13 from central in [default]
	org.apache.httpcomponents#httpcore;4.4.16 from central in [default]
	org.apache.httpcomponents.client5#httpclient5;5.3.1 from central in [default]
	org.apache.httpcomponents.core5#httpcore5;5.2.4 from central in [default]
	org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 from central in [default]
	org.apache.iceberg#iceberg-api;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-aws;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-bundled-guava;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-common;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-core;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 from central in [default]
	org.checkerframework#checker-qual;3.37.0 from central in [default]
	org.reactivestreams#reactive-streams;1.0.4 from central in [default]
	org.roaringbitmap#RoaringBitmap;1.2.0 from central in [default]
	org.slf4j#slf4j-api;1.7.36 from central in [default]
	org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in [default]
	software.amazon.awssdk#annotations;2.29.26 from central in [default]
	software.amazon.awssdk#apache-client;2.29.26 from central in [default]
	software.amazon.awssdk#arns;2.29.26 from central in [default]
	software.amazon.awssdk#auth;2.29.26 from central in [default]
	software.amazon.awssdk#aws-core;2.29.26 from central in [default]
	software.amazon.awssdk#aws-json-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-query-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-xml-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#bundle;2.29.38 from central in [default]
	software.amazon.awssdk#checksums;2.29.26 from central in [default]
	software.amazon.awssdk#checksums-spi;2.29.26 from central in [default]
	software.amazon.awssdk#crt-core;2.29.26 from central in [default]
	software.amazon.awssdk#dynamodb;2.29.26 from central in [default]
	software.amazon.awssdk#endpoints-spi;2.29.26 from central in [default]
	software.amazon.awssdk#glue;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-spi;2.29.26 from central in [default]
	software.amazon.awssdk#http-client-spi;2.29.26 from central in [default]
	software.amazon.awssdk#identity-spi;2.29.26 from central in [default]
	software.amazon.awssdk#json-utils;2.29.26 from central in [default]
	software.amazon.awssdk#kms;2.29.26 from central in [default]
	software.amazon.awssdk#metrics-spi;2.29.26 from central in [default]
	software.amazon.awssdk#netty-nio-client;2.29.26 from central in [default]
	software.amazon.awssdk#profiles;2.29.26 from central in [default]
	software.amazon.awssdk#protocol-core;2.29.26 from central in [default]
	software.amazon.awssdk#regions;2.29.26 from central in [default]
	software.amazon.awssdk#retries;2.29.26 from central in [default]
	software.amazon.awssdk#retries-spi;2.29.26 from central in [default]
	software.amazon.awssdk#s3;2.29.26 from central in [default]
	software.amazon.awssdk#s3tables;2.29.26 from central in [default]
	software.amazon.awssdk#sdk-core;2.29.26 from central in [default]
	software.amazon.awssdk#sts;2.29.26 from central in [default]
	software.amazon.awssdk#third-party-jackson-core;2.29.26 from central in [default]
	software.amazon.awssdk#url-connection-client;2.29.26 from central in [default]
	software.amazon.awssdk#utils;2.29.26 from central in [default]
	software.amazon.eventstream#eventstream;1.0.1 from central in [default]
	software.amazon.s3tables#s3-tables-catalog-for-iceberg;0.1.3 from central in [default]
	:: evicted modules:
	com.amazonaws#aws-java-sdk-bundle;1.12.262 by [com.amazonaws#aws-java-sdk-bundle;1.12.661] in [default]
	com.github.ben-manes.caffeine#caffeine;2.9.3 by [com.github.ben-manes.caffeine#caffeine;3.1.8] in [default]
	commons-logging#commons-logging;1.2 by [commons-logging#commons-logging;1.3.2] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   83  |   0   |   0   |   3   ||   80  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-fef7c7e3-8e33-416c-8563-c92457676656
	confs: [default]
	0 artifacts copied, 80 already retrieved (0kB/6ms)
25/01/07 19:55:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+----------------+
|       namespace|
+----------------+
|demo_poc_bucket1|
+----------------+

+----------------+
|       namespace|
+----------------+
|demo_poc_bucket2|
+----------------+
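
The per-catalog settings that the loop in create_spark_session applies can also be built as a plain dictionary first, which makes them easy to print and check before a session is started. The snippet below is only a sketch: `region_from_arn` and `catalog_conf` are hypothetical helper names (not part of PySpark or the S3 Tables catalog), and reading the region from the fourth field of the ARN instead of hardcoding "us-east-1" is an assumption about how you might want to generalize the function.

```python
def region_from_arn(arn: str) -> str:
    """Return the region field of an S3 Tables bucket ARN (hypothetical helper)."""
    # Expected shape: arn:aws:s3tables:<region>:<account>:bucket/<name>
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn" or parts[2] != "s3tables":
        raise ValueError(f"not an S3 Tables ARN: {arn!r}")
    return parts[3]


def catalog_conf(catalogs):
    """Build the spark.sql.catalog.* settings for each table bucket (hypothetical helper)."""
    conf = {}
    for catalog in catalogs:
        prefix = f"spark.sql.catalog.{catalog['catalog_name']}"
        conf[prefix] = "org.apache.iceberg.spark.SparkCatalog"
        conf[f"{prefix}.catalog-impl"] = "software.amazon.s3tables.iceberg.S3TablesCatalog"
        conf[f"{prefix}.warehouse"] = catalog["arn"]
        # Assumption: the catalog's client region should match the bucket's region.
        conf[f"{prefix}.client.region"] = region_from_arn(catalog["arn"])
    return conf


example = catalog_conf([
    {"catalog_name": "catalog1",
     "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"},
    {"catalog_name": "catalog2",
     "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket2"},
])

for key, value in sorted(example.items()):
    print(f"{key} = {value}")
```

Each key/value pair could then be passed to `builder.config(key, value)` in a loop, which is what the function above does inline.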

In [ ]:
spark
In [3]:
spark.sql("SHOW NAMESPACES IN catalog1").show()
spark.sql("SHOW NAMESPACES IN catalog2").show()
+----------------+
|       namespace|
+----------------+
|demo_poc_bucket1|
+----------------+

+----------------+
|       namespace|
+----------------+
|demo_poc_bucket2|
+----------------+

In [2]:
result = spark.sql("""
SELECT 
    c.customer_id,
    c.name,
    c.email,
    o.order_id,
    o.order_date,
    o.total_amount
FROM catalog1.demo_poc_bucket1.customers c
JOIN catalog2.demo_poc_bucket2.orders o
ON c.customer_id = o.customer_id
ORDER BY c.customer_id, o.order_date
""")

result.show()
+-----------+-------------+-------------------+--------+----------+------------+
|customer_id|         name|              email|order_id|order_date|total_amount|
+-----------+-------------+-------------------+--------+----------+------------+
|          1|     John Doe|   john@example.com|     101|2023-01-15|      150.50|
|          1|     John Doe|   john@example.com|     104|2023-01-18|      300.00|
|          2|   Jane Smith|   jane@example.com|     102|2023-01-16|      200.75|
|          2|   Jane Smith|   jane@example.com|     106|2023-01-20|      180.50|
|          3|  Bob Johnson|    bob@example.com|     103|2023-01-17|       75.25|
|          4|  Alice Brown|  alice@example.com|     105|2023-01-19|      125.00|
|          5|Charlie Davis|charlie@example.com|     107|2023-01-21|       95.75|
+-----------+-------------+-------------------+--------+----------+------------+
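
The join above works because each table is addressed by a three-part name, catalog.namespace.table, and each catalog is bound to a different table bucket. As a rough sketch, the names can be assembled programmatically; `quote_ident` and `table_ref` are hypothetical helpers, and the backtick quoting is only there as a precaution for names that contain unusual characters.

```python
def quote_ident(part: str) -> str:
    """Quote one identifier part for Spark SQL, doubling any embedded backticks."""
    return "`" + part.replace("`", "``") + "`"


def table_ref(catalog: str, namespace: str, table: str) -> str:
    """Build a fully qualified catalog.namespace.table reference (hypothetical helper)."""
    return ".".join(quote_ident(p) for p in (catalog, namespace, table))


customers = table_ref("catalog1", "demo_poc_bucket1", "customers")
orders = table_ref("catalog2", "demo_poc_bucket2", "orders")

# Same join as above, with the two cross-bucket table names filled in.
join_sql = f"""
SELECT c.customer_id, c.name, o.order_id, o.total_amount
FROM {customers} c
JOIN {orders} o
  ON c.customer_id = o.customer_id
ORDER BY c.customer_id, o.order_date
"""

print(join_sql)
```

The resulting string can be handed to `spark.sql(join_sql)` in the session created earlier.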

Clean Up

In [3]:
%%bash

# Delete in dependency order: tables first, then namespaces, then the table buckets.
# Replace <ACCOUNT_ID> with your AWS account ID before running.

aws s3tables delete-table \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1" \
    --namespace demo_poc_bucket1 \
    --name customers

aws s3tables delete-table \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2" \
    --namespace demo_poc_bucket2 \
    --name orders

aws s3tables delete-namespace \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1" \
    --namespace demo_poc_bucket1

aws s3tables delete-namespace \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2" \
    --namespace demo_poc_bucket2

aws s3tables delete-table-bucket \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1"

aws s3tables delete-table-bucket \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2"
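
The clean-up has to run in dependency order: tables, then namespaces, then the table buckets. The sketch below generates the same AWS CLI commands in that order from a small description of the resources. `cleanup_commands` is a hypothetical helper, the account ID "123456789012" is a placeholder, and the flag names are my reading of the `aws s3tables` CLI, so check them against `aws s3tables help` before running anything.

```python
import shlex


def cleanup_commands(account_id, region, resources):
    """Return aws CLI argument lists: tables first, then namespaces, then buckets.

    resources is a list of (bucket_name, namespace, [table_names]) tuples.
    This is a hypothetical helper for illustration only.
    """
    def arn(bucket):
        return f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket}"

    base = ["aws", "s3tables"]
    tables, namespaces, buckets = [], [], []
    for bucket, namespace, table_names in resources:
        common = ["--region", region, "--table-bucket-arn", arn(bucket)]
        for table in table_names:
            tables.append(base + ["delete-table"] + common
                          + ["--namespace", namespace, "--name", table])
        namespaces.append(base + ["delete-namespace"] + common
                          + ["--namespace", namespace])
        buckets.append(base + ["delete-table-bucket"] + common)
    return tables + namespaces + buckets


commands = cleanup_commands(
    "123456789012",  # placeholder account ID, replace with your own
    "us-east-1",
    [
        ("demo-bucket1", "demo_poc_bucket1", ["customers"]),
        ("demo-bucket2", "demo_poc_bucket2", ["orders"]),
    ],
)

# Print the commands for review instead of executing them.
for cmd in commands:
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run; each list could be passed to `subprocess.run(cmd, check=True)` once the output looks right.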
