Test whether two managed Iceberg tables stored in different S3 table buckets can be joined in a single Spark session, and how that session should be configured.¶

Creating the Table Buckets¶

%%bash
aws s3tables create-table-bucket \
    --region us-east-1 \
    --name demo-bucket1


aws s3tables create-table-bucket \
    --region us-east-1 \
    --name demo-bucket2
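Each `create-table-bucket` call returns the new bucket's ARN. That ARN is used later as the Iceberg catalog's `warehouse` value.

If you only have the account ID and bucket name, you can assemble the ARN from the pattern used throughout this notebook (`arn:aws:s3tables:<region>:<account>:bucket/<name>`). The helper `table_bucket_arn` below is hypothetical. Its name check is only a rough approximation of the table bucket naming rules (assumed here: 3 to 63 lowercase letters, digits or hyphens, starting and ending with a letter or digit), so verify it against the AWS documentation.

```python
import re

# Rough approximation of S3 table bucket naming rules (assumption: 3-63 chars,
# lowercase letters, digits and hyphens, starting and ending with a letter or digit).
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def table_bucket_arn(region: str, account_id: str, bucket_name: str) -> str:
    """Build the table bucket ARN used as an Iceberg catalog warehouse."""
    if not _BUCKET_NAME.match(bucket_name):
        raise ValueError(f"unexpected table bucket name: {bucket_name!r}")
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"AWS account IDs are 12 digits, got {account_id!r}")
    return f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}"


# 123456789012 is a placeholder account ID.
print(table_bucket_arn("us-east-1", "123456789012", "demo-bucket1"))
```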

from pyspark.sql import SparkSession
import os
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"


packages = [
    "com.amazonaws:aws-java-sdk-bundle:1.12.661",
    "org.apache.hadoop:hadoop-aws:3.3.4",
    "software.amazon.awssdk:bundle:2.29.38",
    "com.github.ben-manes.caffeine:caffeine:3.1.8",
    "org.apache.commons:commons-configuration2:2.11.0",
    "software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.3",
    "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.1"
]
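The coordinate `iceberg-spark-runtime-3.4_2.12` encodes the Spark minor version (3.4) and the Scala version (2.12) the runtime was built for. That Spark version generally has to match the local PySpark installation.

The sketch below is a small pre-flight check that can catch a mismatch before the session starts. It assumes PySpark is importable. `iceberg_runtime_spark_version` and `check_runtime_matches` are hypothetical helpers, not part of PySpark or Iceberg.

```python
import re


def iceberg_runtime_spark_version(coordinate: str) -> str:
    """Return the Spark 'major.minor' that an iceberg-spark-runtime coordinate targets."""
    match = re.search(r"iceberg-spark-runtime-(\d+\.\d+)_\d+\.\d+:", coordinate)
    if match is None:
        raise ValueError(f"not an iceberg-spark-runtime coordinate: {coordinate}")
    return match.group(1)


def check_runtime_matches(packages: list, pyspark_version: str) -> None:
    """Raise if the Iceberg runtime in `packages` was built for another Spark line."""
    runtime = [p for p in packages if "iceberg-spark-runtime" in p]
    if len(runtime) != 1:
        raise ValueError("expected exactly one iceberg-spark-runtime package")
    wanted = iceberg_runtime_spark_version(runtime[0])
    # Compare only major.minor, e.g. "3.4.1" -> "3.4".
    installed = ".".join(pyspark_version.split(".")[:2])
    if wanted != installed:
        raise RuntimeError(
            f"Iceberg runtime targets Spark {wanted}, but PySpark is {pyspark_version}"
        )


# Usage, assuming PySpark is installed and `packages` is the list defined above:
# import pyspark
# check_runtime_matches(packages, pyspark.__version__)
```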

Table Bucket 1: Creating the customers Table¶

os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
WAREHOUSE_PATH = "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"

spark = SparkSession.builder \
    .appName("iceberg_lab") \
    .config("spark.jars.packages", ",".join(packages)) \
    .config("spark.sql.catalog.catalog1", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.catalog1.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .config("spark.sql.catalog.catalog1.warehouse", WAREHOUSE_PATH) \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "catalog1") \
    .config("spark.sql.catalog.catalog1.client.region", "us-east-1") \
    .getOrCreate()


spark.sql("SHOW NAMESPACES IN catalog1").show()
spark.sql("CREATE NAMESPACE IF NOT EXISTS catalog1.demo_poc_bucket1")


spark.sql("""
CREATE TABLE IF NOT EXISTS catalog1.demo_poc_bucket1.customers (
  customer_id INT,
  name STRING,
  email STRING
) USING iceberg
""")
spark.sql("""
INSERT INTO catalog1.demo_poc_bucket1.customers VALUES
  (1, 'John Doe', 'john@example.com'),
  (2, 'Jane Smith', 'jane@example.com'),
  (3, 'Bob Johnson', 'bob@example.com'),
  (4, 'Alice Brown', 'alice@example.com'),
  (5, 'Charlie Davis', 'charlie@example.com')
""")
spark.stop()

:: loading settings :: url = jar:file:/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml

Ivy Default Cache set to: /Users/sshah/.ivy2/cache
The jars for the packages stored in: /Users/sshah/.ivy2/jars
com.amazonaws#aws-java-sdk-bundle added as a dependency
org.apache.hadoop#hadoop-aws added as a dependency
software.amazon.awssdk#bundle added as a dependency
com.github.ben-manes.caffeine#caffeine added as a dependency
org.apache.commons#commons-configuration2 added as a dependency
software.amazon.s3tables#s3-tables-catalog-for-iceberg added as a dependency
org.apache.iceberg#iceberg-spark-runtime-3.4_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-37ce9917-d6a4-4f76-ab92-33d7fee7545b;1.0
	confs: [default]
	found com.amazonaws#aws-java-sdk-bundle;1.12.661 in central
	found org.apache.hadoop#hadoop-aws;3.3.4 in central
	found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
	found software.amazon.awssdk#bundle;2.29.38 in central
	found com.github.ben-manes.caffeine#caffeine;3.1.8 in central
	found org.checkerframework#checker-qual;3.37.0 in central
	found com.google.errorprone#error_prone_annotations;2.21.1 in central
	found org.apache.commons#commons-configuration2;2.11.0 in central
	found org.apache.commons#commons-lang3;3.14.0 in central
	found org.apache.commons#commons-text;1.12.0 in central
	found commons-logging#commons-logging;1.3.2 in central
	found software.amazon.s3tables#s3-tables-catalog-for-iceberg;0.1.3 in central
	found org.apache.iceberg#iceberg-api;1.6.1 in central
	found org.slf4j#slf4j-api;1.7.36 in central
	found org.apache.iceberg#iceberg-bundled-guava;1.6.1 in central
	found org.apache.iceberg#iceberg-aws;1.6.1 in central
	found org.apache.iceberg#iceberg-common;1.6.1 in central
	found org.apache.iceberg#iceberg-core;1.6.1 in central
	found org.apache.avro#avro;1.11.3 in central
	found com.fasterxml.jackson.core#jackson-core;2.14.2 in central
	found com.fasterxml.jackson.core#jackson-databind;2.14.2 in central
	found com.fasterxml.jackson.core#jackson-annotations;2.14.2 in central
	found org.apache.commons#commons-compress;1.22 in central
	found io.airlift#aircompressor;0.27 in central
	found org.apache.httpcomponents.client5#httpclient5;5.3.1 in central
	found org.apache.httpcomponents.core5#httpcore5;5.2.4 in central
	found org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 in central
	found org.roaringbitmap#RoaringBitmap;1.2.0 in central
	found software.amazon.awssdk#apache-client;2.29.26 in central
	found software.amazon.awssdk#http-client-spi;2.29.26 in central
	found software.amazon.awssdk#annotations;2.29.26 in central
	found software.amazon.awssdk#utils;2.29.26 in central
	found org.reactivestreams#reactive-streams;1.0.4 in central
	found software.amazon.awssdk#metrics-spi;2.29.26 in central
	found org.apache.httpcomponents#httpclient;4.5.13 in central
	found org.apache.httpcomponents#httpcore;4.4.16 in central
	found commons-codec#commons-codec;1.17.1 in central
	found software.amazon.awssdk#aws-core;2.29.26 in central
	found software.amazon.awssdk#regions;2.29.26 in central
	found software.amazon.awssdk#sdk-core;2.29.26 in central
	found software.amazon.awssdk#endpoints-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-spi;2.29.26 in central
	found software.amazon.awssdk#identity-spi;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws;2.29.26 in central
	found software.amazon.awssdk#checksums-spi;2.29.26 in central
	found software.amazon.awssdk#checksums;2.29.26 in central
	found software.amazon.awssdk#profiles;2.29.26 in central
	found software.amazon.awssdk#retries-spi;2.29.26 in central
	found software.amazon.awssdk#retries;2.29.26 in central
	found software.amazon.awssdk#json-utils;2.29.26 in central
	found software.amazon.awssdk#third-party-jackson-core;2.29.26 in central
	found software.amazon.awssdk#auth;2.29.26 in central
	found software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 in central
	found software.amazon.eventstream#eventstream;1.0.1 in central
	found software.amazon.awssdk#http-auth;2.29.26 in central
	found software.amazon.awssdk#dynamodb;2.29.26 in central
	found software.amazon.awssdk#aws-json-protocol;2.29.26 in central
	found software.amazon.awssdk#protocol-core;2.29.26 in central
	found software.amazon.awssdk#netty-nio-client;2.29.26 in central
	found io.netty#netty-codec-http;4.1.115.Final in central
	found io.netty#netty-common;4.1.115.Final in central
	found io.netty#netty-buffer;4.1.115.Final in central
	found io.netty#netty-transport;4.1.115.Final in central
	found io.netty#netty-resolver;4.1.115.Final in central
	found io.netty#netty-codec;4.1.115.Final in central
	found io.netty#netty-handler;4.1.115.Final in central
	found io.netty#netty-transport-native-unix-common;4.1.115.Final in central
	found io.netty#netty-codec-http2;4.1.115.Final in central
	found io.netty#netty-transport-classes-epoll;4.1.115.Final in central
	found software.amazon.awssdk#glue;2.29.26 in central
	found software.amazon.awssdk#kms;2.29.26 in central
	found software.amazon.awssdk#s3;2.29.26 in central
	found software.amazon.awssdk#aws-xml-protocol;2.29.26 in central
	found software.amazon.awssdk#aws-query-protocol;2.29.26 in central
	found software.amazon.awssdk#arns;2.29.26 in central
	found software.amazon.awssdk#crt-core;2.29.26 in central
	found software.amazon.awssdk#sts;2.29.26 in central
	found software.amazon.awssdk#url-connection-client;2.29.26 in central
	found software.amazon.awssdk#s3tables;2.29.26 in central
	found org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 in central
:: resolution report :: resolve 622ms :: artifacts dl 16ms
	:: modules in use:
	com.amazonaws#aws-java-sdk-bundle;1.12.661 from central in [default]
	com.fasterxml.jackson.core#jackson-annotations;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-core;2.14.2 from central in [default]
	com.fasterxml.jackson.core#jackson-databind;2.14.2 from central in [default]
	com.github.ben-manes.caffeine#caffeine;3.1.8 from central in [default]
	com.google.errorprone#error_prone_annotations;2.21.1 from central in [default]
	commons-codec#commons-codec;1.17.1 from central in [default]
	commons-logging#commons-logging;1.3.2 from central in [default]
	io.airlift#aircompressor;0.27 from central in [default]
	io.netty#netty-buffer;4.1.115.Final from central in [default]
	io.netty#netty-codec;4.1.115.Final from central in [default]
	io.netty#netty-codec-http;4.1.115.Final from central in [default]
	io.netty#netty-codec-http2;4.1.115.Final from central in [default]
	io.netty#netty-common;4.1.115.Final from central in [default]
	io.netty#netty-handler;4.1.115.Final from central in [default]
	io.netty#netty-resolver;4.1.115.Final from central in [default]
	io.netty#netty-transport;4.1.115.Final from central in [default]
	io.netty#netty-transport-classes-epoll;4.1.115.Final from central in [default]
	io.netty#netty-transport-native-unix-common;4.1.115.Final from central in [default]
	org.apache.avro#avro;1.11.3 from central in [default]
	org.apache.commons#commons-compress;1.22 from central in [default]
	org.apache.commons#commons-configuration2;2.11.0 from central in [default]
	org.apache.commons#commons-lang3;3.14.0 from central in [default]
	org.apache.commons#commons-text;1.12.0 from central in [default]
	org.apache.hadoop#hadoop-aws;3.3.4 from central in [default]
	org.apache.httpcomponents#httpclient;4.5.13 from central in [default]
	org.apache.httpcomponents#httpcore;4.4.16 from central in [default]
	org.apache.httpcomponents.client5#httpclient5;5.3.1 from central in [default]
	org.apache.httpcomponents.core5#httpcore5;5.2.4 from central in [default]
	org.apache.httpcomponents.core5#httpcore5-h2;5.2.4 from central in [default]
	org.apache.iceberg#iceberg-api;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-aws;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-bundled-guava;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-common;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-core;1.6.1 from central in [default]
	org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.6.1 from central in [default]
	org.checkerframework#checker-qual;3.37.0 from central in [default]
	org.reactivestreams#reactive-streams;1.0.4 from central in [default]
	org.roaringbitmap#RoaringBitmap;1.2.0 from central in [default]
	org.slf4j#slf4j-api;1.7.36 from central in [default]
	org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in [default]
	software.amazon.awssdk#annotations;2.29.26 from central in [default]
	software.amazon.awssdk#apache-client;2.29.26 from central in [default]
	software.amazon.awssdk#arns;2.29.26 from central in [default]
	software.amazon.awssdk#auth;2.29.26 from central in [default]
	software.amazon.awssdk#aws-core;2.29.26 from central in [default]
	software.amazon.awssdk#aws-json-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-query-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#aws-xml-protocol;2.29.26 from central in [default]
	software.amazon.awssdk#bundle;2.29.38 from central in [default]
	software.amazon.awssdk#checksums;2.29.26 from central in [default]
	software.amazon.awssdk#checksums-spi;2.29.26 from central in [default]
	software.amazon.awssdk#crt-core;2.29.26 from central in [default]
	software.amazon.awssdk#dynamodb;2.29.26 from central in [default]
	software.amazon.awssdk#endpoints-spi;2.29.26 from central in [default]
	software.amazon.awssdk#glue;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-aws-eventstream;2.29.26 from central in [default]
	software.amazon.awssdk#http-auth-spi;2.29.26 from central in [default]
	software.amazon.awssdk#http-client-spi;2.29.26 from central in [default]
	software.amazon.awssdk#identity-spi;2.29.26 from central in [default]
	software.amazon.awssdk#json-utils;2.29.26 from central in [default]
	software.amazon.awssdk#kms;2.29.26 from central in [default]
	software.amazon.awssdk#metrics-spi;2.29.26 from central in [default]
	software.amazon.awssdk#netty-nio-client;2.29.26 from central in [default]
	software.amazon.awssdk#profiles;2.29.26 from central in [default]
	software.amazon.awssdk#protocol-core;2.29.26 from central in [default]
	software.amazon.awssdk#regions;2.29.26 from central in [default]
	software.amazon.awssdk#retries;2.29.26 from central in [default]
	software.amazon.awssdk#retries-spi;2.29.26 from central in [default]
	software.amazon.awssdk#s3;2.29.26 from central in [default]
	software.amazon.awssdk#s3tables;2.29.26 from central in [default]
	software.amazon.awssdk#sdk-core;2.29.26 from central in [default]
	software.amazon.awssdk#sts;2.29.26 from central in [default]
	software.amazon.awssdk#third-party-jackson-core;2.29.26 from central in [default]
	software.amazon.awssdk#url-connection-client;2.29.26 from central in [default]
	software.amazon.awssdk#utils;2.29.26 from central in [default]
	software.amazon.eventstream#eventstream;1.0.1 from central in [default]
	software.amazon.s3tables#s3-tables-catalog-for-iceberg;0.1.3 from central in [default]
	:: evicted modules:
	com.amazonaws#aws-java-sdk-bundle;1.12.262 by [com.amazonaws#aws-java-sdk-bundle;1.12.661] in [default]
	com.github.ben-manes.caffeine#caffeine;2.9.3 by [com.github.ben-manes.caffeine#caffeine;3.1.8] in [default]
	commons-logging#commons-logging;1.2 by [commons-logging#commons-logging;1.3.2] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   83  |   0   |   0   |   3   ||   80  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-37ce9917-d6a4-4f76-ab92-33d7fee7545b
	confs: [default]
	0 artifacts copied, 80 already retrieved (0kB/9ms)
25/01/07 19:50:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

+---------+
|namespace|
+---------+
+---------+
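The session above registers `catalog1` with four catalog-scoped properties, plus the session-wide Iceberg extensions. Generating those properties from one function keeps the key names consistent between buckets.

Note that the default catalog is selected with `spark.sql.defaultCatalog`. Any `spark.sql.catalog.<name>` key instead registers a catalog called `<name>`.

`s3tables_catalog_conf` is a hypothetical helper. The property names it emits are the ones this notebook already uses.

```python
def s3tables_catalog_conf(name: str, warehouse_arn: str, region: str = "us-east-1") -> dict:
    """Spark properties that register one S3 table bucket as an Iceberg catalog."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        f"{prefix}.warehouse": warehouse_arn,
        f"{prefix}.client.region": region,
    }


conf = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    # Catalog used for unqualified table names.
    "spark.sql.defaultCatalog": "catalog1",
}
conf.update(
    s3tables_catalog_conf("catalog1", "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1")
)

# With PySpark, the dict can be applied to a builder, e.g.:
# builder = SparkSession.builder
# for key, value in conf.items():
#     builder = builder.config(key, value)
```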

Table Bucket 2: Creating the orders Table¶

os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
WAREHOUSE_PATH = "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket2"

spark = SparkSession.builder \
    .appName("iceberg_lab") \
    .config("spark.jars.packages", ",".join(packages)) \
    .config("spark.sql.catalog.catalog2", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.catalog2.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .config("spark.sql.catalog.catalog2.warehouse", WAREHOUSE_PATH) \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "catalog2") \
    .config("spark.sql.catalog.catalog2.client.region", "us-east-1") \
    .getOrCreate()


spark.sql("SHOW NAMESPACES IN catalog2").show()
spark.sql("CREATE NAMESPACE IF NOT EXISTS catalog2.demo_poc_bucket2")

spark.sql("""
CREATE TABLE IF NOT EXISTS catalog2.demo_poc_bucket2.orders (
  order_id INT,
  customer_id INT,
  order_date DATE,
  total_amount DECIMAL(10, 2)
) USING iceberg
""")

spark.sql("""
INSERT INTO catalog2.demo_poc_bucket2.orders VALUES
  (101, 1, DATE '2023-01-15', 150.50),
  (102, 2, DATE '2023-01-16', 200.75),
  (103, 3, DATE '2023-01-17', 75.25),
  (104, 1, DATE '2023-01-18', 300.00),
  (105, 4, DATE '2023-01-19', 125.00),
  (106, 2, DATE '2023-01-20', 180.50),
  (107, 5, DATE '2023-01-21', 95.75)
""")

+---------+
|namespace|
+---------+
+---------+

DataFrame[]
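With `customers` in `catalog1` and `orders` in `catalog2`, the result of a cross-catalog join can be predicted from the inserted rows. The plain-Python sketch below computes the order count and order total per customer, so the Spark output can be checked against it.

The rows are copied from the INSERT statements above. The aggregation itself is just one illustrative query, not something the notebook prescribes.

```python
from collections import defaultdict
from decimal import Decimal

# Rows copied from the INSERT statements for catalog1 and catalog2 above.
customers = {
    1: "John Doe",
    2: "Jane Smith",
    3: "Bob Johnson",
    4: "Alice Brown",
    5: "Charlie Davis",
}
orders = [
    (101, 1, "2023-01-15", Decimal("150.50")),
    (102, 2, "2023-01-16", Decimal("200.75")),
    (103, 3, "2023-01-17", Decimal("75.25")),
    (104, 1, "2023-01-18", Decimal("300.00")),
    (105, 4, "2023-01-19", Decimal("125.00")),
    (106, 2, "2023-01-20", Decimal("180.50")),
    (107, 5, "2023-01-21", Decimal("95.75")),
]

# Inner join on customer_id, then count and sum the orders per customer.
order_count = defaultdict(int)
order_total = defaultdict(Decimal)  # Decimal() starts at Decimal("0")
for _order_id, customer_id, _order_date, amount in orders:
    if customer_id in customers:
        order_count[customer_id] += 1
        order_total[customer_id] += amount

expected = [
    (cid, customers[cid], order_count[cid], order_total[cid])
    for cid in sorted(order_total)
]
for row in expected:
    print(row)
```

Every customer has at least one order. An inner join and a left join from `customers` should therefore return the same five rows.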

Join Test Across Both Catalogs¶

from pyspark.sql import SparkSession
import os

def create_spark_session(catalogs, region="us-east-1"):
    """Build one SparkSession that registers every table bucket in `catalogs`
    as its own Iceberg catalog, so tables from different buckets can be joined."""
    os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"

    packages = [
        "com.amazonaws:aws-java-sdk-bundle:1.12.661",
        "org.apache.hadoop:hadoop-aws:3.3.4",
        "software.amazon.awssdk:bundle:2.29.38",
        "com.github.ben-manes.caffeine:caffeine:3.1.8",
        "org.apache.commons:commons-configuration2:2.11.0",
        "software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.3",
        "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.1"
    ]

    # Session-wide settings: dependencies and the Iceberg SQL extensions.
    builder = SparkSession.builder \
        .appName("iceberg_lab") \
        .config("spark.jars.packages", ",".join(packages)) \
        .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")

    # One Iceberg catalog per table bucket; the bucket ARN is the catalog's warehouse.
    for catalog in catalogs:
        catalog_name = catalog['catalog_name']
        builder = builder \
            .config(f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkCatalog") \
            .config(f"spark.sql.catalog.{catalog_name}.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
            .config(f"spark.sql.catalog.{catalog_name}.warehouse", catalog['arn']) \
            .config(f"spark.sql.catalog.{catalog_name}.client.region", region)

    # Keep spark_catalog as the default catalog. Tables in the S3 Tables catalogs
    # must therefore be referenced by fully qualified names (catalog.namespace.table).
    builder = builder \
        .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog") \
        .config("spark.sql.catalog.spark_catalog.type", "hive") \
        .config("spark.sql.defaultCatalog", "spark_catalog")

    # getOrCreate() returns an already active session (static settings unchanged)
    # instead of building a new one, so stop any session from an earlier cell first.
    spark = builder.getOrCreate()

    # Sanity check: list the namespaces visible through each catalog.
    for catalog in catalogs:
        spark.sql(f"SHOW NAMESPACES IN {catalog['catalog_name']}").show()

    return spark

# Example usage:
catalogs = [
    {
        "catalog_name": "catalog1",
        "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"
    },
    {
        "catalog_name": "catalog2",
        "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket2"
    }
]

spark = create_spark_session(catalogs)
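Once `create_spark_session` has registered both catalogs, the join only needs a fully qualified name (`catalog.namespace.table`) on each side. Spark resolves each identifier through its own catalog, and confirming that this works for two S3 table buckets is the point of this test.

The hypothetical helper below only assembles the query string, so the table names are spelled out in one place. The column names come from the CREATE TABLE statements above.

```python
def cross_catalog_join_sql(customers_table: str, orders_table: str) -> str:
    """Per-customer order totals, joining tables that live in two different catalogs."""
    for name in (customers_table, orders_table):
        if name.count(".") != 2:
            raise ValueError(f"expected catalog.namespace.table, got {name!r}")
    return f"""
SELECT c.customer_id,
       c.name,
       COUNT(o.order_id)   AS order_count,
       SUM(o.total_amount) AS order_total
FROM {customers_table} AS c
JOIN {orders_table} AS o
  ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY c.customer_id
"""


query = cross_catalog_join_sql(
    "catalog1.demo_poc_bucket1.customers",
    "catalog2.demo_poc_bucket2.orders",
)
# With the session returned by create_spark_session(catalogs):
# spark.sql(query).show()
```

If the join works, the result should reflect the inserted rows. For example, customer 1 (John Doe) should show two orders, 150.50 and 300.00, summing to 450.50.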

+----------------+
|       namespace|
+----------------+
|demo_poc_bucket1|
+----------------+

+----------------+
|       namespace|
+----------------+
|demo_poc_bucket2|
+----------------+
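The loop in `create_spark_session` is what makes the cross-bucket join possible: each table bucket becomes its own Iceberg catalog, with the bucket ARN as its warehouse. As a minimal sketch (the helper name `s3tables_catalog_conf` is hypothetical, and the region is assumed to be `us-east-1` as in this demo), the same keys can be built as a plain dict and inspected before Spark is started:

```python
from typing import Dict, List


# Hypothetical helper: builds the same Spark config keys that create_spark_session()
# sets, so they can be printed or checked without starting a Spark session.
def s3tables_catalog_conf(catalogs: List[Dict[str, str]], region: str = "us-east-1") -> Dict[str, str]:
    conf = {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    }
    for catalog in catalogs:
        prefix = f"spark.sql.catalog.{catalog['catalog_name']}"
        conf[prefix] = "org.apache.iceberg.spark.SparkCatalog"
        conf[f"{prefix}.catalog-impl"] = "software.amazon.s3tables.iceberg.S3TablesCatalog"
        # Each catalog points at exactly one table bucket through its ARN.
        conf[f"{prefix}.warehouse"] = catalog["arn"]
        conf[f"{prefix}.client.region"] = region
    return conf


catalogs = [
    {"catalog_name": "catalog1", "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket1"},
    {"catalog_name": "catalog2", "arn": "arn:aws:s3tables:us-east-1:<ACCOUNT>:bucket/demo-bucket2"},
]

for key, value in sorted(s3tables_catalog_conf(catalogs).items()):
    print(f"{key}={value}")
```

Each pair maps one-to-one onto a `.config(key, value)` call on `SparkSession.builder`, so the dict can be applied in a loop instead of chaining the calls by hand.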

spark

spark.sql("SHOW NAMESPACES IN catalog1").show()
spark.sql("SHOW NAMESPACES IN catalog2").show()

+----------------+
|       namespace|
+----------------+
|demo_poc_bucket1|
+----------------+

+----------------+
|       namespace|
+----------------+
|demo_poc_bucket2|
+----------------+

result = spark.sql("""
SELECT 
    c.customer_id,
    c.name,
    c.email,
    o.order_id,
    o.order_date,
    o.total_amount
FROM catalog1.demo_poc_bucket1.customers c
JOIN catalog2.demo_poc_bucket2.orders o
ON c.customer_id = o.customer_id
ORDER BY c.customer_id, o.order_date
""")

result.show()

+-----------+-------------+-------------------+--------+----------+------------+
|customer_id|         name|              email|order_id|order_date|total_amount|
+-----------+-------------+-------------------+--------+----------+------------+
|          1|     John Doe|   john@example.com|     101|2023-01-15|      150.50|
|          1|     John Doe|   john@example.com|     104|2023-01-18|      300.00|
|          2|   Jane Smith|   jane@example.com|     102|2023-01-16|      200.75|
|          2|   Jane Smith|   jane@example.com|     106|2023-01-20|      180.50|
|          3|  Bob Johnson|    bob@example.com|     103|2023-01-17|       75.25|
|          4|  Alice Brown|  alice@example.com|     105|2023-01-19|      125.00|
|          5|Charlie Davis|charlie@example.com|     107|2023-01-21|       95.75|
+-----------+-------------+-------------------+--------+----------+------------+

Clean Up

%%bash

aws s3tables delete-table \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1" \
    --namespace <NAMESPACE_1> \
    --name customers

aws s3tables delete-table \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2" \
    --namespace <NAMESPACE_2> \
    --name orders

aws s3tables delete-namespace \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1" \
    --namespace <NAMESPACE_1>

aws s3tables delete-namespace \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2" \
    --namespace <NAMESPACE_2>

aws s3tables delete-table-bucket \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1"

aws s3tables delete-table-bucket \
    --region us-east-1 \
    --table-bucket-arn "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2"
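The same teardown can also be scripted from the notebook. This is a minimal sketch, assuming boto3 with S3 Tables support (`boto3.client("s3tables")` exposing `delete_table`, `delete_namespace` and `delete_table_bucket`). `cleanup_table_buckets` and `RecordingClient` are hypothetical names; the recording client only logs calls, so the order (tables, then namespaces, then buckets, mirroring the CLI commands above on the assumption that a bucket must be emptied before it can be deleted) can be checked without touching AWS:

```python
from typing import Dict, List


class RecordingClient:
    """Hypothetical stand-in for boto3.client("s3tables") that only records calls (dry run)."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def delete_table(self, **kwargs) -> None:
        self.calls.append(("delete_table", kwargs))

    def delete_namespace(self, **kwargs) -> None:
        self.calls.append(("delete_namespace", kwargs))

    def delete_table_bucket(self, **kwargs) -> None:
        self.calls.append(("delete_table_bucket", kwargs))


def cleanup_table_buckets(client, layout: Dict[str, Dict[str, List[str]]]) -> None:
    """Delete every table, then every namespace, then each table bucket.

    layout maps a table-bucket ARN to {namespace: [table names]}.
    """
    for bucket_arn, namespaces in layout.items():
        for namespace, tables in namespaces.items():
            for table in tables:
                client.delete_table(tableBucketARN=bucket_arn, namespace=namespace, name=table)
            client.delete_namespace(tableBucketARN=bucket_arn, namespace=namespace)
        client.delete_table_bucket(tableBucketARN=bucket_arn)


layout = {
    "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket1": {"demo_poc_bucket1": ["customers"]},
    "arn:aws:s3tables:us-east-1:<ACCOUNT_ID>:bucket/demo-bucket2": {"demo_poc_bucket2": ["orders"]},
}

# Dry run: print the calls in the order they would be sent.
dry_run = RecordingClient()
cleanup_table_buckets(dry_run, layout)
for name, kwargs in dry_run.calls:
    print(name, kwargs)

# Against AWS (assumes valid credentials and <ACCOUNT_ID> filled in):
# import boto3
# cleanup_table_buckets(boto3.client("s3tables", region_name="us-east-1"), layout)
```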

Pythonist

Tuesday, January 7, 2025
