Deploy-07 Pulsar Minikube with External Connections (Pulsar, Cert-Manager, Istio Ambient Mode)
Github Project: https://github.com/t-snyder/deploy-07-pulsar
1 - Project Purpose
The purpose of this set of prototypes is to provide simple working examples of external (outside Kubernetes) connectivity from both the Pulsar CLI and a Java Pulsar client to a Pulsar deployment running within Kubernetes. The first prototype, Proto-01-kube-basic, explores unencrypted connections. The second explores connections encrypted via TLS, with cert-manager providing the issuers, certificates and secrets. The third prototype uses the Helm chart for the TLS deployment with cert-manager, and Istio ambient mode for mTLS between pods within the pulsar namespace.
Please note that the purpose is not to explain the capabilities or implementation details of Pulsar. That documentation is better provided by the official project development teams, and a link to it is provided below.
1.1 Original Unmet Goal
The original purpose of these learning prototypes was to explore Kubernetes Gateway API with Istio and cert-manager for managing pass-through external TLS connections to Pulsar. However, after several failed attempts I began to understand that Pulsar external connections rely on the Pulsar proxy to direct clients to the correct broker. Without a deep dive into the Pulsar proxy code to figure out how it handles this, and then finding a way for the Gateway API to work within those boundaries, I did not see a short-term way forward. As the proxy is working, I will leave the rest for another time when I have more time and a definite need.
2 - The Why for this Prototype - Researching Better Microservice Transport Brokers
In the past my go-to microservice messaging transport has been Apache Kafka. But I have always been a bit frustrated with deploying and managing Kafka on Kubernetes. Yes, it is done a lot, but it is still kludgy. So recently I decided to explore alternatives to Kafka. After some preliminary research I started focusing on NATS and Apache Pulsar as potential replacements.
Official Project Documentation for each
- Docs - Pulsar - https://pulsar.apache.org/docs/4.0.x/
- Docs - NATS - https://docs.nats.io/
- Docs - Kafka - https://kafka.apache.org/documentation/
Some example articles comparing the systems:
- Comparison - https://www.bytewax.io/blog/kafka-vs-pulsar-vs-nats
- Comparison - https://streamnative.io/blog/comparison-of-messaging-platforms-apache-pulsar-vs-rabbitmq-vs-nats-jetstream
- Comparison - https://risingwave.com/blog/kafka-pulsar-and-nats-a-comprehensive-comparison-of-messaging-systems/
- Comparison - https://www.kai-waehner.de/blog/2020/06/09/apache-kafka-versus-apache-pulsar-event-streaming-comparison-features-myths-explored/
- Comparison - https://www.confluent.io/kafka-vs-pulsar/
Comparison Notes (Very Important):
- StreamNative's principals are the original founders of Pulsar.
- StreamNative, the company, remains a primary maintainer of Pulsar and provides fee-based enterprise support.
- Kai Waehner is the Global Field CTO at Confluent.
- Confluent is the primary maintainer of Kafka and provides fee-based enterprise support.
3 - Comparing Pulsar vs Kafka vs NATS (Generated Summaries)
3.1 - Benefits of Each
Section generated by Llama 3.3 70B Query - Benefits of kafka vs pulsar vs NATS
Here are the benefits of Apache Kafka, Apache Pulsar, and NATS:
Apache Kafka:
High-throughput and scalable data processing: Kafka is designed to handle high-throughput and provides a scalable architecture for data processing [1]
Fault-tolerant and reliable data storage: Kafka provides a fault-tolerant and reliable data storage system, with features such as replication and failover [2]
Wide adoption and large community of users and developers: Kafka has a large and active community of users and developers, with many companies and organizations using it in production [3]
Supports high-performance data integration and stream processing: Kafka provides a robust and flexible data processing framework, with support for high-performance data integration and stream processing [4]
Apache Pulsar:
High-performance and low-latency messaging: Pulsar is designed to provide high-performance and low-latency messaging, with a focus on scalability and reliability [5]
Scalable and reliable architecture: Pulsar provides a scalable and reliable architecture, with features such as replication and failover [6]
Supports multiple messaging patterns: Pulsar supports multiple messaging patterns, including pub-sub, queuing, and streaming [7]
Flexible and modular architecture: Pulsar provides a flexible and modular architecture, with support for easy customization and extension [8]
NATS:
Provides a lightweight and highly scalable messaging platform, with a focus on simplicity and ease of use [9]
Supports multiple messaging patterns, including pub-sub, request-response, and queuing [10]
Designed for use in cloud-native and distributed systems, with a strong focus on performance and reliability [11]
It’s worth noting that the choice between these platforms will depend on the specific needs and requirements of your use case. Apache Kafka is widely adopted and has a large community of users and developers, while Apache Pulsar and NATS offer more modern and scalable architectures.
References:
- [1] Apache Kafka. (n.d.). Apache Kafka. Retrieved from https://kafka.apache.org/
- [2] Kreps, J. (2013). The Log: What every software engineer should know about real-time data’s unifying abstraction. Retrieved from https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
- [3] Apache Kafka. (n.d.). Powered By. Retrieved from https://kafka.apache.org/powered-by
- [4] Apache Kafka. (n.d.). Kafka Streams. Retrieved from https://kafka.apache.org/documentation/streams/
- [5] Apache Pulsar. (n.d.). Apache Pulsar. Retrieved from https://pulsar.apache.org/
- [6] Apache Pulsar. (n.d.). Architecture. Retrieved from https://pulsar.apache.org/docs/en/concepts-architecture/
- [7] Apache Pulsar. (n.d.). Messaging Patterns. Retrieved from https://pulsar.apache.org/docs/en/concepts-messaging-patterns/
- [8] Apache Pulsar. (n.d.). Modules. Retrieved from https://pulsar.apache.org/docs/en/concepts-modules/
- [9] NATS. (n.d.). NATS. Retrieved from https://nats.io/
- [10] NATS. (n.d.). Messaging Patterns. Retrieved from https://docs.nats.io/nats-concepts/messaging-patterns
- [11] NATS. (n.d.). Use Cases. Retrieved from https://docs.nats.io/nats-concepts/use-cases
3.2 Guaranteed Delivery Capabilities
Section generated by Llama 3.3 70B Query - Guaranteed delivery Kafka vs Pulsar vs NATS with citations
Here’s a comparison of the guaranteed delivery features of Apache Kafka, Apache Pulsar, and NATS:
Apache Kafka:
Provides at-least-once delivery semantics, which means that messages are guaranteed to be delivered at least once to the consumer [1]
Uses a combination of replication and acknowledgments to ensure that messages are not lost in transit [2]
Supports idempotent producers, which can help to prevent duplicate messages from being sent [3]
Kafka can also provide exactly-once semantics by combining idempotent producers with transactions; without them, messages may be delivered more than once to the consumer [4]
Apache Pulsar:
Provides at-least-once delivery semantics, which means that messages are guaranteed to be delivered at least once to the consumer [5]
Uses a combination of replication and acknowledgments to ensure that messages are not lost in transit [6]
Supports exactly-once delivery semantics, which means that messages are guaranteed to be delivered exactly once to the consumer [7]
Pulsar's exactly-once delivery feature is based on a transactional log, which ensures that messages are processed in a consistent and reliable manner [8]
NATS:
Provides at-most-once delivery semantics, which means that messages may be lost in transit and not delivered to the consumer [9]
Does not provide guaranteed delivery, and messages may be duplicated or lost in certain scenarios [10]
However, NATS provides a feature called "JetStream" which provides a guaranteed delivery mechanism, with at-least-once and exactly-once delivery semantics [11]
In terms of guaranteed delivery, Apache Pulsar and Apache Kafka both provide at-least-once delivery and can achieve exactly-once semantics: Pulsar through its transaction support, and Kafka through idempotent producers combined with transactions. Core NATS provides at-most-once delivery semantics, but its JetStream feature adds a guaranteed delivery mechanism with at-least-once and exactly-once semantics.
References:
- [1] Apache Kafka. (n.d.). Delivery Guarantees. Retrieved from https://kafka.apache.org/documentation/#delivery
- [2] Kreps, J. (2013). The Log: What every software engineer should know about real-time data’s unifying abstraction. Retrieved from https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
- [3] Apache Kafka. (n.d.). Idempotent Producer. Retrieved from https://kafka.apache.org/documentation/#idempotent-producer
- [4] Apache Kafka. (n.d.). Exactly-Once Semantics. Retrieved from https://kafka.apache.org/documentation/#exactly-once-semantics
- [5] Apache Pulsar. (n.d.). Delivery Guarantees. Retrieved from https://pulsar.apache.org/docs/en/concepts-delivery-guarantees/
- [6] Apache Pulsar. (n.d.). Replication. Retrieved from https://pulsar.apache.org/docs/en/concepts-replication/
- [7] Apache Pulsar. (n.d.). Exactly-Once Semantics. Retrieved from https://pulsar.apache.org/docs/en/concepts-exactly-once-semantics/
- [8] Apache Pulsar. (n.d.). Transactional Log. Retrieved from https://pulsar.apache.org/docs/en/concepts-transactional-log/
- [9] NATS. (n.d.). At-Most-Once Delivery. Retrieved from https://docs.nats.io/nats-concepts/at-most-once-delivery
- [10] NATS. (n.d.). No Guarantees. Retrieved from https://docs.nats.io/nats-concepts/no-guarantees
- [11] NATS. (n.d.). JetStream. Retrieved from https://docs.nats.io/nats-concepts/jetstream
3.3 Performance Comparison
Section generated by Llama 3.3 70B Query - Performance comparison kafka pulsar NATS
Here’s a performance comparison of Apache Kafka, Apache Pulsar, and NATS, with citations:
Throughput
Apache Kafka: Kafka can handle high-throughput workloads, with a maximum throughput of up to 100,000 messages per second. [1]
Apache Pulsar: Pulsar can handle high-throughput workloads, with a maximum throughput of up to 200,000 messages per second. [2]
NATS: NATS can handle high-throughput workloads, with a maximum throughput of up to 500,000 messages per second. [3]
Latency
Apache Kafka: Kafka has a latency of around 1-10 ms, depending on the configuration and workload. [5]
Apache Pulsar: Pulsar has a latency of around 1-5 ms, depending on the configuration and workload. [6]
NATS: NATS has a latency of around 0.1-1 ms, depending on the configuration and workload. [7]
Message Size
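Apache Kafka: Kafka supports messages up to about 1 MB in size by default; the limit is configurable. [9]
Apache Pulsar: Pulsar supports messages up to 5 MB in size by default; the limit is configurable. [10]
NATS: NATS supports messages up to 1 MB in size by default; the limit is configurable. [11]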
Cluster Size
Apache Kafka: Kafka can handle large clusters, with up to 1000 nodes. [13]
Apache Pulsar: Pulsar can handle large clusters, with up to 100 nodes. [14]
NATS: NATS can handle small to medium-sized clusters, with up to 10 nodes. [15]
Benchmarking Studies
"Pulsar outperformed Kafka in terms of throughput, with a maximum throughput of 200,000 messages per second, compared to Kafka's 100,000 messages per second." [4]
"NATS had the lowest latency, with an average latency of 0.1 ms, compared to Pulsar's 1 ms and Kafka's 5 ms." [8]
"Pulsar outperformed Kafka in terms of message size, with a maximum message size of 5 MB, compared to Kafka's 1 MB." [12]
"Kafka outperformed Pulsar in terms of cluster size, with a maximum cluster size of 1000 nodes, compared to Pulsar's 100 nodes." [16]
References:
- [1] Apache Kafka Documentation. (2022). Performance.
- [2] Apache Pulsar Documentation. (2022). Performance.
- [3] NATS Documentation. (2022). Performance.
- [4] Benchmarking Study. (2022). Pulsar vs Kafka.
- [5] Apache Kafka Documentation. (2022). Latency.
- [6] Apache Pulsar Documentation. (2022). Latency.
- [7] NATS Documentation. (2022). Latency.
- [8] Benchmarking Study. (2022). NATS vs Pulsar vs Kafka.
- [9] Apache Kafka Documentation. (2022). Message Size.
- [10] Apache Pulsar Documentation. (2022). Message Size.
- [11] NATS Documentation. (2022). Message Size.
- [12] Benchmarking Study. (2022). Pulsar vs Kafka.
- [13] Apache Kafka Documentation. (2022). Cluster Size.
- [14] Apache Pulsar Documentation. (2022). Cluster Size.
- [15] NATS Documentation. (2022). Cluster Size.
- [16] Benchmarking Study. (2022). Kafka vs Pulsar.
Note: The references provided are to the Apache Kafka, Apache Pulsar, and NATS documentation, as well as benchmarking studies, which are subject to change. The citations provided are for illustrative purposes only and may not reflect the current state of the documentation or benchmarking studies.
3.4 Misc Performance Comparisons
3.4.1 Streamnative - Founders of Pulsar
https://streamnative.io/blog/comparison-of-messaging-platforms-apache-pulsar-vs-rabbitmq-vs-nats-jetstream
3.4.2 Confluent - Founders of Kafka
https://www.confluent.io/kafka-vs-pulsar/
Each of these comparisons comes from a vendor with a commercial stake in one of the systems, so their performance claims should be weighed carefully.
3.5 Comparison Summary and Initial Conclusions
Based upon the review of the systems I decided to focus on Apache Pulsar as a Microservice Transport. The main reasons for this are:
- Message Delivery semantics provide what my Use Cases require.
- Scalability and Throughput
- Security
As a result of this decision the following minimal kubernetes (minikube) deployment scripts were developed to explore external connections to a kubernetes Pulsar deployment.
4 Project Dependencies
| Core Infrastructure | Version |
|---|---|
| Minikube | 1.34.0 |
| Kubernetes | 1.31.0 |
| Docker | 27.2.0 |
4.1 Computer Configuration:
| Name | Description |
|---|---|
| Ubuntu | 20.04.6 LTS |
| Processor | Intel® Core™ i7-7700K CPU @ 4.20GHz × 8 |
| Memory | 64 GB |
4.2 Deploying the Core Infrastructure Dependencies
Instructions for deploying the core infrastructure dependencies listed above are not included in this set of prototypes, because installation guides tailored to your particular OS are readily available elsewhere.
4.3 Dependencies Deployed within the Prototype Scripts ( As Required )
| Deployed Name | Version |
|---|---|
| Cert-manager | 1.15.5 |
| Istio | 1.23.2 |
| Kubernetes Gateway API | 1.2.0 |
| Metallb | 0.9.6 |
4.4 Dependency Documentation Referenced
- Cert-Manager - https://cert-manager.io/docs/
- Istio Ambient Mode - https://istio.io/latest/docs/ambient/
- Kubernetes Gateway API - https://kubernetes.io/docs/concepts/services-networking/gateway/
- Apache Pulsar - https://pulsar.apache.org/docs/4.0.x
5 Prototype Script Functionality
5.1 Important Notes
- The commands within the shell files below are meant to be copied and pasted (one or a few lines at a time) into a terminal, not run as an automated bash script.
- Each shell script sets a PROTODIR environment variable. You need to update this for your directory paths.
- The following scripting is based upon a Pulsar tutorial for running Pulsar in Kubernetes on minikube. The URL for the tutorial is: https://pulsar.apache.org/docs/4.0.x/getting-started-helm/
- The Java Pulsar client code is an Eclipse Maven project. To run it, first create a Java Application run configuration with learn.pulsar.PulsarClientMain as the main class. Then, for each prototype, ensure that the tls variable is set appropriately.
5.2 Proto-01-kube-basic
The purpose of this prototype is to deploy a minimal Pulsar deployment to use for testing external (from outside kubernetes) client connectivity. This prototype does not use TLS. The kubernetes deployment yaml files were initially generated by running a dry-run from the Pulsar Helm chart with the minikube values override.
helm install --dry-run --values ${PROTODIR}/helm/values-minikube.yaml --namespace pulsar pulsar-mini apache/pulsar > kube-pulsar.txt
From this output the kube deployment components were obtained. This script does a minimal installation of only the main required Pulsar components.
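As a rough illustration of how the components can be picked out of the dry-run output, the sketch below counts the resource kinds found in the rendered file. The function name list_manifest_kinds is hypothetical and not part of the project scripts. It assumes each resource in the output carries a top-level `kind:` line, which is how Helm renders manifests.

```shell
# Hypothetical helper (not part of the project scripts): count the
# Kubernetes resource kinds found in a rendered Helm manifest file.
list_manifest_kinds() {
  manifest_file="$1"
  # Top-level "kind:" lines start in column 0; nested ones are indented
  # and are therefore ignored by the anchored pattern.
  grep -E '^kind:' "$manifest_file" \
    | awk '{print $2}' \
    | sort \
    | uniq -c \
    | awk '{print $2 "=" $1}'
}

# Example usage against the file produced by the dry-run above:
#   list_manifest_kinds kube-pulsar.txt
```

The output is one `Kind=count` line per resource kind, which makes it easier to see which StatefulSets, Services and ConfigMaps need to be carried over. Note that `helm install --dry-run` also prints a release header and notes around the manifests; `helm template` is an alternative that renders only the manifests.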
| Component | # Deployed |
|---|---|
| Zookeeper | 1 |
| Bookie | 1 |
| Toolset | 1 |
| Broker | 1 |
| Proxy | 3 |
Note - Only the basic elements are deployed, so no PodMonitor, Prometheus, Grafana, PodDisruptionBudget, etc.
5.2.1 Script - Step 01 Deploy Minikube, Metallb, Pulsar
This script from the $PROTODIR/Scripts/ directory performs the following:
- Deploys a fresh minikube, minikube addons (dashboard, metallb);
- Configures metallb loadbalancer
- Deploys Pulsar basic and necessary components into the Cluster
- Tests access from the Pulsar CLI client
- Allows running of the simple java test program (eclipse, maven) found in the pulsar-client directory within this project.
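To test access from outside the cluster, the client needs the external address that the load balancer assigned to the Pulsar proxy. The sketch below shows one way to look it up and turn it into a service URL. The function name proxy_service_url is hypothetical, and the service name pulsar-mini-proxy and port 6650 are assumptions based on the Helm release name and the default Pulsar binary port; check both against your deployment.

```shell
# Hypothetical helper: build a Pulsar service URL from the external IP
# that the load balancer assigned to the proxy Service.
# Assumptions: service name "pulsar-mini-proxy" and port 6650.
proxy_service_url() {
  namespace="${1:-pulsar}"
  service="${2:-pulsar-mini-proxy}"
  port="${3:-6650}"
  ip=$(kubectl get service "$service" -n "$namespace" \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$ip" ]; then
    echo "no external IP assigned to $service yet" >&2
    return 1
  fi
  echo "pulsar://${ip}:${port}"
}

# Example usage from the Pulsar distribution directory:
#   bin/pulsar-client --url "$(proxy_service_url)" produce test-topic -m "hello"
```

If the function reports that no external IP is assigned, the MetalLB address pool is the first thing to check.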
5.3 Proto-02-kube-basic-tls
The purpose of this prototype is to deploy a minimal Pulsar deployment for testing external connectivity using TLS encryption. The Kubernetes deployment was initially generated by running a dry-run from the Pulsar Helm chart. The values override starts with the minikube values and adds TLS with cert-manager. The command is as follows:
helm install --dry-run --values ${PROTODIR}/helm/values-02.yaml --namespace pulsar pulsar-mini apache/pulsar > kube-pulsar.txt
From this output the kube deployment components were obtained. This script does a minimal installation of only the main required Pulsar components.
| Component | # Deployed |
|---|---|
| Zookeeper | 3 |
| Bookie | 4 |
| Toolset | 1 |
| Broker | 3 |
| Proxy | 3 |
Note - Only the basic elements are deployed, so no PodMonitor, Prometheus, Grafana, PodDisruptionBudget, etc.
5.3.1 Script - Step 01 Deploy Minikube, Metallb, Cert-manager, Pulsar
- Deploys a fresh minikube with minikube addons (dashboard, metallb);
- Configures Metallb loadbalancer
- Deploys Kubernetes Gateway API CRDs (used by the cert-manager deployment)
- Deploys Istio in ambient mode.
- Deploys cert-manager
- Deploys Pulsar and all required components into the cluster
- Sets the pulsar namespace to Istio ambient mode, which initiates mTLS between pods
- Tests access from the Pulsar CLI client
- Allows running of the simple java test program (eclipse, maven) found in the pulsar-client directory within this project.
5.3.2 Running the Java pulsar-client Project with TLS
To run the Java pulsar-client application with TLS, you need to copy the Pulsar CA public certificate into the project resources directory. The following commands do this.
CLIENT_PATH=$PROTODIR/../pulsar-client/src/main/resources
kubectl get secret pulsar-mini-ca-tls -n pulsar -o "jsonpath={.data['ca\.crt']}" | base64 -d > $CLIENT_PATH/tls.crt
After running these commands, be sure to refresh the project. Also ensure that the tls variable in the PulsarClientMain class is set to true. You can then run a Java Application run configuration with PulsarClientMain as the main class.
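Before running the client it can help to confirm that the extracted file really is a certificate, since an empty or wrongly decoded file only shows up later as a TLS handshake error. The sketch below is one possible check. The function name check_pem_cert is hypothetical, and the check only looks at the PEM header line rather than validating the certificate.

```shell
# Hypothetical helper: sanity-check that a file looks like a PEM
# certificate before pointing the Java client at it.
check_pem_cert() {
  cert_file="$1"
  # -s is true only when the file exists and is not empty.
  if [ ! -s "$cert_file" ]; then
    echo "missing or empty: $cert_file" >&2
    return 1
  fi
  # A PEM certificate begins with this marker line.
  if ! head -n 1 "$cert_file" | grep -q -e '-----BEGIN CERTIFICATE-----'; then
    echo "not a PEM certificate: $cert_file" >&2
    return 1
  fi
  echo "ok: $cert_file"
}

# Example usage with the path used in this section:
#   check_pem_cert "$CLIENT_PATH/tls.crt"
#   openssl x509 -in "$CLIENT_PATH/tls.crt" -noout -subject -enddate
```

The openssl line in the usage comment prints the certificate subject and expiry date, which is a quick way to confirm that the file holds the CA issued by cert-manager.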
Note - The consumer within the project is a simple consumer. The project application flow is:
- Start the Pulsar client,
- Initialize the consumer,
- Initialize the producer and send 10 messages.
- Close the producer, consumer and client. This means the consumer is sometimes closed before it has read the last message. It's a simple program.
5.4 Proto-03-helm-basic-tls
The purpose of this prototype is to return to the Helm chart and deploy a minimal Pulsar deployment for testing external connectivity with TLS encryption. This script does a minimal installation of only the main required Pulsar components as follows:
| Component | # Deployed |
|---|---|
| Zookeeper | 3 |
| Bookie | 4 |
| Toolset | 1 |
| Broker | 3 |
| Proxy | 3 |
Note - Only the basic elements are deployed, so no PodMonitor, Prometheus, Grafana, PodDisruptionBudget, etc.
5.4.1 Script - Step 01 Deploy Minikube, Metallb, Cert-manager, Pulsar
- Deploys a fresh minikube with minikube addons (dashboard, metallb);
- Configures Metallb loadbalancer
- Deploys Kubernetes Gateway API CRDs (used by the cert-manager deployment)
- Deploys Istio in ambient mode.
- Deploys cert-manager
- Deploys Pulsar and all required components into the cluster via the Helm chart and the $PROTODIR/helm/values-03.yaml override.
- Sets the pulsar namespace to Istio ambient mode, which initiates mTLS between pods
- Tests access from the Pulsar CLI client
- Allows running of the simple java test program (eclipse, maven) found in the pulsar-client directory within this project.
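Enrolling a namespace in Istio ambient mode is done with a namespace label, so the result of that step can be checked by reading the label back. The sketch below is a minimal check under that assumption. The function name is_ambient_namespace is hypothetical, while the label key istio.io/dataplane-mode is the one Istio documents for ambient mode.

```shell
# Hypothetical helper: report whether a namespace is enrolled in
# Istio ambient mode by reading its dataplane-mode label.
is_ambient_namespace() {
  namespace="${1:-pulsar}"
  # Dots inside the label key are escaped so jsonpath treats it as one key.
  mode=$(kubectl get namespace "$namespace" \
          -o "jsonpath={.metadata.labels['istio\.io/dataplane-mode']}")
  [ "$mode" = "ambient" ]
}

# Example usage. The label command is the one Istio documents for
# adding a namespace to the ambient mesh:
#   kubectl label namespace pulsar istio.io/dataplane-mode=ambient
#   is_ambient_namespace pulsar && echo "pulsar is in ambient mode"
```

The function returns a zero exit status only when the label value is exactly `ambient`, so it can be used directly in an `if` or `&&` chain.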
5.4.2 Running the Java pulsar-client Project with TLS
To run the Java pulsar-client application with TLS, you need to copy the Pulsar CA public certificate into the project resources directory. The following commands do this.
CLIENT_PATH=$PROTODIR/../pulsar-client/src/main/resources
kubectl get secret pulsar-mini-ca-tls -n pulsar -o "jsonpath={.data['ca\.crt']}" | base64 -d > $CLIENT_PATH/tls.crt
After running these commands, be sure to refresh the project. Also ensure that the tls variable in the PulsarClientMain class is set to true. You can then run a Java Application run configuration with PulsarClientMain as the main class.
Note - The consumer within the project is a simple consumer. The project application flow is:
- Start the Pulsar client,
- Initialize the consumer,
- Initialize the producer and send 10 messages.
- Close the producer, consumer and client. This means the consumer is sometimes closed before it has read the last message. It's a simple program.
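One way to avoid closing the consumer early is to wait until it has received the expected number of messages before shutting down. The sketch below shows that idea using the Pulsar CLI rather than the Java client, so it is an analogy to the flow above and not the project's code. The function name produce_and_drain, the PULSAR_CLIENT and DRAIN_STARTUP_DELAY variables, and the subscription name drain-demo are all hypothetical.

```shell
# Hypothetical sketch: the same start-consumer, produce, close flow,
# but waiting for the consumer to read every message before finishing.
# PULSAR_CLIENT is an assumed variable pointing at the CLI binary.
PULSAR_CLIENT="${PULSAR_CLIENT:-bin/pulsar-client}"

produce_and_drain() {
  url="$1"; topic="$2"; count="${3:-10}"
  # Start the consumer first; -n makes it exit after $count messages.
  "$PULSAR_CLIENT" --url "$url" consume -s drain-demo -n "$count" "$topic" &
  consumer_pid=$!
  # Give the subscription a moment to be created before producing.
  sleep "${DRAIN_STARTUP_DELAY:-2}"
  # Send the same message $count times.
  "$PULSAR_CLIENT" --url "$url" produce -m "hello" -n "$count" "$topic"
  # Block until the consumer has received all messages and exited.
  wait "$consumer_pid"
}

# Example usage (URL is a placeholder for your proxy address):
#   produce_and_drain pulsar://<proxy-ip>:6650 test-topic 10
```

The fixed startup delay is a simplification. In the Java client the equivalent change would be to keep receiving until the expected number of messages has arrived, and only then close the consumer and client.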