
Create diagrams as code in Python

In the previous post, we explored my custom ClickHouse backup agent, built on the clickhouse-backup tool, logrotate, cron and Bash scripts. I also shared all the resources needed to test the agent on your local machine with Docker and Docker Compose, or to deploy it in a production environment. Let’s update the agent’s repo with some Python code.

You may be familiar with one of the main GitOps principles: use Git as the single source of truth; store your application and infrastructure configurations in a Git repository along with the application code. Kubernetes manifests (yaml), Terraform files (tf), Docker and Compose files, Jenkinsfiles and even diagrams are good examples of files kept in such repositories. But how should diagrams be represented? As png, vsd or jpeg files? Let’s pretend we’re developers and can draw diagrams using code.

The diagrams project brings this approach to life. I opted for Diagrams (mingrammer) because it’s free and built on Python and Graphviz, a widely used language and tool that let you create various diagrams, whether it’s a flowchart or a cloud architecture. Another advantage is that the project is actively maintained and continuously developed. You can also check out other tools such as pyflowchart, mermaid, plantuml or terrastruct.

Let’s get started and draw a flowchart for the ClickHouse backup agent using Diagrams (mingrammer). First, install Python (3.7 or later; mine is 3.11) and Graphviz (9.0.0 on Windows in my environment), then install the diagrams module (0.23.4).
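
A minimal install sketch (the commands are illustrative; install Graphviz however suits your OS, e.g. the official installer on Windows, and make sure it ends up on PATH):

# install the diagrams module; Graphviz is installed separately
pip install diagrams==0.23.4

# quick sanity check of both dependencies
python -c "import diagrams; print('diagrams is importable')"
dot -V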

Diagrams includes the following objects: node (= shapes; programming, azure, custom and others), edge (= connection lines; linkage between nodes), cluster (= a group of isolated nodes) and diagram (represents your entire chart). Each object has its own attributes. A description of all attributes can be found in the Graphviz docs. Also, check out the basic examples to understand what we’re going to “build”. I won’t describe every attribute. DYOR.

The first lines of your code might look like this:

# import required modules
from diagrams import Diagram, Edge, Cluster, Node

Then we define attributes for each object (excerpt):

# define attributes for graphviz components
graph_attributes = {
    "fontsize": "9",
    "orientation": "portrait",
    "splines":"spline"
}

Next, we need to describe the diagram object and its attributes (excerpt):

with Diagram(show=False, outformat="png", graph_attr=graph_attributes, direction="TB"):
    # nodes and icons
    start_end_icon = "diagram/custom-images/start-end.png"
    start = Node(label="Start", image=start_end_icon, labelloc="c", height="0.4", width="0.45", **node_attributes)

I use the general Node class with custom images, which were taken from the programming nodes and then adapted to my flowchart (I deleted the canvas and resized the images). You could safely use the diagrams.programming.flowchart node classes instead, but be ready to play with the nodes’ height/width attributes. Another way to add your own images as nodes is the Custom node class.

We have described icons and shared nodes. Now we need to add the first group of nodes to represent the main process of the agent and flowchart (creating and uploading FULL backups):

# cluster/full backup
    with Cluster("main", graph_attr=graph_attributes):
        diff_or_full = Node(label="TYPE?", image=decision_icon, height="0.7", labelloc="c", **node_attributes)

Subroutine processes (diff backups, etc.) are clusters (excerpt):

# cluster/diff backup
    with Cluster("diff", graph_attr=graph_attributes):
        create_diff_backup = Node(label="Create DIFF", labelloc="c", height="0.5", width="4", image=action_icon, **node_attributes)

Edges or connections between nodes are defined at the bottom (excerpt):

# Log connections
    diff_or_full - Edge(label="\n\n wrong type", tailport="e", headport="n", **edge_attributes ) - write_error 

As a result, I’ve updated the repo with the diagram as code and slightly modified the GitHub Actions workflow by adding a new step that “draws” the diagram and checks the Python code. When I push new commits to the repo, the diagram is created and published as an artifact with nodes (start, end, condition, action, catch, input/output), four clusters (main, diff, log, upload log) and edges between the nodes.
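
A hedged sketch of the commands such a workflow step might run (the file paths and the linter are illustrative, not the exact step from the repo):

# install dependencies, lint the code, then render the diagram that gets published as an artifact
pip install diagrams flake8
flake8 diagram/
python diagram/create-diagram.py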

Looks pretty good, doesn’t it?


Simple ClickHouse backup agent

Learn how to backup and restore ClickHouse data

In today’s data-driven world, managing and protecting your data is paramount. We have already learned how to back up PostgreSQL here. And now it’s our turn to discuss ClickHouse, an open-source columnar database management system that is renowned for its high performance and scalability.

I will explain a Dockerized cron job designed to facilitate the backup of ClickHouse databases, whether they are located on a single host or within a clustered environment with shards and replicas.

The agent is based on the Alpine Docker image and uses the clickhouse-backup tool, which has the ability to work as a REST API service. Logrotate has been added to manage the log files produced by the backup agent.

The agent does the following:

  • creates scheduled FULL or DIFF backups (POST to /backup/create)
  • checks “create backup” action status before every upload (GET to /backup/status)
  • uploads each backup to a remote storage (POST to /backup/upload/)
  • checks and waits until upload operation finishes (GET to /backup/actions)
  • manages log file with API responses and errors
  • generates customized output to standard container logs
  • if a backup is not uploaded to remote storage, it’s marked as failed and will not be used as the last backup for subsequent DIFF backups

Important: according to the clickhouse-backup official FAQ, “incremental backups calculate increment only during executing upload or create_remote command or similar REST API request”. In other words, DIFF and FULL local backups are actually the same (clickhouse-backup list local). Clickhouse-backup creates local backups first before uploading them to remote storage.

If you list remote backups using the clickhouse-backup list remote command, you will notice the distinction between these two backup types. This is why the agent only issues a warning when you attempt to create a DIFF backup for the first time without any prior FULL backups.

Default settings:

  • DIFF backups: every hour from Monday through Friday and on Sunday, plus every hour from 0 through 20 on Saturday (see the cron sketch after this list)
  • FULL backups: every Saturday at 8:30 PM
  • Logs are rotated and compressed weekly and kept for 14 rotations before being removed
  • Clickhouse-backup API basic authentication is enabled (rlAPIuser)
  • Clickhouse server authentication is enabled (rlbackup)
  • Remote storage is ftp with authentication enabled
  • Backups to keep local: 6
  • Backups to keep remote: 336
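
As a rough illustration of these defaults, the cron entries might look like this (the minute, paths and credentials are illustrative; the actual schedule lives in the agent’s cronfile):

# DIFF backups: every hour on Sunday-Friday, and during hours 0-20 on Saturday
30 * * * 0-5  /backup/config/clickhouse-backup.sh create_diff_backup <api-address> <api-user> <api-password>
30 0-20 * * 6 /backup/config/clickhouse-backup.sh create_diff_backup <api-address> <api-user> <api-password>
# FULL backup: every Saturday at 20:30
30 20 * * 6   /backup/config/clickhouse-backup.sh create_full_backup <api-address> <api-user> <api-password>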

Content

  • docker-compose.yml – describes environment to test the agent locally.
    There are the following services:
    – clickhouse server (clickhouse-server:23.8-alpine)
    – clickhouse-backup (altinity/clickhouse-backup:2.4.0)
    – our clickhouse-backup-agent (ch-backup-agent)
    – ftpd_server (stilliard/pure-ftpd)
  • ./clickhouse/clickhouse-backup-config.yml – clickhouse-backup config file
  • ./agent/Dockerfile – backup agent’s docker image
  • ./agent/ch-backup-logrotate.conf – logrotate config file
  • ./agent/clickhouse-backup.sh – script to define backup and upload steps
  • ./agent/cronfile – cron job backup and logrotate tasks
  • .github/workflows/docker-image.yml – a simple GitHub Action to build the agent’s Docker image on every Dockerfile change

Possible use cases

  • As a resource for learning Docker, Docker Compose, Bash, cron, and logrotate.
  • As a source of scripts, cron job tasks, or Docker files.
    Just grab them and you are all set.
  • As a sample of pairing ClickHouse-Backup and ClickHouse Server.

How to use

  • check out logrotate and cron settings in the agent folder
  • verify the Dockerfile in the agent folder (if docker is being used)
  • adjust the clickhouse-backup settings if necessary (./clickhouse/clickhouse-backup-config.yml); at the very least, change the credentials, ClickHouse host and remote storage
  • the clickhouse-backup API container or standalone service should have access to the /var/clickhouse/ folders to create backups successfully. In the case of a container, see docker-compose.yml. If your clickhouse-backup API is a Linux service, run the service on the first replica of each shard, and then update the cronfile accordingly.
  • copy cron and script files to a remote host, and then make a test run
  • in the case of using Docker, please check the docker-compose.yml file and remove any unnecessary services (such as clickhouse and ftp). Afterward, run docker-compose up -d --build to get the containers started
  • use docker logs or docker compose logs to check service logs.
    Log files are also located under the /var/log/clickhouse-backup/ folder

Since the ch-backup-agent image already comes with the clickhouse-backup application pre-installed, you have the option to run both the API and scripts within the agent’s container. To achieve this, you should create a volume mapping for the ./clickhouse/clickhouse-backup-config.yml file and then start the clickhouse-backup API server as a service.
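
A rough sketch of that setup (the mount path is illustrative; clickhouse-backup takes its config via the -c/--config flag and the server subcommand starts the REST API):

# inside the agent container: point clickhouse-backup at the mounted config and run the API in the background
clickhouse-backup -c /backup/config/clickhouse-backup-config.yml server &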

Some excerpts

Function to create full backup:

function create_full_backup {

    # CREATE
    curl -u "$API_USERNAME:$API_PASSWORD" -s -X POST \
          http://"${SERVER_ADDRESS}":7171/backup/create?name="${FULL_BACKUP_NAME}" | jq . >> "$BACKUP_LOG"
    BACKUP_NAME="${FULL_BACKUP_NAME}"
    check_backup_status

    # UPLOAD
    check_upload_status
}

Function to check upload status:

function check_upload_status {

        # Catch In Progress status
        while [ "$(check_upload_req | jq -r .status )" == "in progress" ]; do
            echo -e "\n${GREEN}[INFO]${OFF} ${BOLD} Upload of the ${BACKUP_NAME} is still in progress...${OFF}"
            sleep 1m
        done

        # Catch Error status
        if [ "$(check_upload_req | jq -r .status )" == "error" ]; then
            echo -e "${RED}::::[ERROR]${OFF} ${BOLD} Couldn't upload the backup ${BACKUP_NAME}:${OFF}"
            {
            printf '\n%s\n' "UPLOAD ERROR:"
            check_upload_req | jq -r .error
            printf '%s\n' "-------------"
            } | tee -a "$BACKUP_LOG"
            return 1

        # Catch Success status
        elif [ "$(check_upload_req | jq -r .status)" == "success" ]; then
            echo -e "\n${GREEN}[INFO]${OFF} ${BOLD} The ${BACKUP_NAME} is now the last since it's just been uploaded successfully${OFF}"
            touch "${LAST_BACKUP_NAME}"
            echo -n "${BACKUP_NAME}" > "${LAST_BACKUP_NAME}"
        fi
}

Useful commands

While I have tested every command below, I strongly encourage you to learn about clickhouse-backup and its FAQ before proceeding with any restore operations. Your setup may have unique configurations and requirements!

Create a FULL backup manually

/backup/config/clickhouse-backup.sh create_full_backup clickhouse-backup-API-address APIUSER APIPASSWORD

If you have a cluster, repeat the command with a different API address for each shard.

Create a DIFF backup manually

/backup/config/clickhouse-backup.sh create_diff_backup clickhouse-backup-API-address APIUSER APIPASSWORD

If you have a cluster, repeat the command with a different API address for each shard.
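
For a cluster, a simple wrapper loop over the API addresses could look like this (host names are hypothetical):

for api in ch-shard1-replica1 ch-shard2-replica1; do
    /backup/config/clickhouse-backup.sh create_diff_backup "$api" APIUSER APIPASSWORD
done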

Delete ALL REMOTE backups one-liner (if you have an old ClickHouse version):

for rembackup in $(clickhouse-backup list remote | tail -n +2 | awk '{print $1}'); do 
     clickhouse-backup delete remote "$rembackup"
done

Delete ALL REMOTE backups one-liner:

for rembackup in $(clickhouse-backup list remote | awk '{print $1}'); do
    clickhouse-backup delete remote "$rembackup"
done

Delete ALL LOCAL backups one-liner:

for localbackup in $(clickhouse-backup list local | awk '{print $1}'); do
    clickhouse-backup delete local "$localbackup"
done

Add tail -n +2 if you have an old ClickHouse version.

Restore a single table data:

clickhouse-backup restore <backup name> --data --table db.table_name_shard

If you’re getting “warn <table name> skipped cause system.replicas entry already exists and replication in progress from another replica logger=clickhouse”, try to run the previous command with CLICKHOUSE_CHECK_REPLICAS_BEFORE_ATTACH=0

If you need to restore the schema, use the --schema parameter and --rm to drop existing schema objects: clickhouse-backup restore --rm <backup name> --schema, or delete the replica metadata in ZooKeeper beforehand.

Result

ClickHouse Backup agent First Run
Noticed the warning? It’s not critical. Just FYI
Clickhouse Backup agent, Errors and warnings
A sample output with error, warning and info messages
ClickHouse Backup Agent Log Files
This is how the log file appears. All you need: errors and the requests’ output
ClickHouse Backup agent DIFF backup create action
DIFF backup created successfully, so the last backup name was updated as well

Git: clone succeeded, but checkout failed

Have you ever faced any issues with git clone? Personally, I can’t recall any significant or memorable problems I’ve encountered while cloning remote repositories. Typically, the issues were related to authentication or network connectivity. Therefore, there was nothing particularly special to write about. However, as you work with different environments, the chances of coming across something interesting enough to share increase, even though it might be obvious to some.

Let’s take a simple example: you’re trying to clone an existing repository, which was created by someone else. The repository had already been filled out with files you need. Assuming you have credentials in place, you run git clone <repo’s url> on your Windows machine and get the following:

I hid the error message. I’ll reveal it later

What could go wrong? The cloning process succeeded, indicating that the issue is not related to Git credentials or network connectivity. However, the checkout process failed. What does this mean? It means that if you navigate to the folder of the cloned repository in the explorer, you won’t find any files written to the disk. Now, let me reveal the full error message, which is straightforward:

error: invalid path 'config/app1/application-staging.yml '
fatal: unable to checkout working tree

Found the “root cause”? There is a whitespace at the end of the filename.

However, you may wonder, since the repository was pre-created and used by other people, how did this happen? You’re correct to question that.

The reason is that Windows doesn’t support trailing space characters and automatically removes them when saving a file (you can read more about it here). Linux, on the other hand, supports both leading and trailing whitespace in filenames.

“file1.txt” and “file1.txt ” are two different files actually

Git knows about these limitations and has a special config setting to control it:

core.protectNTFS

If set to true, do not allow checkout of paths that would cause problems with the NTFS filesystem, e.g. conflict with 8.3 “short” names. Defaults to true on Windows, and false elsewhere.

The reason other people can clone the repo without issues is that their core.protectNTFS is set to false (manually or because of the underlying OS).

So, to clone the repo on Windows you can use the following command:

git clone -c core.protectNTFS=false <repo url>
Now you can fix the wrong filename and sync with the remote repo; one possible clean-up is sketched below.
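
A sketch of that clean-up, assuming you run it from a shell where the path checks out as-is (Linux or WSL, for example):

# rename the file in the index and working tree, then push the fix
git mv 'config/app1/application-staging.yml ' config/app1/application-staging.yml
git commit -m "Remove trailing space from file name"
git push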

As a summary, I would advise all developers and DevOps engineers to avoid using trailing or leading spaces in filenames altogether. By doing so, we can eliminate the potential conflicts and issues that arise from incompatible behaviors between different operating systems.


Python Coding: FizzBuzz challenge

FizzBuzz is a very common task asked in Dev/DevOps interviews. You are given a range of numbers and need to write an algorithm using the following rules: if the number is divisible by 3, print “Fizz”; if the number is divisible by 5, output “Buzz”; if the number is divisible by both 3 and 5, the result should be “FizzBuzz”.

The main goal of the task is to check how you understand loops, conditionals and simple math using one of programming or scripting languages. I solved the task using PowerShell years ago: check this gist.

As I started to learn Python, I decided to share a FizzBuzz implementation in this language to show how simple and “elegant” the solution can be.

I used matplotlib and colorama to make a pie chart and add colored text output, respectively. I defined a function fizz_buzz with 2 arguments, and then used try/except/finally statements to catch exceptions. Inside the try block, a for loop and if conditionals implement all the task’s rules. As a result, the function outputs numbers and categories based on the rules, and draws a pie chart showing the percentages of fizz, buzz and fizzbuzz values found.

import matplotlib.pyplot as plt
import colorama
from colorama import Fore, Back, Style
colorama.init()

def fizz_buzz(x,y):
    """Python version of popular Fizz Buzz task"""
    fb = 0 ; b = 0; f = 0; rest = 0 # start values
    fb_type = ['fizzbuzz','fizz','buzz','rest'] # plot labels
    fb_colors = ['r','y','c','g'] # plot colors
    fb_explode = [0.2, 0.1, 0.1, 0.1] # plot fraction of the radius
    try:
        for n in range(x,y):
            if ((n % 3 == 0) and (n % 5 == 0)):
                fb += 1
                print(Fore.RED + f"Found FizzBuzz: {n}")
            elif n % 3 ==0:
                f += 1
                print(Fore.WHITE + f"Found Fizz: {n}")
            elif n % 5 ==0:
                b += 1
                print(Fore.GREEN + f"Found Buzz: {n}")
            else: 
                rest += 1
                print(Style.BRIGHT + f"The rest is {n}")
            print(Style.RESET_ALL)
        fb_array = [fb, f, b, rest]
        plt.pie(fb_array, colors = fb_colors, explode = fb_explode, shadow = True, radius = 1.1, autopct = '%1.1f%%') # form a pie
        plt.legend(fb_type,loc='upper right') # show legend
        plt.show() # show a pie
    except:
        print(Style.BRIGHT + Fore.RED + "You provided wrong x and y")
        print(Style.RESET_ALL)
    finally:
        print(Style.BRIGHT + Fore.GREEN + "Author: github.com/rlevchenko")
        print(Style.RESET_ALL)

Result

Available at Gist


Simple PostgreSQL Backup Agent

Learn how to back up PostgreSQL data

A Dockerized cron job to back up a PostgreSQL database or multiple databases on different hosts. It’s based on the Alpine Docker image, so the image size is less than 11 MB. The script can also be used without Docker and Docker Compose, or as a base for your own Dockerized cron jobs. My general recommendation is to run the Docker container on your backup host to provide a degree of isolation from the management partition.

The script or “agent” does the following:

  • Reads content of /config/passfile to get pg_dump connection parameters
  • Verifies if the backup can be done by executing a dry run for each db
  • If the dry run completes and the plain format is set, produces a plain-text SQL script and compresses it with gzip
  • If the dry run succeeds and the custom format is set, outputs a custom backup archive (more flexible; the default)
  • Cleans up the storage folder. Files older than 30 days are deleted
  • Redirects all cron job statuses to stdout
  • Keeps backup files under ./psql/backups/{hostname}/{dbname}/ on your host
  • Default settings: twice a day at 8:30 and 20:30 UTC; custom format; clean backups older than 30 days

Current limitations:

  • no encryption for specific databases (in to-do list)
  • no handling of wildcards in the passfile (in to-do list)

Content

  • Dockerfile – describes docker image
  • docker-compose.yml – docker compose file to build and run agent service
  • /config/cronfile – cron job schedule settings
  • /config/passfile – PostgreSQL .pgpass actually
  • /config/psql_backup.sh – the script itself

Usage guide

  • check out the passfile and provide your own connection parameters
  • verify the cron job settings in the /config/cronfile
  • change the make_backup function argument to set the output format (plain/custom)
  • update cleaner function argument at the bottom of the script if necessary
  • edit dockerfile/docker-compose.yml or script itself if necessary
  • run docker compose build
  • run docker compose up -d
  • check out the stdout of the container to get the job’s status
  • TO RESTORE: use psql (if the plain format is set) or pg_restore (if the custom format is set); see the sketch after this list
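
For reference, the restore commands might look like this (hosts, database names and file paths are illustrative):

# custom format (the default): restore with pg_restore
pg_restore -h dbhost -U dbuser -d dbname --clean ./psql/backups/dbhost/dbname/backup.custom

# plain format: decompress the SQL script and feed it to psql
gunzip -c ./psql/backups/dbhost/dbname/backup.sql.gz | psql -h dbhost -U dbuser -d dbname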

Dockerfile

FROM alpine:3.16.2
LABEL AUTHOR="Roman Levchenko"
LABEL WEBSITE="rlevchenko.com"
RUN mkdir /etc/periodic/custom \
    && mkdir -p /backup/config \ 
    && touch /var/log/cron.log \
    && apk --no-cache add \
    postgresql14-client=14.5-r0 \
    bash=5.1.16-r2
COPY /config/cronfile /etc/crontabs/root
COPY /config/psql_backup.sh /etc/periodic/custom/backup
COPY ["/config/psql_backup.sh","/config/passfile","/backup/config/"]
RUN chmod 755 /etc/periodic/custom/backup \
    && chmod 0600 /backup/config/passfile
CMD ["-f","-l","8", "-L", "/dev/stdout"]
ENTRYPOINT ["crond"]

Script (excerpt)

# Clean old backup files
function cleaner()
{
set -o pipefail -e
	if [[ -n $(find $BACKUP_DIR \( -name "*.sql.gz" -o -name "*.custom" \) -type f -mtime +"$1") ]]; 
	then
		echo -e "\n${GREEN}[INFO]${OFF} ${BOLD}There are backup files older than $1 days. Cleaning up the following files:${OFF}"
		find $BACKUP_DIR \( -name "*.sql.gz" -o -name "*.custom" \) -type f -mtime +"$1" -print -exec rm {} \;
	else 
		echo -e "\n${GREEN}[INFO]${OFF} ${BOLD}There are no backup files older than $1 days. \nHave a nice day!${OFF}"
	fi
set +o pipefail +e
}

Result

Sample Output (w/error and success messages):


Azure DevOps: Update service connection expired secret

If you’re reading this post, you’re trying to find a way to edit an existing service connection with a new service principal secret/key.

It’s weird that the UI and the DevOps CLI don’t allow us to quickly change service connection details if the connection was created automatically by Azure DevOps (“creationMode”: “Automatic”; more about it a bit later).

So, how to change a secret? Answer: Azure DevOps REST API.
Note: if you have correct permissions, try out the steps at the bottom of the post. The steps below are for those who don’t have Owner permissions.

  • Create a new Personal Access Token (full access, all scopes, 1-day expiration)
  • Go to Project Settings – Service Connections, choose your connection and click on Manage Service Principal. Add a new secret and note its value.
  • Choose a tool to work with REST API. It could be either PowerShell or Postman, for instance. I will show both.
  • [Postman] Install Postman and create a new HTTP Request
Postman – File – New – HTTP Request
  • [Postman] Go to Authorization and paste PAT token to the password field
PAT token should be used as password for any REST API requests
  • [Postman] Using the following GET request, get the service endpoint details in JSON format. The organization name, project name and endpoint name are parts of the URI (they can be taken from the service connections list in the Azure DevOps UI):

    https://dev.azure.com/<orgName>/<ProjectName>/_apis/serviceendpoint/endpoints?endpointNames=<Endpoint Name>&api-version=6.0-preview.4
  • [Postman] Copy everything from the response under the value as shown below
{
            "data": {
                "subscriptionId": "",
                "subscriptionName": "",
                "environment": "AzureCloud",
                "scopeLevel": "Subscription",
                "creationMode": "Automatic",
                "azureSpnRoleAssignmentId": ""
            },
            ...............
                }
            ]
}
  • [Postman] Using a PUT request, update the service connection. Make sure you set Body – Raw to JSON, and then paste the JSON copied in the previous step into the Body
Body – RAW should be set to JSON
  • Here is the tricky part. Prior to sending the PUT request, change creationMode from “Automatic” to “Manual”. Also, in my case, I had to delete the spnObjectId and appObjectId parameters (data section). Plus, I added serviceprincipalkey with the value set to the new secret (authorization section).
    A short excerpt is provided below:
{
    "data": {
        "subscriptionId": "",
        "subscriptionName": "",
        "environment": "AzureCloud",
        "scopeLevel": "Subscription",
        "creationMode": "Manual",  # changed
        "azureSpnRoleAssignmentId": "",
        "azureSpnPermissions": ""
         spnObjectId # deleted
         appObjectId # deleted
    },
    "description": "",
    "authorization": {
        "parameters": {
            "tenantid": "",
            "serviceprincipalid": "",
            "authenticationType": "spnKey",
            "serviceprincipalkey": "secret here" # added
        },
        "scheme": "ServicePrincipal"
}
}
  • [Postman] URI used for a PUT request: https://dev.azure.com/OrganizationName/_apis/serviceendpoint/endpoints/EndpointId?api-version=6.0-preview.4
  • [Postman] Go back to Azure DevOps and make sure that the service connection has been updated and is ready to use.

  • [PowerShell] Use the following example
$token ="PAT Token"
$orgName = "Organization Name"
$projectName = "Project Name"
$endpointName = "your endpoint"
$endpointId = "your endpoint ID, use GET request or UI"
$header = @{Authorization = 'Basic ' + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$($token)")) }

# Get Endpoint details

Invoke-RestMethod -Method GET -URI "https://dev.azure.com/$($orgName)/$($projectName)/_apis/serviceendpoint/endpoints?endpointNames=$($endpointName)&api-version=6.0-preview.4" -Headers $header -ContentType "application/json"

# Update Endpoint
$json = @{ json here } | ConvertTo-Json -Depth <your depth>
Invoke-RestMethod -Method PUT -URI "https://dev.azure.com/$($orgName)/_apis/serviceendpoint/endpoints/$($endpointId)?api-version=6.0-preview.4" -Headers $header -ContentType "application/json" -Body $json
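
If you prefer plain curl to Postman or PowerShell, the same GET request might look like this (the names are placeholders; the PAT is passed as the basic-auth password):

curl -s -u ":$PAT" \
  "https://dev.azure.com/<orgName>/<ProjectName>/_apis/serviceendpoint/endpoints?endpointNames=<EndpointName>&api-version=6.0-preview.4"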


That’s it. Now you know how to change a service connection with a new secret without removing the connection and reconfiguring all the pipelines in a project.

P.S. If you have Owners permissions on the app registration/service principal used by the connection, try to edit the connection by adding a description, and then click on Save. Azure DevOps should create a new secret and update the connection automatically.


Deploy Azure Data Services with Terraform

June, 2021 Update: see details below

January, 2024: added examples of for_each/count blocks

Terraform-based deployment of almost all Azure Data Services (default deployment settings are in parentheses):

  • Azure Service Bus (Standard, namespace, topic, subscription, auth. rules)
  • Azure Data Lake Storage (ZRS, Hot, Secured, StandardV2)
  • Azure Data Factory (w/Git or without)
  • Azure Data Factory linked with Data Lake Storage
  • Azure Data Factory Pipeline
  • Azure DataBricks WorkSpace (Standard)
  • Azure EventHub (Standard, namespace)
  • Azure Functions (Dynamic, LRS storage, Python, w/App.Insights or without)
  • Azure Data Explorer (Kusto, Standard_D11_v2, 2 nodes)
  • Azure Analysis Server (backup-enabled, S0, LRS, Standard)
  • Azure Event Grid (domain, EventGridSchema)
  • Azure SQL Server (version 12.0)
  • Azure SQL Database (ElasticPool SKU name, 5 GB max data size)
  • Azure SQL Elastic Pool (StandardPool, LicenseIncluded, 50 eDTU/50GB)

Properties and content

  • Over 1,000 lines and 26 Terraform resources in total
  • Almost every line is commented, there are multiple conditions in each resource, and variable validations check values before the deployment, etc. So it’s flexible, not hardcoded, and allows you to create infrastructure with your own set of resources.
  • Written a few years ago, updated once since then to fix deprecated features
  • June, 2021 Update: SQL Server, Database and Elastic Pool, added variable conditions (for example, the SQL password must be longer than 8 characters and contain upper-case letters, digits and special characters), added a sensitive variable (just as a sample), new Terraform 0.15.5 syntax/features, multiple minor changes
  • January, 2024 Update: examples of count and for_each blocks are shown here
  • Tested with Terraform 0.15.5 and Azure provider 2.62.0 (the first version of the script worked fine with >=0.12 and AzureRM >=1.35; just check the syntax and try it out)
  • auth.tf – provider authentication and version settings
  • main.tf – a desired Azure infrastructure
  • terraform.tfvars – controls deployment settings
  • variables.tf – variables list
  • outputs.tf – outputs useful information

Deployment settings (excerpt)

#--------------------------------------------------------------
# What should be deployed?
#--------------------------------------------------------------
servicebus       = true  # Azure Service Bus
datafactory      = true  # Azure Data Factory
datafactory_git  = false # Enable GIT for Data Factory? (don't forget to set Git settings in the Data Factory section)
databricks       = true  # Azure DataBricks
eventhub         = true  # Azure EventHub
functions        = true  # Azure Functions 
functions_appins = true  # Integrate App.Insights with Azure Functions?
eventgrid        = true  # Azure EventGrid
kusto            = true  # Azure Data Explorer (kusto)
analysis         = true  # Azure Analysis Server
sqlserver        = true  # Azure SQL Server 
sqlep            = true  # Azure SQL Elastic Pool
sqldb            = true  # Azure SQL Database

Resource block (excerpt)

#Azure SQL Database
resource "azurerm_mssql_database" "rlmvp-svc-sql-db" {
  count           = var.sqlserver == "true" && var.sqldb == "true" ? 1 : 0
  name            = "${var.prefix}sqldb${random_string.rndstr.result}"
  elastic_pool_id = var.sqlep == "true" ? azurerm_mssql_elasticpool.rlmvp-svc-sql-elastic-pool[count.index].id : null
  server_id       = azurerm_sql_server.rlmvp-svc-sql-server[count.index].id
  max_size_gb     = var.az_sql_db_maxsize
  sku_name        = var.az_sql_db_sku_name
  tags            = var.az_tags
}

Variable Conditions

variable "az_sqlserver_password" {
  type        = string
  description = "Azure SQL Server Admin's Password"
  validation {
    condition     = length(var.az_sqlserver_password) > 8 && can(regex("(^.*[A-Z0-9].*[[:punct:]].*$)", var.az_sqlserver_password)) # meets Azure SQL password's policy
    error_message = "SQL Server Admin's password must be longer than 8 characters and contain upper-case letters, digits and special/punctuation characters."
  }
}

Usage guide

  • Open the terraform.tfvars file
  • Find the “What should be deployed?” section
  • Use true/false to set your desired configuration
  • Check or change the Azure services’ settings in the appropriate sections (naming convention (prefix/suffix), location, SKUs, etc.)
  • Run terraform init to get required Terraform providers
  • Run terraform plan to initiate pre-deployment check
  • Run terraform apply to start a deployment
  • (optional) terraform destroy to delete Azure resources

Requirements

  • The script uses Service Principal authentication, so define the subscription ID, client ID, tenant ID and principal secret in auth.tf (or use another authentication type – Managed Identity, for instance, if your CI runs on Azure VMs); see the environment-variable sketch after this list
  • If you are going to deploy Analysis Server (enabled, by default), provide valid Azure AD user(s) UPN(s) to set them as administrators of Analysis Server (az_ansrv_users variable, file – terraform.tfvars)
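
As an alternative to keeping the credentials in auth.tf, the azurerm provider also reads the standard ARM_* environment variables; a sketch with placeholder values:

# service principal credentials picked up by the azurerm provider
export ARM_SUBSCRIPTION_ID="<subscription-id>"
export ARM_CLIENT_ID="<appId>"
export ARM_CLIENT_SECRET="<client-secret>"
export ARM_TENANT_ID="<tenant-id>"
terraform plan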

Result

P.S. feel free to share/commit/fork/slam/sell/copy and do anything that your conscience allows you 🙂


Playing with Kubernetes running on Azure (AKS)

Heptio (its founders co-created Kubernetes) polled about 400 IT decision makers from different sectors and company sizes to find out whether they use Kubernetes or not, and to understand the obstacles and overall experience. About 60% of respondents use Kubernetes today, and 65% expect to be using the technology in the next year. More surprisingly, about 77% of companies with more than 1,000 developers that run Kubernetes use it in production.

Furthermore, VMware has recently announced Project Pacific, which completely rebuilds the vSphere architecture. Kubernetes is going to be a control plane in future vSphere versions. Sounds amazing, doesn’t it?

The supervisor cluster is a Kubernetes cluster of ESXi instead of Linux

I hope you have warmed up and taken an interest in something you may not have been familiar with: containers and Kubernetes. I believe so, and I recommend reading about Kubernetes and Docker concepts before we get started.

We’re gonna do some tasks in Azure Kubernetes Service (the managed Kubernetes service in Azure) to help you dive into Kubernetes and get hands-on experience with related services such as Container Registry, AKS, Terraform and Docker.

Tasks: each section below covers one task.

Notes:
  • This GitHub Repo includes everything covered in this blog post 
  • This lab uses a custom and simple ASP.NET Core web application that we will deploy and then publish using Kubernetes (K8S)
  • Docker multistage image build packs up the application
  • Azure Container Registry stores the Docker image
  • Terraform automates the deployment of Azure Kubernetes Service and Azure Container Registry. The scripts are stored in a different repo
  • Azure Kubernetes Service provides a managed Kubernetes master node in the cloud with the ability to scale up worker nodes. AKS will host our PODs (roughly speaking, PODs represent processes/containers running on the K8S cluster)
  • Azure CLI, PowerShell, docker, terraform and kubectl (the command line interface for running commands against Kubernetes clusters) are the main tools for completing the tasks. Make sure you have them installed on your machine or use Azure Cloud Shell instead.
  • Create a Service Principal beforehand (Contributor role); see the sketch after this list
  • Azure DevOps is used for CI/CD (optional)
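
Creating the Service Principal could look like this with a recent Azure CLI (the name and scope are illustrative):

# create a service principal with the Contributor role on the target subscription
az ad sp create-for-rbac --name k8s-lab-sp --role Contributor --scopes /subscriptions/<subscription-id>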

Deploy Kubernetes cluster in Azure

Although we can use the Azure portal, CLI or PowerShell to deploy new Azure services, using an Infrastructure as Code (IaC) approach is more far-sighted. We will use Terraform, so check out my repo and the comments inside. Terraform creates an Azure Container Registry:

#Get RG (create a new one if necessary by using "resource azurerm..")
data "azurerm_resource_group" "Rg" {
  name = "kubRg"
}

............

#Create a container registry
resource "azurerm_container_registry" "cr" {
  name                = "cr${random_string.randomName.result}"
  resource_group_name = "${data.azurerm_resource_group.Rg.name}"
  location            = "${data.azurerm_resource_group.Rg.location}"
  admin_enabled       = true
  sku                 = "Basic"
  # Only for classic SKU (deprecated)
  # storage_account_id  = "${azurerm_storage_account.storacc.id}" (Classic)
}
..............

Terraform then configures the Kubernetes cluster with the Azure Container Network Interface (CNI), which lets you access PODs directly because every POD gets an IP from the Azure subnet rather than using kubenet. At the end of the configuration file, Terraform enables K8S RBAC (it’s disabled by default in Azure), which we’ll use later during the service account creation.

resource "azurerm_kubernetes_cluster" "k8sClu" {
  name                = "rlk8sclu-${random_string.randomName.result}"
  location            = "${data.azurerm_resource_group.Rg.location}"
  resource_group_name = "${data.azurerm_resource_group.Rg.name}"
  dns_prefix          = "${var.dnsPrefix}"

  .......

  network_profile {
    network_plugin = "azure"
  }

  role_based_access_control {
    enabled = true
  }
  .........

Apply the configuration, and then check the output (in my case, the resources had already been deployed, so there was nothing to add). Note the ACR and AKS resource names (/managedClusters/…; registries/…)

For an additional deployment check, open up Azure Cloud Shell or the Azure CLI and type the following to open the Kubernetes dashboard:

#Get Azure AKS Credentials
az login
az aks get-credentials --resource-group kubRg --name rlk8sclu-l3y5

#Open K8S dashboard
kubectl create clusterrolebinding kubernetes-dashboard --clusterrole=cluster-admin --serviceaccount=kube-system:kubernetes-dashboard
az aks browse --resource-group kubRg --name rlk8sclu-l3y5

#View kubectl config (optional, %HOMEPATH%/.kube/config)
kubectl config view

If your browser opens a new page, there likely aren’t any issues with the deployment. Let’s jump into the second task.

TIP: get addresses of the master and services by running kubectl cluster-info

Make Docker image with the application

Let’s create a Docker image with the application, and then push the image to the Azure Container Registry. The Dockerfile is located at the root of the aspnetapp folder (check out my repo) and describes a multi-stage image build process. There is also a .dockerignore file that defines folders to be excluded from the image.

Run the Docker CLI and build the image (docker build <dir>):
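
A sketch of the build command (the image tag is illustrative and should match the tag used in the push step below):

# build the multi-stage image from the folder containing the Dockerfile
docker build -t aspnetapp ./aspnetapp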

Push the image to Azure Container Registry:

az acr login --name acrName
docker login acrFQDN
docker tag aspnetapp acrFQDN/aspnetapp
docker push acrFQDN/aspnetapp

TIP: get the attributes of the image by running az acr repository show -n crl3y5 --image aspnetapp:latest

Make two fully isolated namespaces within the Kubernetes cluster

Once the Docker image is uploaded to ACR, we are ready to proceed with the Kubernetes tasks. When you need to change something in K8S, you can either use kubectl to define operations sequentially or use manifest files (YAML) that describe multiple requests to the K8S API server in a declarative form.

If you look at my repo, you can see two folders, ns1 and ns2, that store the yaml/manifest files for the respective namespaces. We’ll use those files in conjunction with kubectl to make some changes on the AKS cluster. Because the manifest files are almost the same, only the manifests for NS1 will be shown.

#Apply the manifest (example)
kubectl apply -f pathToTheManifestFile

Create a new namespace:

#Create a namespace 1
apiVersion: v1
kind: Namespace
metadata:
  name: ns1
  labels:
    author: rlevchenko

To deny ingress traffic from PODs running in the other namespaces:

#NS1
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-from-other-namespaces
  namespace: ns1
spec:
  podSelector:
    matchLabels: {}
  ingress:
    - from:
        - podSelector: {}

TIP: use kubectl get namespaces to list namespaces and kubectl get networkpolicy -n ns1 to get the policy

Configure anti-affinity for PODs

To control which nodes a group of PODs (the labelSelector section) runs on in the cluster, we need to configure affinity/anti-affinity rules. This anti-affinity “rule” ensures that PODs with the app=aspcore label are not co-located on a single node.

.....
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - aspcore
              topologyKey: "kubernetes.io/hostname"
......

TIP: use kubectl get pods -o wide -n ns1 to get info about the assigned nodes, and then kubectl get nodes --show-labels to check the node labels

Configure network policy to deny egress traffic from PODs (except DNS requests)

This task shows how you can filter network traffic from PODs in the namespace. All PODs with the app=aspcore label in the first namespace can make only DNS requests (out), other ones will be denied.

#Deny all traffic (except DNS) from PODs
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-external-egress
  namespace: ns1
spec:
  podSelector:
    matchLabels:
      app: aspcore
  policyTypes:
    - Egress
  egress:
    # allow DNS TCP/UDP 53 ports
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    - to:
        - namespaceSelector: {}

TIP: get the list of network policies with kubectl get networkpolicy -n ns1

Create a service account with read permission on PODs in the first namespace

A service account provides an identity for processes that run in a Pod. This excerpt of the manifest file describes a service account read-sa-ns that has read-only permissions on PODs in the NS1 namespace (the rules section/verbs). Also, note that an RBAC role is used, which we enabled while applying the Terraform configuration.

#New SA - ns level
apiVersion: v1
kind: ServiceAccount
metadata:
  name: read-sa-ns
---
#New Role - ns level
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-only-ns
  namespace: ns1
rules:
  - apiGroups: ["", "extensions", "apps"]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
#Binding the role to the sa -NS1
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-ns1-pods
  namespace: ns1
subjects:
  - kind: ServiceAccount
    name: read-sa-ns
    apiGroup: ""
    namespace: default
roleRef:
  kind: Role
  name: read-only-ns
  apiGroup: rbac.authorization.k8s.io

TIP: get the roles in the NS1 namespace with kubectl get role -n ns1, and then check the service accounts in the K8S cluster with kubectl get serviceaccounts --all-namespaces

Set CPU and RAM limits for each pod

If a container is created in the ns1 namespace and does not specify its own values for the memory request and memory limit, it is given a default memory request of 128 MiB and a default memory limit of 256 MiB (see the LimitRange below). In addition, you can define limits at the POD level.

#Define mem-cpu limits
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-cpu-limit-range
  namespace: ns1
spec:
  limits:
  - default:
      memory: 256Mi
      cpu: "400m"
    defaultRequest:
      memory: 128Mi
      cpu: "200m"
    type: Container

TIP: check the limits by running kubectl describe pod podname -n ns1

Configure PODs scalability based on CPU metric

Kubernetes allows you to automatically scale PODs based on CPU/RAM metrics (horizontal pod autoscaler). If the average CPU utilization is equal to or greater than 70%, K8S deploys additional replicas (the spec stanza, maxReplicas).

#Scale pods automatically (cpu metric)
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: aspcore-load-cpu
  namespace: ns1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: asp-deployment1
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 70

TIP: to list the NS limits, run kubectl describe namespace ns1

Publish the application

Now it’s time to publish the application running in a container. First, create a deployment that will use the Docker image we’ve already pushed to the Azure Container Registry. One POD with the latest image will be created under the ns1 namespace. Check the labels (one of the most important things in K8S 🙂), the POD name and the number of replicas.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: asp-deployment1
  namespace: ns1
  labels:
    app: web
    release: stable
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aspcore
  template:
    metadata:
      labels:
        app: aspcore
    spec:
      containers:
        - name: web-aspcore
          image: crl3y5.azurecr.io/aspnetapp:latest

TIP: use kubectl get pods -n ns1 -o wide to check the pod state in the ns1

If the Pod’s status is running, you can publish it via the LoadBalancer service:

#Publish the deployment through the Service

apiVersion: v1
kind: Service
metadata:
  name: demo-service1
  namespace: ns1
spec:
  selector:
    app: aspcore
  type: LoadBalancer
  ports:
    - name: name-of-the-port
      port: 80
      targetPort: 80

Then check the deployment status and get the public IP of the service:

#Get deployments in the NS1
kubectl get deployments -n ns1

#Get Service's Public IP
kubectl get service -n ns1 -o jsonpath='{.items[].status.loadBalancer.ingress[0].ip}'

Open up the browser and navigate to http://publicip/api/values to verify that the application is published and works:

What’s next?

Complete the following homework tasks to boost your skills:

  • Make a test POD with a static volume (use Azure storage)
  • Make sure that PODs are running under non-root account
  • Create a service account with read permission on all PODs in the cluster
  • Add context details about service accounts to your configuration file (kubeconfig), and then verify service accounts permissions
  • Configure PODs scalability based on RAM or network metrics
  • Check yourself – the answers are at my repo
  • Create CI/CD Azure DevOps pipelines to automate docker image build and environment deployment (use terraform/kubectl/docker)

Thanks for reading, stars and commits!

Technical Review: Learn PowerShell in a Month of Lunches, Fourth Edition

Let me continue with a series of blog posts discussing technical reviews conducted between 2022 and the present year. In a previous post, I highlighted “Learn PowerShell in a Month of Lunches” as the ideal starting point for anyone delving into PowerShell.

If you’ve never read any book from the “Learn <Something> in a Month of Lunches” series, these books are unique because each chapter contains clear explanations, examples, and practical lab tasks at the end to reinforce the chapter’s topic. Such books are designed to streamline your learning process. As evident from the table of contents, there are approximately 30 chapters or fewer. By dedicating just one day to each chapter, you can master the technology in just one month.

Having reviewed this book myself, I found it immensely valuable. The authors, PowerShell team members Travis Plunk and Tyler Leonhardt, alongside Microsoft MVP James Petty, have updated the third edition to encompass the latest version of PowerShell, which includes its expansion into multiple platforms such as Linux and macOS. Consequently, the book is no longer confined to Windows.

Let’s examine the table of contents to see what you’ll be able to accomplish with PowerShell after reading the book.

:: Table of Contents ::

1 Before you begin
2 Meet PowerShell
3 Using the help system
4 Running commands
5 Working with providers
6 The pipeline: Connecting commands
7 Adding commands
8 Objects: Data by another name
9 A practical interlude
10 The pipeline, deeper
11 Formatting: And why it’s done on the right
12 Filtering and comparisons
13 Remote control: One-to-one and one-to-many
14 Multitasking with background jobs
15 Working with many objects, one at a time
16 Variables: A place to store your stuff
17 Input and output
18 Sessions: Remote control with less work
19 You call this scripting?
20 Improving your parameterized script
21 Using regular expressions to parse text files
22 Using someone else’s script
23 Adding logic and loops
24 Handling errors
25 Debugging techniques
26 Tips, tricks, and techniques
27 Never the end
App. PowerShell cheat sheet

Is this book for you? If you’re an Administrator, DevOps Engineer, or even a developer, you’ll undoubtedly find the book extremely useful and easy to follow. It covers topics such as handling errors, loops, filtering and comparison, input and output, and much more. After reading at least two editions myself, I highly recommend this book as one of my favorites, given its well-structured content (chapter-labs-summary). Rating 10/10.

Technical Review: Amazon Web Services in Action, Third Edition

Amazon Web Services (AWS) is a comprehensive and widely adopted cloud computing platform provided by Amazon.com. Launched in 2006, AWS offers a vast array of cloud services to help individuals, businesses, and organizations build and deploy applications, store data, and manage their IT infrastructure more efficiently. Moreover, AWS is the longest-running Magic Quadrant Leader, with Gartner naming AWS a Leader for the thirteenth consecutive year.

The full Gartner Report

It’s rare to encounter a DevOps or Cloud engineer who hasn’t engaged with AWS. Even if one specializes in Azure or GCP, familiarity with AWS is almost essential, at least in theory. Personally, having spent over two years working with AWS, I’ve found the platform to be both seamless and captivating.

It’s no wonder that AWS remains a frontrunner in the market. Therefore, it’s crucial to have access to quality learning resources for efficiently mastering AWS. For me, AWS in Action, 3rd edition stands out as the ideal choice.

I have read AWS in Action, 2nd edition, and had the privilege to review the 3rd edition in September 2022. The authors, brothers Andreas Wittig and Michael Wittig, are cloud consultants focusing on Amazon Web Services with years of experience; they covered all the core services in clear, plain language, including AWS Lambda and CloudFormation.

The book is full of real-world examples and is written for mid-level developers and SysOps/DevOps engineers. A short list of the services covered in the book: EC2, CloudFormation, CloudWatch, integration with Jenkins, S3, Glacier, EBS, EFS, RDS, DynamoDB, ECS, Fargate, ElastiCache, Elastic Load Balancing and Simple Queue Service.

Table of contents:

Part 1: Getting started

  • Chapter 1 What is Amazon Web Services?
  • Chapter 2 A Simple Example: WordPress in Fifteen Minutes

Part 2: Building virtual infrastructure of computers and networking

  • Chapter 3 Using Virtual Machines: EC2
  • Chapter 4 Programming Your Infrastructure: The command line, SDKs and CloudFormation
  • Chapter 5 Securing Your System: IAM, security groups and VPC
  • Chapter 6 Automating Operational Tasks with Lambda

Part 3: Storing data in the cloud

  • Chapter 7 Storing Your Objects: S3 and Glacier
  • Chapter 8 Storing on Hard Drives: EBS and instance store
  • Chapter 9 Sharing Data Volumes Between Machines: EFS
  • Chapter 10 Using a Relational Database Service: RDS
  • Chapter 11 Caching Data in Memory: ElastiCache
  • Chapter 12 Programming for the NoSQL Database Service: DynamoDB

Part 4: Architecting on AWS

  • Chapter 13 Achieving High Availability: Availability zones, auto-scaling and CloudWatch
  • Chapter 14 Decoupling Your Infrastructure: Elastic Load Balancing and Simple Queue Service
  • Chapter 15 Automating Deployment
  • Chapter 16 Designing for Fault Tolerance
  • Chapter 17 Scaling Up and Down: Auto-scaling and CloudWatch
  • Chapter 18 Building Modern Architectures for the Cloud: ECS and Fargate

Pay attention to Part 4: Architecting on AWS. This part will help you prevent outages of your applications running in AWS by implementing the right architecture and set of tools. As a result, you will be able to design greenfield environments or plan migrations to AWS.

For DevOps and software engineers, services such as AWS Lambda, CodeDeploy, CloudFormation, HashiCorp’s Packer and the integration of some services with Jenkins are also covered. So you will be ready to automate your deployments after reading the book.

Overall, it’s a highly recommended book to purchase. It’s my favorite book on AWS and, based on sales, a bestseller in the AWS category. Thanks to Andreas Wittig and Michael Wittig for the fundamental work and for keeping the book up to date.