Create diagram as code in Python

In the previous post, we explored my custom ClickHouse backup agent, built on the clickhouse-backup tool, logrotate, cron, and Bash scripts. I have also shared all the resources needed to test the agent on your local machine with Docker and Docker Compose, or to deploy it in a production environment. Let’s update the agent’s repo with some Python code.

You may be familiar with the main GitOps principle: use Git as the single source of truth; store your application and infrastructure configuration in a Git repository along with the application code. Kubernetes manifests (YAML), Terraform (tf), Docker and Compose files, Jenkinsfiles, and even diagrams are good examples of files kept in such repositories. But how do you represent diagrams? As PNG, VSD, or JPEG files? Let’s pretend we’re developers and draw diagrams using code.

The diagrams project brings this approach to life. I opted for Diagrams (mingrammer) because it’s free and built on Python and Graphviz, a widely used language and tool that let you create all kinds of diagrams, whether a flowchart or a cloud architecture. Another advantage is that the project is actively maintained and continuously developed. You can also check out other tools such as pyflowchart, mermaid, plantuml, or terrastruct.

Let’s get started and draw a flowchart for the clickhouse backup agent using Diagrams (mingrammer). First, install Python (>=3.7; mine is 3.11) and Graphviz (9.0.0 on Windows in my environment), then install the diagrams module (0.23.4).

Diagrams includes the following objects: node (a shape; programming, azure, custom, and others), edge (a connection line between nodes), cluster (a group of isolated nodes), and diagram (your entire chart). Each object has its own attributes; all of them are described in the Graphviz docs. Also, check out the basic examples to understand what we’re going to “build”. I won’t describe every attribute. DYOR.

The first lines of your code might look like this:

# import required modules
from diagrams import Diagram, Edge, Cluster, Node

Then we define attributes for each object (excerpt):

# define attributes for graphviz components
graph_attributes = {
    "fontsize": "9",
    "orientation": "portrait",
    "splines": "spline"
}
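The later excerpts also unpack node_attributes and edge_attributes, which are not shown in full here. A hypothetical version of those dictionaries might look like this; the values are my guesses for illustration, not the repo’s actual settings:

```python
# hypothetical node attributes: small font, shape supplied by the custom image
node_attributes = {
    "fontsize": "9",
    "shape": "none",       # the visible shape comes from the image, not Graphviz
    "fixedsize": "true",   # respect the explicit height/width of each node
}

# hypothetical edge attributes: thin lines with small labels
edge_attributes = {
    "fontsize": "8",
    "penwidth": "1.0",
}
```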

Next, we need to describe the Diagram object and its attributes (excerpt):

with Diagram(show=False, outformat="png", graph_attr=graph_attributes, direction="TB"):
    # nodes and icons
    start_end_icon = "diagram/custom-images/start-end.png"
    start = Node(label="Start", image=start_end_icon, labelloc="c", height="0.4", width="0.45", **node_attributes)

I use the general Node class with custom images, which were taken from the programming nodes and adapted to my flowchart (I deleted the canvas and resized the images). You could safely use the diagrams.programming.flowchart node classes instead, but be ready to play with the nodes’ height/width attributes. Another way to add your own images as nodes is the Custom node class.

We have described the icons and shared nodes. Now we need to add the first group of nodes representing the main process of the agent and flowchart (creating and uploading FULL backups):

# cluster/full backup
    with Cluster("main", graph_attr=graph_attributes):
        diff_or_full = Node(label="TYPE?", image=decision_icon, height="0.7", labelloc="c", **node_attributes)

Subroutine processes (diff backups, logging, etc.) are defined as clusters (excerpt):

# cluster/diff backup
    with Cluster("diff", graph_attr=graph_attributes):
        create_diff_backup = Node(label="Create DIFF", labelloc="c", height="0.5", width="4", image=action_icon, **node_attributes)

Edges, or connections between nodes, are defined at the bottom (excerpt):

# Log connections
    diff_or_full - Edge(label="\n\n wrong type", tailport="e", headport="n", **edge_attributes) - write_error

As a result, I’ve updated the repo with the diagram as code and slightly modified the GitHub Actions workflow by adding a new step to “draw” the diagram and check the Python code. When I push new commits to the repo, the diagram is created and published as an artifact, with nodes (start, end, condition, action, catch, input/output), four clusters (main, diff, log, upload log), and edges between the nodes.

Looks pretty good, doesn’t it?

How to move ClickHouse data to a new partition

Before proceeding with any steps, please make sure to create a complete backup of your ClickHouse data. In this post, I assume you have an additional disk without any partitions on it.

Start by creating a new partition (LVM is used below). If you have a cluster, repeat the steps on each node.

# Create partition
lsblk # get dev name
fdisk /dev/sdb # use 8e type, other settings are default
lsblk # check
pvcreate /dev/sdb1 # create a volume
pvdisplay # check volumes
vgcreate clickhouse /dev/sdb1 # create a volume group
lvcreate --name data -l 100%FREE clickhouse # create a logical volume
mkfs.ext4 /dev/clickhouse/data # make ext4 fs

Add a new mount point to the /etc/fstab:

# edit fstab; following best practices, use the noatime option
# in /etc/fstab, reference the device either by its UUID or by its /dev/mapper path
# to get the UUID, run: blkid /dev/mapper/clickhouse-data

# Example
/dev/mapper/clickhouse-data  /var/lib/clickhouse ext4  defaults,noatime     0       0

If you have a cluster, identify the shard/replica and check the replication queue.

SELECT database, table, source_replica FROM system.replication_queue;

SELECT cluster, host_name, shard_num, shard_weight, replica_num FROM system.clusters ORDER BY shard_num;

On each replica in a shard, one by one:

# Stop ch server
sudo systemctl stop clickhouse-server

# prepare dirs
mv /var/lib/clickhouse /var/lib/clickhouse-tmp
mkdir /var/lib/clickhouse
chown clickhouse:clickhouse /var/lib/clickhouse

# activate the mount defined in the fstab 
mount /var/lib/clickhouse 

# copy data
cp -R /var/lib/clickhouse-tmp/* /var/lib/clickhouse/
chown -R clickhouse:clickhouse /var/lib/clickhouse

# get ch server back
sudo systemctl start clickhouse-server

Check the databases, tables, and ClickHouse server state (error logs are usually located at /var/log/clickhouse-server/clickhouse-server.err.log).

If everything works fine, delete the temporary directory (rm -rf /var/lib/clickhouse-tmp) and check the disk space with df -h.