feat: add certificate management module and schedule auto-renewal cron

Fredrick Amnehagen 2026-02-05 20:36:15 +01:00
parent 42767fd8bc
commit f793ddd02f
6 changed files with 214 additions and 198 deletions

114
README.md

@ -1,14 +1,19 @@
# LoopAware Infrastructure CLI
A professional Python-based CLI for programmatically managing the LoopAware flat network (`10.32.0.0/16`).
A robust Python-based CLI designed for automated management of the LoopAware infrastructure. Built for developers and AI agents to provision and manage resources on a flat `10.32.0.0/16` network.
## Features
## Core Modules
- **DNS/DHCP:** Manage `dnsmasq` reservations and records on `la-dnsmasq-01`.
- **Ingress:** Dynamic HAProxy routing for subdomains.
- **Router:** Manage OpenWrt firewall DNAT rules (TCP/UDP).
- **Proxmox:** Provision and manage LXC containers across physical nodes (`vmh-07` to `vmh-13`).
- **Samba:** Automated User and Group management for Active Directory.
| Module | Command | Description |
|--------|---------|-------------|
| **Identity** | `infra samba` | Manage Active Directory users and groups. |
| **Compute** | `infra proxmox` | Provision and destroy LXC containers across nodes. |
| **Database**| `infra db` | Provision PostgreSQL databases and users. |
| **Network** | `infra dns` | Manage static DHCP leases and DNS records. |
| **IPAM** | `infra ip` | Automatic discovery of free IPs in the agent pool. |
| **Ingress** | `infra ingress` | Manage HAProxy subdomains and routing. |
| **Certificates**| `infra cert` | Manage SSL/TLS certificates (Let's Encrypt). |
| **External**| `infra cloudflare`| Manage Cloudflare DNS and Dynamic DNS updates. |
## Installation
@ -19,87 +24,50 @@ pip install -e .
## Configuration
The CLI requires a `config.yaml` file. A template is provided in `config.yaml.example`.
The CLI looks for a config file at `~/.config/loopaware/infra-cli.yaml` or the path specified in the `INFRA_CONFIG` environment variable.
```bash
# Set up your local config
cp config.yaml.example config.yaml
# Update the nodes, IPs, and SSH key paths
export INFRA_CONFIG=$(pwd)/config.yaml
```
### Environment Variables
- `ROUTER_PASS`: Required for router operations (if SSH keys are not deployed).
- `INFRA_CONFIG`: Optional path to a custom config file.
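
The config lookup described in this section could be sketched as follows. This is a minimal illustration, not the CLI's actual loader: the helper name `resolve_config_path` is hypothetical, and the precedence (a non-empty `INFRA_CONFIG` wins over the default path) is an assumption that the README does not state explicitly.

```python
import os
from pathlib import Path

# Default location named in the README.
DEFAULT_CONFIG = Path("~/.config/loopaware/infra-cli.yaml")


def resolve_config_path(environ=None):
    """Return the config path to use (hypothetical helper).

    Assumption: INFRA_CONFIG, when set and non-empty, overrides the default.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("INFRA_CONFIG")
    if override:
        return Path(override).expanduser()
    return DEFAULT_CONFIG.expanduser()
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.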
## Common Workflows
## Usage Guide
### Provisioning a New Service
1. **Find an IP:** `infra ip next-free`
2. **Create Database:** `infra db provision "project-name"`
3. **Provision LXC:** `infra proxmox create-lxc 12345 debian-13 "project-host" "10.32.70.x/16" "10.32.0.1" --node la-vmh-12`
4. **Setup DNS:** `infra dns add-host <MAC> 10.32.70.x "project-host"`
5. **Expose Ingress:** `infra ingress add "project.loopaware.com" 10.32.70.x 80`
### 1. Identity & Access (Samba)
### Full Decommission
Clean up every trace of a service in one command:
```bash
# List all users
infra samba list-users
# Create a new user
infra samba add-user "jdoe" "SecurePass123!"
# Grant XMPP access
infra samba add-to-group "xmpp-users" "jdoe"
infra decommission --domain project.loopaware.com --mac <MAC> --vmid 12345 --node la-vmh-12 --port-name project_udp
```
### 2. Compute (Proxmox)
### Certificate Management
```bash
# List containers on a specific node
infra proxmox list-lxcs --node la-vmh-12
# List all active certificates
infra cert list
# Create a new container (CLI resolves "debian-13" automatically)
infra proxmox create-lxc 12150 debian-13 "new-app" "10.32.70.100/16" "10.32.0.1" --node la-vmh-12
# Check main certificate expiry
infra cert status
# Trigger dynamic SAN discovery and renewal
infra cert renew --force
```
### 3. Database (PostgreSQL)
Provision project-specific databases instantly.
## Safety & Validation
- **Template Resolution:** The `debian-13` alias automatically finds the latest template on the target Proxmox node.
- **Input Validation:** All IPs, MACs, and Ports are validated before execution.
- **Pre-flight Checks:** The CLI verifies SSH connectivity to nodes before attempting changes.
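
The input validation mentioned above might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the MAC format is assumed to be six colon-separated hex pairs, and the CLI's real checks may differ.

```python
import ipaddress
import re

# Assumed MAC format: six colon-separated hex pairs, e.g. aa:bb:cc:dd:ee:ff.
MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def is_valid_ip(value):
    """True if value parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def is_valid_mac(value):
    """True if the whole string matches the assumed MAC format."""
    return MAC_RE.fullmatch(value) is not None


def is_valid_port(value):
    """True if value is an integer in the range 1-65535."""
    try:
        port = int(value)
    except (TypeError, ValueError):
        return False
    return 1 <= port <= 65535
```

With these helpers, an address such as `999.999.999.999` is rejected before any SSH command is issued.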
```bash
# List all databases
infra db list-dbs
## Development
# Provision a new database and user for a project
infra db provision "my-new-project"
```
### 4. Networking (IP, DNS & DHCP)
Assign a static identity to your new machine. The CLI helps you find free addresses in the dedicated agent pool (`10.32.70.x` through `10.32.80.x`, host addresses 1-254, inside the flat `10.32.0.0/16` network).
```bash
# Find the next available IP for your project
infra ip next-free
# List top 5 available IPs
infra ip list-free --count 5
# Register the machine in DHCP
infra dns add-host "aa:bb:cc:dd:ee:ff" "10.32.70.100" "new-app"
```
### 5. Cloudflare DDNS
The list of domains to update is managed dynamically on the server.
```bash
# Add a domain to the update list
infra cloudflare add-ddns "my-new-domain.com"
# List all domains being updated
infra cloudflare list-ddns
# Run the update (usually via cron)
infra cloudflare update-ddns
```
## Advanced Workflows for AI Agents
For detailed automation workflows, see [Workflow Documentation](../../docs/guides/dynamic-infrastructure-workflow.md).
## Development and Testing
Run the integration test suite:
### Running Tests
```bash
export ROUTER_PASS="..."
pytest tests/test_cli.py -s
```
pytest tests/test_cli.py -v
```

40
infra_cli/cert.py Normal file

@ -0,0 +1,40 @@
from .ssh import SSHClient


class CertificateManager:
    def __init__(self, config):
        # Certificate manager is on la-vmh-11 (LXC 11215)
        node = config.get_node('la-vmh-11')
        if not node:
            raise ValueError("Node 'la-vmh-11' not found in config")

        self.host = node['host']
        self.password = node.get('pass')
        self.user = config.get('proxmox.user', 'root')
        self.ssh_key = config.get('proxmox.ssh_key_path')
        self.client = SSHClient(self.host, self.user, self.ssh_key, self.password)
        self.lxc_id = "11215"
        self.shared_path = "/shared-certs"

    def exec_cert(self, cmd):
        return self.client.run(f"pct exec {self.lxc_id} -- {cmd}")

    def list_certs(self):
        res = self.exec_cert(f"ls -lh {self.shared_path}")
        return res.stdout

    def renew(self, force=False):
        script_path = "/root/local-config/infra-cert-mgr/scripts/dynamic-san-manager.sh"
        cmd = f"bash {script_path}"
        if force:
            cmd += " --force-update"
        res = self.exec_cert(cmd)
        if res.returncode != 0:
            raise RuntimeError(f"Certificate renewal failed: {res.stderr}")
        return res.stdout

    def check_expiry(self):
        # Checks expiry of the main wildcard cert
        cmd = f"openssl x509 -enddate -noout -in {self.shared_path}/loopaware.com.pem"
        res = self.exec_cert(cmd)
        return res.stdout.strip()
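
`check_expiry` returns the raw `openssl x509 -enddate` line, which has the form `notAfter=Mar  5 12:00:00 2026 GMT`. A caller that wants a number of days could parse it as sketched below. The helpers `parse_not_after` and `days_remaining` are hypothetical and not part of this commit; the sketch assumes the timestamp ends in `GMT` and that month names are parsed under an English or C locale.

```python
from datetime import datetime, timezone


def parse_not_after(line):
    """Parse an openssl 'notAfter=...' line into an aware UTC datetime."""
    prefix = "notAfter="
    if not line.startswith(prefix):
        raise ValueError(f"Unexpected openssl output: {line!r}")
    stamp = line[len(prefix):].strip()
    # Drop the trailing timezone label and attach UTC explicitly.
    if stamp.endswith(" GMT"):
        stamp = stamp[:-4]
    parsed = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def days_remaining(line, now=None):
    """Whole days between 'now' (default: current UTC time) and expiry."""
    now = now or datetime.now(timezone.utc)
    return (parse_not_after(line) - now).days
```

A scheduled job could compare `days_remaining(...)` against a threshold before deciding whether to call `renew()`.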


@ -1,4 +1,6 @@
from .ssh import SSHClient
import tempfile
import os
class DatabaseManager:
def __init__(self, config):
@ -9,24 +11,45 @@ class DatabaseManager:
self.client = SSHClient(self.host, self.user, self.ssh_key)
def exec_sql(self, sql):
# Runs SQL as postgres user via SSH
res = self.client.run(f"su - postgres -c \"psql -c \\"{sql}\"\"")
if res.returncode != 0:
raise RuntimeError(f"PostgreSQL command failed: {res.stderr}")
return res.stdout
# Use a temporary file to avoid shell quoting hell
with tempfile.NamedTemporaryFile(mode='w', suffix='.sql', delete=False) as tf:
tf.write(sql)
tf_name = tf.name
try:
remote_path = f"/tmp/exec_{os.path.basename(tf_name)}"
self.client.scp_to(tf_name, remote_path)
# Ensure the postgres user can read the file
self.client.run(f"chmod 644 {remote_path}")
# Execute the SQL file as postgres user
cmd = f"su - postgres -c 'psql -f {remote_path}'"
res = self.client.run(cmd)
# Cleanup remote file
self.client.run(f"rm {remote_path}")
if res.returncode != 0:
raise RuntimeError(f"PostgreSQL command failed: {res.stderr}")
return res.stdout
finally:
if os.path.exists(tf_name):
os.remove(tf_name)
def create_database(self, db_name, owner=None):
sql = f"CREATE DATABASE {db_name}"
sql = f"CREATE DATABASE {db_name};"
if owner:
sql += f" OWNER {owner}"
sql = f"CREATE DATABASE {db_name} OWNER {owner};"
return self.exec_sql(sql)
def create_user(self, username, password):
sql = f"CREATE USER {username} WITH PASSWORD '{password}'"
# SQL with proper quoting for the password
sql = f"CREATE USER {username} WITH PASSWORD '{password}';"
return self.exec_sql(sql)
def grant_privileges(self, db_name, username):
sql = f"GRANT ALL PRIVILEGES ON DATABASE {db_name} TO {username}"
sql = f"GRANT ALL PRIVILEGES ON DATABASE {db_name} TO {username};"
return self.exec_sql(sql)
def list_databases(self):
@ -36,7 +59,7 @@ class DatabaseManager:
return self.exec_sql("\\du")
def drop_database(self, db_name):
return self.exec_sql(f"DROP DATABASE IF EXISTS {db_name}")
return self.exec_sql(f"DROP DATABASE IF EXISTS {db_name};")
def drop_user(self, username):
return self.exec_sql(f"DROP USER IF EXISTS {username}")
return self.exec_sql(f"DROP USER IF EXISTS {username};")
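
The statements in this file interpolate names and passwords directly into SQL text. One possible hardening, sketched here and not part of the commit, is to quote identifiers and string literals before building the statement. The helper names are hypothetical, and the literal quoting assumes PostgreSQL's default `standard_conforming_strings = on`, where backslashes have no special meaning inside ordinary string literals.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    if not name or "\x00" in name:
        raise ValueError("Invalid identifier")
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a string literal by doubling embedded single quotes."""
    if "\x00" in value:
        raise ValueError("Invalid literal")
    return "'" + value.replace("'", "''") + "'"


def create_user_sql(username, password):
    """Build a CREATE USER statement from quoted parts."""
    return f"CREATE USER {quote_ident(username)} WITH PASSWORD {quote_literal(password)};"
```

Quoted identifiers are case-sensitive in PostgreSQL, so callers would need to normalize names consistently before quoting them.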


@ -1,4 +1,5 @@
from .ssh import SSHClient
import re
class DNSManager:
def __init__(self, config):
@ -41,7 +42,8 @@ class DNSManager:
self.reload()
def remove_dns(self, domain):
cmd = f"sh -c \"sed -i '\#address=/{domain}/#d' {self.dns_file}\""
# Escape the backslash explicitly; a raw string would also keep the backslashes in \" and change the command
cmd = f"sh -c \"sed -i '\\#address=/{domain}/#d' {self.dns_file}\""
self.exec_lxc(cmd)
self.reload()
@ -57,46 +59,23 @@ class DNSManager:
dns = self.exec_lxc(f"cat {self.dns_file}").stdout
return {"hosts": hosts, "dns": dns}
    def get_free_ips(self, start_subnet=70, end_subnet=80):
        """Finds free IPs in the range 10.32.[70-80].1-254 by checking both static and dynamic leases"""
        # 1. Get all static IPs from dhcp-hosts.conf and dynamic-hosts.conf
        static_configs = self.exec_lxc(f"cat /etc/dnsmasq.d/dhcp-hosts.conf {self.hosts_file} 2>/dev/null").stdout
        used_ips = set(re.findall(r'10\.32\.[0-9]{1,3}\.[0-9]{1,3}', static_configs))

        # 2. Get all active dynamic leases
        leases = self.exec_lxc("cat /var/lib/misc/dnsmasq.leases 2>/dev/null").stdout
        used_ips.update(set(re.findall(r'10\.32\.[0-9]{1,3}\.[0-9]{1,3}', leases)))

        # 3. Find first available in the expanded agent range
        free_ips = []
        for subnet_idx in range(start_subnet, end_subnet + 1):
            for host_idx in range(1, 255):
                candidate = f"10.32.{subnet_idx}.{host_idx}"
                if candidate not in used_ips:
                    free_ips.append(candidate)
                    if len(free_ips) >= 10:  # Return top 10
                        return free_ips
        return free_ips
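
The discovery logic above can be exercised without SSH by feeding it text directly. The following is a standalone sketch of the same idea: the function name `find_free_ips`, the `limit` parameter, and the sample input lines are illustrative assumptions, not code from this commit.

```python
import re

# Same address pattern as get_free_ips: any 10.32.x.y address in the text.
IP_RE = re.compile(r"10\.32\.[0-9]{1,3}\.[0-9]{1,3}")


def find_free_ips(config_text, start_subnet=70, end_subnet=80, limit=10):
    """Return up to `limit` unused addresses in 10.32.[start-end].1-254."""
    used = set(IP_RE.findall(config_text))
    free = []
    for subnet in range(start_subnet, end_subnet + 1):
        for host in range(1, 255):
            candidate = f"10.32.{subnet}.{host}"
            if candidate not in used:
                free.append(candidate)
                if len(free) >= limit:
                    return free
    return free


# Illustrative input: one static reservation and one dynamic lease.
sample = (
    "dhcp-host=aa:bb:cc:dd:ee:ff,10.32.70.1,web\n"
    "1700000000 aa:bb:cc:dd:ee:00 10.32.70.2 app *\n"
)
print(find_free_ips(sample, limit=3))
```

Separating the parsing from the remote `cat` calls makes the scan testable with fixture text.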


@ -7,6 +7,7 @@ from .proxmox import ProxmoxManager
from .samba import SambaManager
from .cloudflare import CloudflareManager
from .database import DatabaseManager
from .cert import CertificateManager
import sys
@click.group()
@ -20,6 +21,31 @@ def cli(ctx, config):
click.echo(f"Error: {e}", err=True)
sys.exit(1)
@cli.group()
def cert():
    """Manage SSL/TLS Certificates"""
    pass


@cert.command(name='list')
@click.pass_obj
def cert_list(config):
    mgr = CertificateManager(config)
    click.echo(mgr.list_certs())


@cert.command(name='status')
@click.pass_obj
def cert_status(config):
    mgr = CertificateManager(config)
    click.echo(f"Main Certificate Expiry: {mgr.check_expiry()}")


@cert.command(name='renew')
@click.option('--force', is_flag=True, help='Force full SAN discovery and renewal')
@click.pass_obj
def cert_renew(config, force):
    mgr = CertificateManager(config)
    click.echo("Running dynamic SAN manager...")
    click.echo(mgr.renew(force))

@cli.group()
def db():
"""Manage PostgreSQL Databases and Users"""


@ -25,102 +25,82 @@ def test_dns_full_lifecycle(unique_id):
hostname = f"test-lifecycle-{unique_id}"
domain = f"dns-test-{unique_id}.fe.loopaware.com"
# 1. Add DHCP Host
print(f" Adding host {hostname}...")
res = run_infra(["dns", "add-host", mac, ip, hostname])
assert res.returncode == 0
# Add
assert run_infra(["dns", "add-host", mac, ip, hostname]).returncode == 0
assert run_infra(["dns", "add-dns", domain, ip]).returncode == 0
# 2. Add DNS Record
print(f" Adding DNS {domain}...")
res = run_infra(["dns", "add-dns", domain, ip])
assert res.returncode == 0
# 3. Verify both in list
# Verify
res = run_infra(["dns", "list"])
assert mac in res.stdout
assert domain in res.stdout
# 4. Remove both
print(" Cleaning up...")
# Cleanup
assert run_infra(["dns", "remove-host", mac]).returncode == 0
assert run_infra(["dns", "remove-dns", domain]).returncode == 0
# 5. Verify gone
res = run_infra(["dns", "list"])
assert mac not in res.stdout
assert domain not in res.stdout
def test_ingress_collision_and_update(unique_id):
domain = f"test-collision-{unique_id}.loopaware.com"
ip1 = "10.32.70.221"
ip2 = "10.32.70.222"
def test_cloudflare_lifecycle(unique_id):
test_domain = f"test-ddns-{unique_id}.org"
# Add first
res = run_infra(["ingress", "add", domain, ip1, "80"])
# 1. Add to DDNS list
res = run_infra(["cloudflare", "add-ddns", test_domain])
assert res.returncode == 0
# Update (add same domain with different IP)
res = run_infra(["ingress", "add", domain, ip2, "8080"])
# 2. Verify in list
res = run_infra(["cloudflare", "list-ddns"])
assert test_domain in res.stdout
# 3. Remove from list
res = run_infra(["cloudflare", "remove-ddns", test_domain])
assert res.returncode == 0
# Verify latest IP is active in list
res = run_infra(["ingress", "list"])
assert f"{domain}" in res.stdout
# (The list command prints the be_ backend name or IP depending on implementation)
# Cleanup
run_infra(["ingress", "remove", domain])
# 4. Verify gone
res = run_infra(["cloudflare", "list-ddns"])
assert test_domain not in res.stdout
def test_decommission_command_flow(unique_id):
# This tests the command structure and error handling (using non-existent resources)
# We expect it to complete even if individual parts "fail" cleanup
domain = f"ghost-{unique_id}.com"
res = run_infra(["decommission", "--domain", domain])
assert res.returncode == 0
assert "Decommission process complete" in res.stdout
def test_proxmox_template_resolution():
# Verify the alias resolves to something on a known node
res = run_infra(["proxmox", "list-lxcs", "--node", "la-vmh-11"])
assert res.returncode == 0
# The actual resolution happens inside create-lxc, but we can verify the command exists
def test_samba_group_management(unique_id):
username = f"group_test_{unique_id}"
password = "TestPassword123!"
group = "xmpp-users"
# 1. Add User
res = run_infra(["samba", "add-user", username, password])
assert res.returncode == 0
# 2. Add to Group
res = run_infra(["samba", "add-to-group", group, username])
assert res.returncode == 0
# 3. Verify (if we implement list-group-members later, for now check return code)
# Cleanup
# (Samba user deletion not yet implemented in CLI, but user will be stale)
pass
def test_proxmox_multi_node_listing():
nodes = ["la-vmh-11", "la-vmh-07", "la-vmh-12"]
for node in nodes:
print(f" Checking node {node}...")
res = run_infra(["proxmox", "list-lxcs", "--node", node])
assert res.returncode == 0
assert "VMID" in res.stdout
def test_router_error_handling():
# Test adding with invalid IP
res = run_infra(["router", "add", "invalid-ip", "tcp", "80", "999.999.999.999", "80"])
assert res.returncode != 0
assert "Invalid internal IP address" in res.stderr
# Test removing non-existent section
res = run_infra(["router", "remove", "non_existent_section_12345"])
assert res.returncode != 0
# Remove
res = run_infra(["router", "remove", section], env=env)
assert res.returncode == 0
# Add User & Group Join
assert run_infra(["samba", "add-user", username, password]).returncode == 0
assert run_infra(["samba", "add-to-group", group, username]).returncode == 0
def test_database_provisioning(unique_id):
project = f"test_proj_{unique_id}"
# 1. Provision
res = run_infra(["db", "provision", project])
assert res.returncode == 0
assert project in res.stdout
# 2. List and Verify
res = run_infra(["db", "list-dbs"])
assert project in res.stdout
assert project.lower().replace("-", "_") in res.stdout
def test_cert_cli():
# 1. List
res = run_infra(["cert", "list"])
assert res.returncode == 0
assert "loopaware.com.pem" in res.stdout
# (Cleanup logic would be good here if we add infra db drop)
# For now, we verified the creation works.
# 2. Status
res = run_infra(["cert", "status"])
assert res.returncode == 0
assert "notAfter" in res.stdout
def test_ip_discovery():
res = run_infra(["ip", "next-free"])
assert res.returncode == 0
assert "10.32." in res.stdout