feat: add certificate management module and schedule auto-renewal cron

Fredrick Amnehagen 2026-02-05 20:36:15 +01:00
parent 42767fd8bc
commit f793ddd02f
6 changed files with 214 additions and 198 deletions

112
README.md

@ -1,14 +1,19 @@
# LoopAware Infrastructure CLI # LoopAware Infrastructure CLI
A professional Python-based CLI for programmatically managing the LoopAware flat network (`10.32.0.0/16`). A robust Python-based CLI designed for automated management of the LoopAware infrastructure. Built for developers and AI agents to provision and manage resources on a flat `10.32.0.0/16` network.
## Features ## Core Modules
- **DNS/DHCP:** Manage `dnsmasq` reservations and records on `la-dnsmasq-01`. | Module | Command | Description |
- **Ingress:** Dynamic HAProxy routing for subdomains. |--------|---------|-------------|
- **Router:** Manage OpenWrt firewall DNAT rules (TCP/UDP). | **Identity** | `infra samba` | Manage Active Directory users and groups. |
- **Proxmox:** Provision and manage LXC containers across physical nodes (`vmh-07` to `vmh-13`). | **Compute** | `infra proxmox` | Provision and destroy LXC containers across nodes. |
- **Samba:** Automated User and Group management for Active Directory. | **Database**| `infra db` | Provision PostgreSQL databases and users. |
| **Network** | `infra dns` | Manage static DHCP leases and DNS records. |
| **IPAM** | `infra ip` | Automatic discovery of free IPs in the agent pool. |
| **Ingress** | `infra ingress` | Manage HAProxy subdomains and routing. |
| **Certificates**| `infra cert` | Manage SSL/TLS certificates (Let's Encrypt). |
| **External**| `infra cloudflare`| Manage Cloudflare DNS and Dynamic DNS updates. |
## Installation ## Installation
@ -19,87 +24,50 @@ pip install -e .
## Configuration ## Configuration
The CLI requires a `config.yaml` file. A template is provided in `config.yaml.example`. The CLI looks for a config file at `~/.config/loopaware/infra-cli.yaml` or the path specified in the `INFRA_CONFIG` environment variable.
```bash ```bash
# Set up your local config
cp config.yaml.example config.yaml cp config.yaml.example config.yaml
# Update the nodes, IPs, and SSH key paths export INFRA_CONFIG=$(pwd)/config.yaml
``` ```
### Environment Variables ## Common Workflows
- `ROUTER_PASS`: Required for router operations (if SSH keys are not deployed).
- `INFRA_CONFIG`: Optional path to a custom config file.
## Usage Guide ### Provisioning a New Service
1. **Find an IP:** `infra ip next-free`
2. **Create Database:** `infra db provision "project-name"`
3. **Provision LXC:** `infra proxmox create-lxc 12345 debian-13 "project-host" "10.32.70.x/16" "10.32.0.1" --node la-vmh-12`
4. **Setup DNS:** `infra dns add-host <MAC> 10.32.70.x "project-host"`
5. **Expose Ingress:** `infra ingress add "project.loopaware.com" 10.32.70.x 80`
### 1. Identity & Access (Samba) ### Full Decommission
Clean up every trace of a service in one command:
```bash ```bash
# List all users infra decommission --domain project.loopaware.com --mac <MAC> --vmid 12345 --node la-vmh-12 --port-name project_udp
infra samba list-users
# Create a new user
infra samba add-user "jdoe" "SecurePass123!"
# Grant XMPP access
infra samba add-to-group "xmpp-users" "jdoe"
``` ```
### 2. Compute (Proxmox) ### Certificate Management
```bash ```bash
# List containers on a specific node # List all active certificates
infra proxmox list-lxcs --node la-vmh-12 infra cert list
# Create a new container (CLI resolves "debian-13" automatically) # Check main certificate expiry
infra proxmox create-lxc 12150 debian-13 "new-app" "10.32.70.100/16" "10.32.0.1" --node la-vmh-12 infra cert status
# Trigger dynamic SAN discovery and renewal
infra cert renew --force
``` ```
### 3. Database (PostgreSQL) ## Safety & Validation
Provision project-specific databases instantly. - **Template Resolution:** The `debian-13` alias automatically finds the latest template on the target Proxmox node.
- **Input Validation:** All IPs, MACs, and Ports are validated before execution.
- **Pre-flight Checks:** The CLI verifies SSH connectivity to nodes before attempting changes.
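
The input validation mentioned in the list above could look roughly like the following. This is a minimal sketch, not the CLI's actual code: the helper names `validate_ip`, `validate_mac`, and `validate_port` are hypothetical, and it assumes a valid address must lie inside the flat `10.32.0.0/16` network.

```python
import ipaddress
import re

# Assumption: the flat network named in the README.
NETWORK = ipaddress.ip_network("10.32.0.0/16")
MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def validate_ip(value: str) -> str:
    """Return the address unchanged if it parses and lies inside NETWORK."""
    addr = ipaddress.ip_address(value)  # raises ValueError on malformed input
    if addr not in NETWORK:
        raise ValueError(f"{value} is outside {NETWORK}")
    return value


def validate_mac(value: str) -> str:
    """Return the MAC in lowercase if it has six colon-separated hex octets."""
    if not MAC_RE.fullmatch(value):
        raise ValueError(f"Invalid MAC address: {value!r}")
    return value.lower()


def validate_port(value) -> int:
    """Return the port as an int if it is within 1..65535."""
    port = int(value)  # raises ValueError on non-numeric strings
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range: {port}")
    return port
```

Raising `ValueError` from every helper lets a caller reject bad input with a single `except` clause before any SSH command is issued.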
```bash ## Development
# List all databases
infra db list-dbs
# Provision a new database and user for a project ### Running Tests
infra db provision "my-new-project"
```
### 4. Networking (IP, DNS & DHCP)
Assign a static identity to your new machine. The CLI helps you find free addresses in the dedicated agent pool (`10.32.70.0/16` through `10.32.80.0/16`).
```bash
# Find the next available IP for your project
infra ip next-free
# List top 5 available IPs
infra ip list-free --count 5
# Register the machine in DHCP
infra dns add-host "aa:bb:cc:dd:ee:ff" "10.32.70.100" "new-app"
```
### 4. Cloudflare DDNS
The list of domains to update is managed dynamically on the server.
```bash
# Add a domain to the update list
infra cloudflare add-ddns "my-new-domain.com"
# List all domains being updated
infra cloudflare list-ddns
# Run the update (usually via cron)
infra cloudflare update-ddns
```
## Advanced Workflows for AI Agents
For detailed automation workflows, see [Workflow Documentation](../../docs/guides/dynamic-infrastructure-workflow.md).
## Development and Testing
Run the integration test suite:
```bash ```bash
export ROUTER_PASS="..." export ROUTER_PASS="..."
pytest tests/test_cli.py -s pytest tests/test_cli.py -v
``` ```

40
infra_cli/cert.py Normal file

@ -0,0 +1,40 @@
from .ssh import SSHClient

class CertificateManager:
    def __init__(self, config):
        # Certificate manager is on la-vmh-11 (LXC 11215)
        node = config.get_node('la-vmh-11')
        if not node:
            raise ValueError("Node 'la-vmh-11' not found in config")
        self.host = node['host']
        self.password = node.get('pass')
        self.user = config.get('proxmox.user', 'root')
        self.ssh_key = config.get('proxmox.ssh_key_path')
        self.client = SSHClient(self.host, self.user, self.ssh_key, self.password)
        self.lxc_id = "11215"
        self.shared_path = "/shared-certs"

    def exec_cert(self, cmd):
        return self.client.run(f"pct exec {self.lxc_id} -- {cmd}")

    def list_certs(self):
        res = self.exec_cert(f"ls -lh {self.shared_path}")
        return res.stdout

    def renew(self, force=False):
        script_path = "/root/local-config/infra-cert-mgr/scripts/dynamic-san-manager.sh"
        cmd = f"bash {script_path}"
        if force:
            cmd += " --force-update"
        res = self.exec_cert(cmd)
        if res.returncode != 0:
            raise RuntimeError(f"Certificate renewal failed: {res.stderr}")
        return res.stdout

    def check_expiry(self):
        # Checks expiry of the main wildcard cert
        cmd = f"openssl x509 -enddate -noout -in {self.shared_path}/loopaware.com.pem"
        res = self.exec_cert(cmd)
        return res.stdout.strip()
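
`check_expiry` returns the raw `notAfter=...` line printed by `openssl x509 -enddate`. A scheduled renewal job would probably want a number of days instead, so here is a minimal sketch of parsing that line. The function names `parse_not_after` and `days_remaining` are hypothetical and not part of this commit; the sketch assumes the default OpenSSL date format ending in `GMT` and an English or C locale for the month abbreviation.

```python
from datetime import datetime, timezone


def parse_not_after(openssl_output: str) -> datetime:
    """Parse a line such as 'notAfter=Feb  5 19:36:15 2026 GMT' into UTC."""
    _, _, value = openssl_output.strip().partition("=")
    if not value.endswith(" GMT"):
        raise ValueError(f"Unexpected openssl output: {openssl_output!r}")
    # strptime treats a run of whitespace as one separator, so the double
    # space OpenSSL prints before single-digit days parses without changes.
    parsed = datetime.strptime(value[:-4], "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def days_remaining(openssl_output: str, now=None) -> int:
    """Whole days until expiry; negative once the certificate has expired."""
    now = now or datetime.now(timezone.utc)
    return (parse_not_after(openssl_output) - now).days
```

A caller could pass `mgr.check_expiry()` to `days_remaining` and only run a forced renewal when the result drops below a chosen threshold.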


@ -1,4 +1,6 @@
from .ssh import SSHClient from .ssh import SSHClient
import tempfile
import os
class DatabaseManager: class DatabaseManager:
def __init__(self, config): def __init__(self, config):
@ -9,24 +11,45 @@ class DatabaseManager:
self.client = SSHClient(self.host, self.user, self.ssh_key) self.client = SSHClient(self.host, self.user, self.ssh_key)
def exec_sql(self, sql): def exec_sql(self, sql):
# Runs SQL as postgres user via SSH # Use a temporary file to avoid shell quoting hell
res = self.client.run(f"su - postgres -c \"psql -c \\"{sql}\"\"") with tempfile.NamedTemporaryFile(mode='w', suffix='.sql', delete=False) as tf:
if res.returncode != 0: tf.write(sql)
raise RuntimeError(f"PostgreSQL command failed: {res.stderr}") tf_name = tf.name
return res.stdout
try:
remote_path = f"/tmp/exec_{os.path.basename(tf_name)}"
self.client.scp_to(tf_name, remote_path)
# Ensure the postgres user can read the file
self.client.run(f"chmod 644 {remote_path}")
# Execute the SQL file as postgres user
cmd = f"su - postgres -c 'psql -f {remote_path}'"
res = self.client.run(cmd)
# Cleanup remote file
self.client.run(f"rm {remote_path}")
if res.returncode != 0:
raise RuntimeError(f"PostgreSQL command failed: {res.stderr}")
return res.stdout
finally:
if os.path.exists(tf_name):
os.remove(tf_name)
def create_database(self, db_name, owner=None): def create_database(self, db_name, owner=None):
sql = f"CREATE DATABASE {db_name}" sql = f"CREATE DATABASE {db_name};"
if owner: if owner:
sql += f" OWNER {owner}" sql = f"CREATE DATABASE {db_name} OWNER {owner};"
return self.exec_sql(sql) return self.exec_sql(sql)
def create_user(self, username, password): def create_user(self, username, password):
sql = f"CREATE USER {username} WITH PASSWORD '{password}'" # SQL with proper quoting for the password
sql = f"CREATE USER {username} WITH PASSWORD '{password}';"
return self.exec_sql(sql) return self.exec_sql(sql)
def grant_privileges(self, db_name, username): def grant_privileges(self, db_name, username):
sql = f"GRANT ALL PRIVILEGES ON DATABASE {db_name} TO {username}" sql = f"GRANT ALL PRIVILEGES ON DATABASE {db_name} TO {username};"
return self.exec_sql(sql) return self.exec_sql(sql)
def list_databases(self): def list_databases(self):
@ -36,7 +59,7 @@ class DatabaseManager:
return self.exec_sql("\du") return self.exec_sql("\du")
def drop_database(self, db_name): def drop_database(self, db_name):
return self.exec_sql(f"DROP DATABASE IF EXISTS {db_name}") return self.exec_sql(f"DROP DATABASE IF EXISTS {db_name};")
def drop_user(self, username): def drop_user(self, username):
return self.exec_sql(f"DROP USER IF EXISTS {username}") return self.exec_sql(f"DROP USER IF EXISTS {username};")


@ -1,4 +1,5 @@
from .ssh import SSHClient from .ssh import SSHClient
import re
class DNSManager: class DNSManager:
def __init__(self, config): def __init__(self, config):
@ -41,7 +42,8 @@ class DNSManager:
self.reload() self.reload()
def remove_dns(self, domain): def remove_dns(self, domain):
cmd = f"sh -c \"sed -i '\#address=/{domain}/#d' {self.dns_file}\"" # Escape the backslash explicitly; a raw string would also keep the backslashes in \" and change the command
cmd = f"sh -c \"sed -i '\\#address=/{domain}/#d' {self.dns_file}\""
self.exec_lxc(cmd) self.exec_lxc(cmd)
self.reload() self.reload()
@ -57,46 +59,23 @@ class DNSManager:
dns = self.exec_lxc(f"cat {self.dns_file}").stdout dns = self.exec_lxc(f"cat {self.dns_file}").stdout
return {"hosts": hosts, "dns": dns} return {"hosts": hosts, "dns": dns}
def get_free_ips(self, start_subnet=70, end_subnet=80): def get_free_ips(self, start_subnet=70, end_subnet=80):
"""Finds free IPs in the range 10.32.[70-80].1-254 by checking both static and dynamic leases"""
"""Finds free IPs in the range 10.32.[70-80].1-254 by checking both static and dynamic leases""" # 1. Get all static IPs from dhcp-hosts.conf and dynamic-hosts.conf
static_configs = self.exec_lxc(f"cat /etc/dnsmasq.d/dhcp-hosts.conf {self.hosts_file} 2>/dev/null").stdout
# 1. Get all static IPs from dhcp-hosts.conf and dynamic-hosts.conf used_ips = set(re.findall(r'10\.32\.[0-9]{1,3}\.[0-9]{1,3}', static_configs))
static_configs = self.exec_lxc(f"cat /etc/dnsmasq.d/dhcp-hosts.conf {self.hosts_file} 2>/dev/null").stdout
import re
used_ips = set(re.findall(r'10\.32\.[0-9]{1,3}\.[0-9]{1,3}', static_configs))
# 2. Get all active dynamic leases
leases = self.exec_lxc("cat /var/lib/misc/dnsmasq.leases 2>/dev/null").stdout
used_ips.update(set(re.findall(r'10\.32\.[0-9]{1,3}\.[0-9]{1,3}', leases)))
# 3. Find first available in the expanded agent range
free_ips = []
for subnet_idx in range(start_subnet, end_subnet + 1):
for host_idx in range(1, 255):
candidate = f"10.32.{subnet_idx}.{host_idx}"
if candidate not in used_ips:
free_ips.append(candidate)
if len(free_ips) >= 10: # Return top 10
return free_ips
return free_ips
# 2. Get all active dynamic leases
leases = self.exec_lxc("cat /var/lib/misc/dnsmasq.leases 2>/dev/null").stdout
used_ips.update(set(re.findall(r'10\.32\.[0-9]{1,3}\.[0-9]{1,3}', leases)))
# 3. Find first available in the expanded agent range
free_ips = []
for subnet_idx in range(start_subnet, end_subnet + 1):
for host_idx in range(1, 255):
candidate = f"10.32.{subnet_idx}.{host_idx}"
if candidate not in used_ips:
free_ips.append(candidate)
if len(free_ips) >= 10: # Return top 10
return free_ips
return free_ips
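
The free-IP scan in `get_free_ips` can be exercised without SSH by separating the parsing from the remote `cat` calls. The following is a minimal sketch under that assumption. `find_free_ips` and its `count` parameter are hypothetical names, and the regex adds boundary checks that the committed pattern does not have, so an address like `110.32.70.1` is not mistaken for `10.32.70.1`.

```python
import re

# Match 10.32.x.y only when it is not part of a longer dotted number.
IP_RE = re.compile(r"(?<![\d.])10\.32\.\d{1,3}\.\d{1,3}(?!\d)")


def find_free_ips(used_text: str, start_subnet: int = 70,
                  end_subnet: int = 80, count: int = 10) -> list:
    """Return up to `count` unused addresses in 10.32.[start..end].1-254.

    `used_text` is the concatenated content of the static host files and
    the dnsmasq lease file.
    """
    used = set(IP_RE.findall(used_text))
    free = []
    for subnet in range(start_subnet, end_subnet + 1):
        for host in range(1, 255):
            candidate = f"10.32.{subnet}.{host}"
            if candidate not in used:
                free.append(candidate)
                if len(free) >= count:
                    return free
    return free


sample = (
    "dhcp-host=aa:bb:cc:dd:ee:ff,10.32.70.1,host-a\n"
    "1700000000 11:22:33:44:55:66 10.32.70.3 host-b *\n"
)
print(find_free_ips(sample, count=3))
```

Keeping the function pure means the lease-file parsing can be unit tested with fixture strings, while the manager method only has to fetch the files and pass their contents in.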


@ -7,6 +7,7 @@ from .proxmox import ProxmoxManager
from .samba import SambaManager from .samba import SambaManager
from .cloudflare import CloudflareManager from .cloudflare import CloudflareManager
from .database import DatabaseManager from .database import DatabaseManager
from .cert import CertificateManager
import sys import sys
@click.group() @click.group()
@ -20,6 +21,31 @@ def cli(ctx, config):
click.echo(f"Error: {e}", err=True) click.echo(f"Error: {e}", err=True)
sys.exit(1) sys.exit(1)
@cli.group()
def cert():
    """Manage SSL/TLS Certificates"""
    pass

@cert.command(name='list')
@click.pass_obj
def cert_list(config):
    mgr = CertificateManager(config)
    click.echo(mgr.list_certs())

@cert.command(name='status')
@click.pass_obj
def cert_status(config):
    mgr = CertificateManager(config)
    click.echo(f"Main Certificate Expiry: {mgr.check_expiry()}")

@cert.command(name='renew')
@click.option('--force', is_flag=True, help='Force full SAN discovery and renewal')
@click.pass_obj
def cert_renew(config, force):
    mgr = CertificateManager(config)
    click.echo("Running dynamic SAN manager...")
    click.echo(mgr.renew(force))

@cli.group() @cli.group()
def db(): def db():
"""Manage PostgreSQL Databases and Users""" """Manage PostgreSQL Databases and Users"""


@ -25,102 +25,82 @@ def test_dns_full_lifecycle(unique_id):
hostname = f"test-lifecycle-{unique_id}" hostname = f"test-lifecycle-{unique_id}"
domain = f"dns-test-{unique_id}.fe.loopaware.com" domain = f"dns-test-{unique_id}.fe.loopaware.com"
# 1. Add DHCP Host # Add
print(f" Adding host {hostname}...") assert run_infra(["dns", "add-host", mac, ip, hostname]).returncode == 0
res = run_infra(["dns", "add-host", mac, ip, hostname]) assert run_infra(["dns", "add-dns", domain, ip]).returncode == 0
assert res.returncode == 0
# 2. Add DNS Record # Verify
print(f" Adding DNS {domain}...")
res = run_infra(["dns", "add-dns", domain, ip])
assert res.returncode == 0
# 3. Verify both in list
res = run_infra(["dns", "list"]) res = run_infra(["dns", "list"])
assert mac in res.stdout assert mac in res.stdout
assert domain in res.stdout assert domain in res.stdout
# 4. Remove both # Cleanup
print(" Cleaning up...")
assert run_infra(["dns", "remove-host", mac]).returncode == 0 assert run_infra(["dns", "remove-host", mac]).returncode == 0
assert run_infra(["dns", "remove-dns", domain]).returncode == 0 assert run_infra(["dns", "remove-dns", domain]).returncode == 0
# 5. Verify gone def test_cloudflare_lifecycle(unique_id):
res = run_infra(["dns", "list"]) test_domain = f"test-ddns-{unique_id}.org"
assert mac not in res.stdout
assert domain not in res.stdout
def test_ingress_collision_and_update(unique_id): # 1. Add to DDNS list
domain = f"test-collision-{unique_id}.loopaware.com" res = run_infra(["cloudflare", "add-ddns", test_domain])
ip1 = "10.32.70.221"
ip2 = "10.32.70.222"
# Add first
res = run_infra(["ingress", "add", domain, ip1, "80"])
assert res.returncode == 0 assert res.returncode == 0
# Update (add same domain with different IP) # 2. Verify in list
res = run_infra(["ingress", "add", domain, ip2, "8080"]) res = run_infra(["cloudflare", "list-ddns"])
assert test_domain in res.stdout
# 3. Remove from list
res = run_infra(["cloudflare", "remove-ddns", test_domain])
assert res.returncode == 0 assert res.returncode == 0
# Verify latest IP is active in list # 4. Verify gone
res = run_infra(["ingress", "list"]) res = run_infra(["cloudflare", "list-ddns"])
assert f"{domain}" in res.stdout assert test_domain not in res.stdout
# (The list command prints the be_ backend name or IP depending on implementation)
# Cleanup def test_decommission_command_flow(unique_id):
run_infra(["ingress", "remove", domain]) # This tests the command structure and error handling (using non-existent resources)
# We expect it to complete even if individual parts "fail" cleanup
domain = f"ghost-{unique_id}.com"
res = run_infra(["decommission", "--domain", domain])
assert res.returncode == 0
assert "Decommission process complete" in res.stdout
def test_proxmox_template_resolution():
# Verify the alias resolves to something on a known node
res = run_infra(["proxmox", "list-lxcs", "--node", "la-vmh-11"])
assert res.returncode == 0
# The actual resolution happens inside create-lxc, but we can verify the command exists
def test_samba_group_management(unique_id): def test_samba_group_management(unique_id):
username = f"group_test_{unique_id}" username = f"group_test_{unique_id}"
password = "TestPassword123!" password = "TestPassword123!"
group = "xmpp-users" group = "xmpp-users"
# 1. Add User # Add User & Group Join
res = run_infra(["samba", "add-user", username, password]) assert run_infra(["samba", "add-user", username, password]).returncode == 0
assert res.returncode == 0 assert run_infra(["samba", "add-to-group", group, username]).returncode == 0
# 2. Add to Group
res = run_infra(["samba", "add-to-group", group, username])
assert res.returncode == 0
# 3. Verify (if we implement list-group-members later, for now check return code)
# Cleanup
# (Samba user deletion not yet implemented in CLI, but user will be stale)
pass
def test_proxmox_multi_node_listing():
nodes = ["la-vmh-11", "la-vmh-07", "la-vmh-12"]
for node in nodes:
print(f" Checking node {node}...")
res = run_infra(["proxmox", "list-lxcs", "--node", node])
assert res.returncode == 0
assert "VMID" in res.stdout
def test_router_error_handling():
# Test adding with invalid IP
res = run_infra(["router", "add", "invalid-ip", "tcp", "80", "999.999.999.999", "80"])
assert res.returncode != 0
assert "Invalid internal IP address" in res.stderr
# Test removing non-existent section
res = run_infra(["router", "remove", "non_existent_section_12345"])
assert res.returncode != 0
# Remove
res = run_infra(["router", "remove", section], env=env)
assert res.returncode == 0
def test_database_provisioning(unique_id): def test_database_provisioning(unique_id):
project = f"test_proj_{unique_id}" project = f"test_proj_{unique_id}"
# 1. Provision
res = run_infra(["db", "provision", project]) res = run_infra(["db", "provision", project])
assert res.returncode == 0 assert res.returncode == 0
assert project in res.stdout assert project in res.stdout
# 2. List and Verify
res = run_infra(["db", "list-dbs"]) res = run_infra(["db", "list-dbs"])
assert project in res.stdout assert project.lower().replace("-", "_") in res.stdout
# (Cleanup logic would be good here if we add infra db drop) def test_cert_cli():
# For now, we verified the creation works. # 1. List
res = run_infra(["cert", "list"])
assert res.returncode == 0
assert "loopaware.com.pem" in res.stdout
# 2. Status
res = run_infra(["cert", "status"])
assert res.returncode == 0
assert "notAfter" in res.stdout
def test_ip_discovery():
res = run_infra(["ip", "next-free"])
assert res.returncode == 0
assert "10.32." in res.stdout