AWS DevOps Support, AWS Security & Networking, AWS Well-Architected Reviews

One Platform, One Partner: How Cloudvisor Keeps Katalist's AI Infrastructure Running

The Client

Katalist is a generative AI-powered video and storyboarding platform that enables creators to transform scripts into cinematic, production-ready visuals. With consistent AI characters, dynamic scenes, and automated video generation, Katalist streamlines the journey from concept to final cut.

  • Industry: Technology, Information and Internet
  • Company Size: 1-10
  • Country: United States
“Getting the infrastructure right, especially ML infrastructure, is a challenging and time-consuming task even for a large business with readily available resources. For startups like Katalist, it can be life and death. With Cloudvisor's help, we were able to get it right and make the most of the precious AWS credits we had at our disposal with minimum headache.”
Blaz Blokar

Founder & CTO of Katalist AI

Challenges

  • Multi-cloud complexity – AI and supporting workloads were running across AWS, GCP, Azure, and Vercel, increasing operational overhead and reducing visibility.
  • Inefficient autoscaling – Due to scaling issues in Kubernetes, infrastructure had to be provisioned for peak load, leading to higher-than-necessary costs.
  • Manual deployments – Helm-based deployments were largely manual and could take up to a full day to complete.
  • Security gaps – Improvements were needed in IAM configuration, CloudTrail auditing, GuardDuty monitoring, and overall account governance.
  • Network architecture limitations – VPC configuration required adjustments, including NAT Gateways, VPC endpoints, and secure access mechanisms.
  • Limited cost optimisation visibility – The infrastructure lacked a structured review to identify savings opportunities and architectural improvements.

Before Cloudvisor’s involvement, the client’s Kubernetes-based AI platform was operational, but there were opportunities to improve autoscaling efficiency, strengthen security controls, and streamline deployment processes to enhance agility and cost management.

Solutions

Cloudvisor started with a comprehensive infrastructure and security review, followed by targeted architectural and automation improvements focused on autoscaling, governance, security, and deployment efficiency.

  1. Well-Architected Framework Review

    An AWS Well-Architected Framework Review was conducted to identify architectural gaps in security, reliability, automation, and cost optimization, to define remediation priorities, and to identify potential funding opportunities for improvements.

  2. EKS Autoscaling Optimisation with Karpenter

    The node autoscaling issue in Amazon EKS was addressed by planning and configuring Karpenter to dynamically provision nodes based on workload demand, reducing the need to size for peak load.

  3. IAM and Account Governance Improvements

    Best-practice guidance was provided for IAM configuration, helping improve role management, access control, and overall AWS account governance.

  4. Enhanced Security Monitoring

    CloudTrail and GuardDuty configurations were reviewed and improved to ensure stronger auditing capabilities and proactive threat detection.

  5. VPC and Network Architecture Enhancements

    The VPC was adjusted by introducing NAT Gateways and S3 VPC endpoints, and the EKS control plane was moved to a private subnet. A Client VPN solution was designed to enable secure access to private resources within the AWS network, such as the EKS control plane, databases, and EC2 nodes.

  6. Customised EKS Node Configuration

    EKS node AMIs were customised to pre-load Docker images and ML models, simplifying automation and reducing initialization time for AI workloads.

  7. Automated CI/CD with ArgoCD

    An automated CI/CD pipeline using ArgoCD was implemented, significantly reducing manual deployment effort and improving release reliability. Additional changes were made to enable future implementation of canary deployments.
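To illustrate why demand-based node provisioning (item 2 above) can cut compute cost compared with sizing for peak load, here is a minimal arithmetic sketch in Python. Every number in it is hypothetical: the hourly demand profile and the node size are made up for illustration and are not Katalist's figures. The node-count rule is a simple ceiling division, not Karpenter's actual scheduling logic.

```python
import math

# Hypothetical vCPU demand for each hour of one day (illustrative only).
HOURLY_VCPU_DEMAND = [8, 8, 8, 8, 8, 16, 32, 64, 96, 96, 64, 48,
                      48, 64, 96, 96, 64, 32, 16, 16, 8, 8, 8, 8]

# Assumed capacity of a single node, in vCPUs (illustrative only).
VCPUS_PER_NODE = 16


def nodes_needed(vcpus, vcpus_per_node=VCPUS_PER_NODE):
    """Smallest number of nodes covering the demand, with a floor of one node."""
    return max(1, math.ceil(vcpus / vcpus_per_node))


def peak_node_hours(demand):
    """Node-hours when the cluster is sized for the busiest hour all day."""
    return nodes_needed(max(demand)) * len(demand)


def demand_node_hours(demand):
    """Node-hours when the node count follows demand hour by hour."""
    return sum(nodes_needed(d) for d in demand)


if __name__ == "__main__":
    peak = peak_node_hours(HOURLY_VCPU_DEMAND)
    scaled = demand_node_hours(HOURLY_VCPU_DEMAND)
    print(f"peak-sized: {peak} node-hours, demand-based: {scaled} node-hours")
    print(f"reduction: {1 - scaled / peak:.0%}")
```

With this made-up profile, the peak-sized cluster uses 144 node-hours per day against 62 for the demand-based one. Real savings depend on the actual workload shape, instance types, and how quickly nodes can be started and drained.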

Ongoing Managed Service

Following the initial engagement, Katalist moved onto the Cloudvisor Managed Service subscription — so their team can stay focused on building the product, not running the infrastructure.

  1. Bi-weekly environment checks

    Katalist's AWS environment reviewed against the post-engagement baseline, catching drift before it becomes a problem.

  2. Monthly cost & architecture reviews

    Compute efficiency, autoscaling performance, and unused resources — keeping the EKS-based AI platform lean.

  3. 24/7 monitoring & alerting

    Tailored monitoring stack with alerts firing directly to the team's Slack channel.

  4. Direct engineering access

    Direct access to Cloudvisor engineers for architecture questions and incident support — no new SoW needed.

  5. What's next

    Canary deployment rollout, autoscaling refinement, and ongoing security hardening as the platform scales.
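The idea behind checking an environment against a baseline can be sketched in a few lines of Python. This is only an illustration of the comparison step, not Cloudvisor's tooling: the setting names and values below are hypothetical, and a real check would first collect the current values from the AWS APIs.

```python
# Marker used when a setting exists on one side only.
MISSING = "<missing>"


def find_drift(baseline, current):
    """Return {setting: (expected, actual)} for every setting that differs."""
    drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(key, MISSING)
        actual = current.get(key, MISSING)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


if __name__ == "__main__":
    # Hypothetical baseline recorded at the end of the engagement.
    baseline = {
        "cloudtrail_enabled": True,
        "guardduty_enabled": True,
        "eks_public_endpoint": False,
    }
    # Hypothetical snapshot taken during a later check.
    current = {
        "cloudtrail_enabled": True,
        "guardduty_enabled": False,
        "eks_public_endpoint": False,
        "unreviewed_security_group": "sg-example",
    }
    for setting, (expected, actual) in find_drift(baseline, current).items():
        print(f"{setting}: expected {expected!r}, found {actual!r}")
```

Reporting settings that appear on only one side matters as much as reporting changed values, since newly created resources are a common source of drift.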

AWS Services Used

  • Amazon EKS (Elastic Kubernetes Service)
  • Karpenter
  • AWS IAM (Identity and Access Management)
  • AWS CloudTrail

Results

Katalist came to Cloudvisor with a fragmented multi-cloud setup, manual deployments, and an AI platform that needed to scale without breaking the bank. What followed was a focused infrastructure overhaul — and the start of a continuous partnership that keeps the platform optimised as Katalist grows.

  • Optimised Autoscaling: Karpenter eliminated peak-load provisioning, reducing unnecessary compute costs.
  • Clear Improvement Roadmap: The Well-Architected Review delivered actionable next steps and identified funding opportunities.
  • Faster Deployments: ArgoCD automation reduced deployment time from up to a day to a streamlined release process.
  • Stronger Network Setup: Private subnets, VPN access, and VPC endpoints increased security and reliability.
  • Improved Security Visibility: Enhanced CloudTrail and GuardDuty strengthened monitoring and auditing.
  • Production Replication: After successful validation, the same autoscaling configuration was rolled out to the production environment.
  • Infrastructure that scales with the AI: Cloudvisor keeps the AWS estate in step with the platform, so the team stays focused on the product.

Ready to see how Cloudvisor can do the same for your business?
Get in touch with us and let’s take your growth to the next level!