Waleed Nasir

Full Stack Developer

Project

AI Captioning Tool

Introduction

This AI-powered SaaS makes adding animated captions, overlays, and transitions to videos fast and easy. As someone with a background in video editing, I understand the struggles creators face. This tool is built to save time and remove the hassle of manual editing while making videos look more professional and engaging.

Stack

Next.js

Supabase

Remotion

Tailwind CSS

AWS Lambda

Cloudflare R2

Stripe

React

The Spark

Editor's Need, Developer's Vision

As a video editor, I found that existing captioning tools fell short. I wanted one that allowed more creativity and automated more of the work. I was also curious about how these tools worked and wanted to build something better. My research led me to Remotion, and I was excited to see that I could use React to create videos programmatically. That gave me the power and control I needed to build a better solution.
I started by building a web-based tool that offered a simple way to add animated captions to videos. That needed a solid foundation. For authentication and data storage, I chose Supabase because it offers a quick setup with PostgreSQL and built-in user login. For video storage, I compared Amazon S3, Cloudinary, and Cloudflare R2. R2 was the best fit because it does not charge egress fees and has simple pricing, which makes it cheaper for serving large numbers of video files. For transcription, I planned to use FFmpeg.wasm to extract the audio in the browser, upload it to R2, and then send it to AssemblyAI for an accurate transcript. With the transcript ready, I would use React to build dynamic caption styles and a user-friendly editor, with smart design features powered by the Gemini API.
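To illustrate the step between transcription and styling, here is a minimal TypeScript sketch that groups word-level timestamps into short caption lines. The `TimedWord` shape (text plus start and end times in milliseconds) is modeled on what speech-to-text services typically return. The type names, the `groupIntoCaptions` function, and the default thresholds are all assumptions made for illustration, not the tool's actual code.

```typescript
// Hypothetical shape for one transcribed word; times are in milliseconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// One on-screen caption built from consecutive words.
interface Caption {
  text: string;
  start: number;
  end: number;
}

// Group words into captions, starting a new caption when the current one
// is full or when the speaker pauses for longer than maxGapMs.
function groupIntoCaptions(
  words: TimedWord[],
  maxWords = 4,
  maxGapMs = 600,
): Caption[] {
  const captions: Caption[] = [];
  let current: TimedWord[] = [];

  // Turn the buffered words into one caption and reset the buffer.
  const flush = (): void => {
    if (current.length === 0) return;
    captions.push({
      text: current.map((w) => w.text).join(" "),
      start: current[0].start,
      end: current[current.length - 1].end,
    });
    current = [];
  };

  for (const word of words) {
    const previous = current[current.length - 1];
    const pausedTooLong =
      previous !== undefined && word.start - previous.end > maxGapMs;
    if (current.length >= maxWords || pausedTooLong) flush();
    current.push(word);
  }
  flush();
  return captions;
}
```

Each resulting caption carries its own start and end time, so a renderer can show it for exactly that interval and animate the words inside it.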

The Pivot

Redefining the Path for Power and Privacy

As I worked on the web app, I started to wonder whether I was just building something similar to what already existed, only with higher costs. I wanted to do something different and learn new things, so I kept questioning the approach. The real turning point was realizing that a web-based approach would not suit the advanced features I wanted to add. Users would have to upload very large video files, which would be slow and would raise privacy concerns. That felt like a serious problem.
This realization made me change my strategy and move to a desktop application. That approach solved both problems. Large video files could be processed on the user's machine, eliminating upload frustrations and improving privacy. A desktop app could also use the machine's own power for rendering. For users with less powerful systems, I envisioned a hybrid model that uses cloud functions such as AWS Lambda for optional, intensive rendering tasks, offering the best of both worlds. This change wasn't just technical; it was a re-commitment to building a truly powerful and user-centric tool.
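The hybrid idea can be sketched as a small decision function: render locally by default, and offload to the cloud only when the user has opted in, the machine is underpowered, and the job is heavy. Everything here is hypothetical, including the `chooseRenderTarget` name, the input shapes, and the thresholds, which are placeholders rather than measured values.

```typescript
type RenderTarget = "local" | "cloud";

// Hypothetical description of the user's hardware.
interface MachineProfile {
  cpuCores: number;
  memoryGb: number;
}

// Hypothetical description of one render request.
interface RenderJob {
  durationSeconds: number;
  cloudRenderingEnabled: boolean; // the user must opt in to cloud rendering
}

// Pick where to render. Thresholds are illustrative defaults only.
function chooseRenderTarget(
  machine: MachineProfile,
  job: RenderJob,
  minCores = 4,
  minMemoryGb = 8,
  maxLocalSeconds = 600,
): RenderTarget {
  // Cloud rendering is optional, so never use it without consent.
  if (!job.cloudRenderingEnabled) return "local";

  const underpowered =
    machine.cpuCores < minCores || machine.memoryGb < minMemoryGb;
  const intensive = job.durationSeconds > maxLocalSeconds;

  // Offload only when the machine is weak and the job is heavy.
  return underpowered && intensive ? "cloud" : "local";
}
```

Keeping local rendering as the default preserves the privacy benefit of the desktop approach, since files leave the machine only when the user has explicitly allowed it.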

The Desktop Dive

Navigating New Frameworks & Building the Core

Switching to a desktop application raised new technical questions, chiefly the choice of framework. As a web developer, I saw Electron.js, which uses web technologies like React, as the clear choice. I briefly looked at Tauri, which promised smaller builds with Rust, but returned to Electron.js because of its bigger community and wider range of resources. Electron.js has been used to build complex applications like VS Code and Slack, which shows it can deliver a good user experience despite its reputation for being "bloated".
With Electron.js as the base, I drew on my existing knowledge of React, Tailwind CSS for styling, and Framer Motion for smooth animations to create a familiar yet powerful development environment. Because nothing needs to be uploaded, the desktop design made features such as efficient clip extraction from long videos much easier to build. Core features like AI-powered captions, dynamic caption styles, and an intuitive text-based video editor were rebuilt and improved for this new approach.

The Outcome & Future Vision

A Powerful Tool and Continuous Learning

The outcome of this journey is ClipFast ⚡, a desktop application that provides creators with AI-driven animated captions, seamless video editing, and efficient clip extraction. Features like emoji support and sound effect integration are crafted to enhance creativity and streamline workflows.
This project has been an invaluable learning experience, teaching me the significance of adaptability and the balance between technology and practicality. It has demonstrated how web development skills can be applied to diverse platforms. Looking ahead, I aim to explore optional cloud-rendering capabilities and enhance AI features to further empower video creators. However, progress is currently slow due to some issues. This journey is fueled by a passion for addressing real-world challenges faced by editors and a dedication to continuous learning in the dynamic tech landscape.

Let's bring your ideas to life

Reach out if you'd like to collaborate or discuss design systems. I'm all ears.

Email