Based on a Lightning Talk, Sundai Club Anniversary, Cambridge, MA
At Sundai Club, we're passionate about building products in just one day. I recently gave a lightning talk at the first anniversary festival on what I'd do if I only had 15 minutes to create a prototype from scratch. My demonstration: rapidly prototyping a Sketch-to-3D model generation tool.
Two main areas cause the most time lost when shipping a new product: AI integration and product deployment. This guide will be easier to follow with some programming knowledge, but don't leave if you're a beginner! You can use the resources below as a starting point and co-create with an LLM (Claude Sonnet is my favorite coding companion) to build out the rest of your functionality.
AI Models
Integrating AI into your application is easier than ever with API-based tools. Instead of setting up complex machine learning infrastructure, you can send a request to a server with a GPU and get results returned, whether that’s generating an image, transforming text, or creating 3D models. These services allow you to focus on building your product without worrying about model hosting or scaling.
Replicate
I’m a big fan of Replicate, a platform that provides access to user-uploaded AI models. Any model you’ve been reading about online is probably already available. Importantly for pre-raise startups and indie hackers, you only pay for what you use, whereas other similar tools make you turn servers on and off yourself and pay for the entire time they’re running, even if nothing is called.

Here are the outputs from the talk examples:
- Image Generation: Created custom product visuals on demand → View example output
- Image-to-3D: Transformed 2D designs into manipulable 3D models → View example output

How to use: Visit Replicate's model collection and search models by use case. You can also look up “replicate” + the model functionality you’re looking for in the search engine of your choice. Once there, check out the model inputs and outputs and try running a test in the Replicate Playground (the main interface on the site). Once you’re sure everything is working as expected, move to the API tab, select the language you’re building in (probably Node or Python if you’re following this guide), and connect to your application.
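Once you reach the API tab, the integration is a few lines of code. Here is a minimal Python sketch, assuming the `replicate` package is installed and a REPLICATE_API_TOKEN environment variable is set; the model slug and input fields below are illustrative, so copy the exact ones from the model's API tab:

```python
# Hypothetical sketch of a Replicate API call. The model slug and input
# field names are illustrative -- the model's API tab lists the real ones.

def build_input(prompt: str, num_outputs: int = 1) -> dict:
    """Assemble the input payload shown in the model's API tab."""
    return {"prompt": prompt, "num_outputs": num_outputs}

def generate_image(prompt: str):
    """Run a hosted text-to-image model and return its output."""
    import replicate  # pip install replicate; needs REPLICATE_API_TOKEN set

    return replicate.run("stability-ai/sdxl", input=build_input(prompt))

# Example call (requires network access and an API token):
# urls = generate_image("a pencil sketch of a teapot")
```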
fal
For projects where budget and response time are critical, fal.ai offers an alternative to Replicate. The platform functions on the same core principle of providing API access to powerful AI models. What sets fal.ai apart is its standardized pricing structure, which typically results in lower per-call costs than Replicate. Additionally, the service generally delivers faster inference times, which becomes crucial for user-facing applications where perceived responsiveness matters.
The main limitation of fal.ai is its more curated approach to model selection. Unlike Replicate's open ecosystem, where anyone can contribute models, fal.ai only includes models added by its internal development team.

During my demonstration, I used fal.ai's TripoSR model for image-to-3D conversion.
The integration process follows a similar pattern to Replicate: identify the appropriate model in their gallery, test it out in the Playground, review the documentation in the API tab, and implement the API calls in your application.
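As a sketch of that last step, here is what a fal.ai call can look like with the `fal_client` Python package (installed via `pip install fal-client`, with a FAL_KEY environment variable set). The app id and argument names below are illustrative; copy the exact ones from the model's API tab:

```python
# Hypothetical sketch of an image-to-3D call on fal.ai. The app id and
# argument names are illustrative -- check the model's API tab.

def build_arguments(image_url: str) -> dict:
    """Assemble the request payload for the image-to-3D model."""
    return {"image_url": image_url}

def image_to_3d(image_url: str):
    """Submit the image and block until the 3D result is ready."""
    import fal_client  # pip install fal-client; needs FAL_KEY set

    return fal_client.subscribe(
        "fal-ai/triposr",  # illustrative app id
        arguments=build_arguments(image_url),
    )

# result = image_to_3d("https://example.com/sketch.png")
```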
Hugging Face
Hugging Face offers the largest selection of AI models, spanning everything from image generation and style transfer to dataset creation and complex reasoning. It’s a go-to platform for developers looking to experiment with a wide range of open-source machine learning models.

One of Hugging Face’s biggest advantages is Gradio, an open-source Python library that simplifies model deployment. Models deployed with Gradio automatically expose API endpoints, making integration seamless.

You can explore thousands of models with API access on Hugging Face Spaces. Scroll to the bottom of any Gradio space and click "Use via API" to see connection options. For example, check out the Sketch-to-3D model I used in-talk.
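The "Use via API" panel generates snippets along these lines, using the `gradio_client` package (`pip install gradio_client`). The Space name and endpoint here are hypothetical placeholders; each Space's panel shows its real values:

```python
# Hypothetical sketch of calling a Hugging Face Space through
# gradio_client. The Space id and endpoint name are illustrative --
# the "Use via API" panel on each Space lists the real ones.

def space_url(space: str) -> str:
    """Full URL of a Space given its "owner/name" id."""
    return f"https://huggingface.co/spaces/{space}"

def call_space(space: str, image_path: str):
    """Send an image to the Space's prediction endpoint."""
    from gradio_client import Client  # pip install gradio_client

    client = Client(space)  # e.g. "username/sketch-to-3d" (hypothetical)
    return client.predict(image_path, api_name="/predict")

# result = call_space("username/sketch-to-3d", "sketch.png")
```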
Pros:
- Most extensive model library (350k+ models)
- Free-tier access for many models
- Great for niche functionality
Cons:
- Less reliable API performance compared to Replicate or fal.ai
- Free-tier GPUs can be slow, especially during peak usage
If you need access to a wide variety of models and are willing to deal with some trade-offs in speed and reliability, Hugging Face is a strong option.
Platform Comparison for Image Generation Inference
If you need speed and lower costs, fal.ai is best. If you want the largest selection of models and full fine-tuning, Hugging Face is the way to go. If you prioritize easy API integration with pay-as-you-go pricing, Replicate is the best fit.
Deployment
Deployment is the process of making an application available online so anyone can access and use it. This is trickier than it sounds, as evidenced by the famous developer phrase, “It works on my machine.” The following three tools make it ridiculously easy to deploy apps.
Gradio
Gradio excels at turning machine learning models into interactive interfaces with minimal code. This Python library serves a dual purpose: rapid prototyping with Python and straightforward deployment. Get started building with Gradio here.

Once your app is ready, Gradio offers temporary deployment through a simple parameter change for immediate testing:
demo.launch(share=True) # Creates a temporary public URL
This generates a temporary URL that is valid for 72 hours. Remember that this method relies on your computer staying powered on and connected to the Internet.
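In context, that one-line change sits at the end of an ordinary Gradio app. Here is a minimal sketch (requires `pip install gradio`); the greet function is just a placeholder for whatever your prototype actually computes:

```python
# Minimal Gradio sketch. The greet function is a placeholder for your
# model call; replace it with a Replicate/fal.ai request, for example.

def greet(name: str) -> str:
    """Stand-in for whatever your prototype actually computes."""
    return f"Hello, {name}!"

def build_demo():
    import gradio as gr  # pip install gradio

    return gr.Interface(fn=greet, inputs="text", outputs="text")

# build_demo().launch(share=True)  # prints a temporary public URL
```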
For more persistent hosting, Gradio integrates seamlessly with Hugging Face. Run the command
gradio deploy
from your Terminal. This will push the application to a Hugging Face space for anyone to access.
Streamlit
Streamlit provides similar simplicity to Gradio but adds privacy controls for your prototypes. Its cloud platform allows you to deploy data-centric applications with fine-grained access management.

Setting Up Streamlit Cloud Deployment:
- Create an account: Visit https://streamlit.io/ and sign up for a free account using your email or GitHub credentials.
- Prepare your repository: Make sure your Streamlit app is in a GitHub repository. Your repository should include:
  - A requirements.txt file listing dependencies
  - A Python file containing your Streamlit app (typically named app.py or streamlit_app.py)
- Deploy your application:
- From your Streamlit dashboard, click "New app"
- Connect your GitHub account if you haven't already
- Select the repository containing your Streamlit app
- Choose the branch you want to deploy
- Specify the main Python file that runs your application
- Click "Deploy"
- Manage access controls (optional):
- Once deployed, navigate to your app settings
- Under "Sharing & Security," you can set your app to:
- Public (anyone with the link can access)
- Private (only invited users can access)
- For private apps, add email addresses of team members or stakeholders
This deployment process takes just minutes and requires no server configuration or DevOps knowledge. Your app automatically updates whenever you push changes to the connected GitHub repository, creating a seamless development workflow.
Streamlit's privacy options make it particularly valuable for MVPs that aren't ready for public release or those requiring controlled testing with specific users, such as potential investors or beta testers. I’ve used it for applications with private data, for example.
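For reference, the Python file in that repository can be as small as the sketch below (run locally with `streamlit run streamlit_app.py`; requires `pip install streamlit`). The title, widget, and describe function are placeholders, not part of any real app:

```python
# Minimal streamlit_app.py sketch. The describe function is a
# placeholder -- swap in a Replicate or fal.ai request in a real app.

def describe(prompt: str) -> str:
    """Stand-in for your model call."""
    return f"You asked for: {prompt}"

def main():
    import streamlit as st  # pip install streamlit

    st.title("Sketch-to-3D prototype")  # placeholder title
    prompt = st.text_input("Describe your object")
    if prompt:
        st.write(describe(prompt))

# Streamlit re-runs the script on each interaction:
# main()
```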
Glitch
Glitch is perhaps the most friction-free deployment option, especially for web applications. While Streamlit and Gradio are Python-based, Glitch lets you build a React site without any setup. The platform also includes real-time collaborative coding (like Google Docs). Glitch isn't ideal for hosting ML models directly, but it's great for front-end prototypes that connect to Replicate, Hugging Face, or fal.ai APIs.


To leverage Glitch:
- Visit glitch.com and create an account
- Create a new project (React, Node.js, or other templates available)
- Begin coding in the browser-based editor—changes deploy automatically
- Share your unique Glitch URL with collaborators or users. Access this on the top right when you’re in your project view → Share → Live site URL
Glitch eliminates the development/deployment divide entirely. There's no separate "push to production" step—your changes are live as soon as you save them, accelerating the iteration cycle.
The platform remains free until you reach significant traffic thresholds, making it ideal for early-stage prototypes. Glitch's multiplayer coding environment is particularly valuable when working with teammates or demonstrating real-time development.
Q&A
At the end of the talk, the audience asked the following questions.
Q: How do you keep track of new things? I assume you use Glitch now, but what about next month?
You don’t need to switch tools or chase new releases constantly. I’ve been using Glitch for years, especially during workshops. The same goes for Replicate, which I use for everything. I have gradually unlocked new features, such as hosting my own models and fine-tuning image and language models.
Find a workflow and tools that work for you through exploration, double down on them, and make things faster and faster every time you build something new.
Q: What if you want to do something more specific on the model side?
Many projects never progress beyond using commercial API-based models like GPT-4 or Claude. This makes sense; these are the most popular foundation models for a reason: they’re great, generalizable, inexpensive, and reliable.
For moderate customization, Hugging Face's ecosystem supports forking existing projects to make targeted modifications. This allows you to adapt open-source models to your specific needs without starting from scratch. You can adjust parameters, fine-tune on custom data, or modify the interface while maintaining the underlying architecture.
As your requirements become more specific, you can take greater control:
- Start by exploring academic papers in your domain of interest
- Locate the associated model implementations (often shared on Hugging Face Models, GitHub repositories, or even Google Drive links within papers)
- Deploy these models yourself on platforms like Replicate, which allows uploading custom models to their infrastructure
Q: How long will this be relevant for?
Okay, no one actually asked this, but it’s something I’ve been thinking about between giving this talk and writing this article. While I stand by my earlier point that these tools don’t "go bad," the landscape is shifting. The trend seems to be moving toward tools like Websim, which are designed to run entirely on prompts without builders coding at all. These platforms often include one-click deployments, making a large part of this article unnecessary for certain users.
That said, for developers who want more control, flexibility, or the ability to scale beyond a prototype, tools like Replicate, Hugging Face, and Streamlit may remain essential. Time will tell how it all shakes out. In the meantime, I hope this helps you prototype your dream products and ship them to users!