Y Combinator has a strong track record of funding successful startups across industries. According to their website, 39% of YC companies have raised a Series A, and 18% of YC companies are valued at $100M+. In recent years, the share of AI/ML startups in YC batches has grown significantly, with over 40% of the latest YC batch being AI/ML startups. I worked with Valentina Escudero to analyze the YC Summer 2023 batch, and we came away with some key learnings.
YC valuations carry a premium over other pre-seed/seed-stage companies. Most companies were raising $1.5-2m at a $15m-$20m post-money valuation (with one outlier at $40m). According to State of the Markets data, the median seed post-money valuation was $12.5m for H1’23, up from $10.7m last year. There was a strong indication that AI will affect every layer of the technical stack, from user interface/experience to hardware. YC had relatively younger founders this year, with many co-founders in the middle of their undergraduate degrees or dropping out. General Catalyst and Pioneer Fund invested in a large number of these startups. GC funded the top 25 of the YC batch with $100k uncapped SAFEs.
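To make the valuation premium concrete, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above. The function names are ours, and treating every round as a simple priced post-money round is an assumption; actual SAFE terms vary from company to company.

```python
# All amounts are in millions of USD and come from the figures quoted above.
MEDIAN_SEED_POST_H1_2023 = 12.5
MEDIAN_SEED_POST_PRIOR_YEAR = 10.7


def implied_ownership(raise_m: float, post_money_m: float) -> float:
    """Fraction of the company sold, assuming a simple post-money round."""
    return raise_m / post_money_m


def premium_over_median(post_money_m: float,
                        median_m: float = MEDIAN_SEED_POST_H1_2023) -> float:
    """How far a post-money valuation sits above the median, as a fraction."""
    return post_money_m / median_m - 1.0


# Extremes of the quoted ranges: smallest raise at the highest valuation,
# and largest raise at the lowest valuation.
least_dilution = implied_ownership(1.5, 20.0)
most_dilution = implied_ownership(2.0, 15.0)

print(f"Ownership sold: {least_dilution:.1%} to {most_dilution:.1%}")
print(f"Premium over median: {premium_over_median(15.0):.0%} "
      f"to {premium_over_median(20.0):.0%}")
print(f"Median growth year over year: "
      f"{MEDIAN_SEED_POST_H1_2023 / MEDIAN_SEED_POST_PRIOR_YEAR - 1.0:.1%}")
```

Under these assumptions, founders in the typical range sold roughly 7.5% to 13% of the company, at valuations about 20% to 60% above the H1’23 median.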
Here are some of the most prominent emerging areas we observed:
YC Summer 2023 Landscape
Total Companies: 270
Total AI Companies: 67
Applied AI Companies: 160
Important Verticals (AI & AI Powered):
We are seeing a swarm of vertical-specific AI agents, built using existing open-source or third-party models. More than 13 startups in this batch are building a co-pilot or an agent to solve a specific problem in verticals such as FinTech, retail, health tech, and education. Another thriving space within the AI agent category was RPA (robotic process automation), where 9+ companies are using LLMs to tackle complex enterprise-level problems by simplifying business users' workflows and scaling their productivity. Interestingly, a few companies were not using LLMs at all and were solving these problems with simpler approaches. The company I liked the most was Reworkd, an agent infrastructure company that has also pivoted into the RPA domain. Reworkd focuses on automating core business workflows with the help of AI agents. The company's platform addresses the limitations of AI agents in handling business-critical processes by coupling AI agents with a structured workflow system. The company gained traction through its open-source GitHub repo, AgentGPT.
Key takeaway: RPA is a big theme for next-generation AI agents. But how can we make them better? That question remains unanswered.
Gen AI was a trending theme in the current YC batch, with around 40 YC companies. The general trend was to enable search capabilities across various verticals, including medical research, e-commerce, documents, and other specific verticals. There were also companies in this segment that focused on providing design solutions and support for different verticals, such as firmware design, application user interface design, and social media video generation, among other interesting areas. One of the interesting GenAI companies was Tempo Labs, which has developed an AI design tool that lets users generate and edit high-quality React code using natural language prompts and a visual code editor.
Key takeaway: Vertical-specific search startups are still hot, as none of the existing solutions is well positioned to take advantage of LLMs and provide a comprehensive solution in a specific business vertical, e.g. e-commerce, document search, or image search.
Software 2.0 is a term that refers to the paradigm shift in software development driven by machine learning, particularly deep learning and neural networks. This space is still hot, with Meta launching Code Llama and Stability AI releasing StableCode. We saw a total of 10 companies building in this space, solving specific problems that range from code generation for UI to automating software chores like bug fixes and minor feature requests.
Key takeaway: There is a need for a comprehensive code generation solution that goes beyond what open source can provide, with capabilities like personalization, regression testing, CI, bug fixes, and explainability, and that embeds more holistically into the normal development lifecycle instead of bombarding devs with thousands of new tools. We still think nothing matches GitHub Copilot's performance.
AI is transforming the legal tech industry by automating mundane and mechanical tasks, allowing legal professionals to focus on higher-level work such as negotiating deals, advising clients, and appearing in court. Companies such as Evisort and Definely are using AI to improve various aspects of legal work, such as contract management and document review. Pincites is an AI solution that automatically reviews and redlines contracts using a company's preferred guidance, right in Microsoft Word. Atla is a legal tech company that makes it easy for in-house lawyers to answer legal questions for their business by summarizing thousands of pages of legalese.
Key takeaway: Focus on strong markets like patents, AI assistants for lawyers, and vertical-independent contract generation, rather than on narrowly focused products that do not offer a comprehensive solution for a specific legal segment. Another important factor is the customer base: companies that target law firms compensated by the number of deals they do, as opposed to billable hours, should find it easier to acquire customers.
This space mainly saw a combination of content moderation workflow co-pilots with varying degrees of human-in-the-loop engagement (Sero AI, SafetyKit) and data analysis tools built for enterprise (meaning they anonymize PII), e.g. Vizly and Kobalt Labs. Kobalt Labs additionally protects against prompt injections and provides a slightly more robust solution. Langdock offered a GDPR-compliant “ChatGPT for enterprise” solution.
Key takeaway: The defensibility at this level for startups is still unclear. More interesting developments will probably be at the model layer with regards to data governance, lineage, and provenance, especially as OSS models become more prevalent within enterprises.
The latest developments in document engineering using AI involve techniques such as Retrieval Augmented Generation (RAG), embedding models, and vector search to process, analyze, and extract information from various types of documents. Latentspace, Watto AI and ztool were the ones that stood out, with target customers ranging from low-code (workflows integrated for PMs and analysts) to no-code (creating apps by describing them). Most companies are using RAG and existing open-source models to solve some flavor of problem specific to the document engineering space.
Key takeaway: There is too much duplication and a lack of novelty in this area. What is needed is an on-prem, long-form document engineering solution that does not stop at analysis but makes generation a seamless experience.
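For readers less familiar with the RAG pattern mentioned in the document engineering discussion above, here is a minimal sketch of its retrieval step in Python. It is an illustration only, not any of these startups' implementations: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the linear scan stands in for a vector database, and the sample chunks and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': term counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Vector search by brute force: rank chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical document chunks, e.g. paragraphs split out of a contract.
CHUNKS = [
    "The lease term is 24 months starting in January.",
    "Either party may terminate the lease with 60 days written notice.",
    "The office kitchen is cleaned every Friday.",
]

QUERY = "How can a party terminate the lease?"
print(build_prompt(QUERY, CHUNKS))
```

In a production system the resulting prompt would be sent to an LLM for the generation step; that call is omitted here because it depends on the model provider.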
Model Training & Inference
In this batch, there were 6 startups focusing on training, inference, and RLHF. There is broadly a moat issue, as most are focused on general LLM optimizations and infra orchestration. Companies like CambioML are narrowing in on RLHF-based improvements. The space is split, with some companies, like Cedana and CambioML, providing OSS solutions as they look to build community and think about how to monetize in the future. Automorphic, Trainy and Chatter offer solutions for model protection and observability pre-deployment, with Automorphic adding a layer of few-shot learning to tune your model at runtime while they orchestrate the technique. Another startup, Taylor AI, seems to be targeting a “fine-tuning as a service” and orchestration play for OSS models.
Key takeaway: Most of these solutions are focused on “fine-tuning as a service” primarily driven by optimizations in identification, observability, debugging, and prompt testing along with broadly orchestrating the best models/compute combinations – making defensibility difficult. CambioML stands out given their approach to RLHF finetuning on curated data.
1. Cedalio is a groundbreaking Web3 database platform that empowers developers to harness the power of structured data at scale while maintaining user data ownership as a top priority. It provides an alternative to traditional cloud storage solutions by offering built-in web3 features that guarantee transparency, security, and verifiability.
2. Giga ML is a company that helps enterprises train large language models (LLMs) for their specific needs. According to the company, its platform can make custom LLMs perform as well as GPT-4, and its open-source models outperform all other models on benchmarks, including Claude 2.
3. Martin AI aims to be a better Siri with an LLM brain that builds a personal relationship and understands what matters most to you. Martin can be used to search the web in natural conversation, plan out your week, set reminders, or chat about a movie you just watched.
4. AxFlow (Previously known as Axilla) is an end-to-end framework for enterprises to develop AI applications using TypeScript, providing an opinionated toolkit to orchestrate, monitor, and continuously improve AI applications in production.
5. CambioML bridges the gap between RLHF finetuning and data curation, offering a unified interface that combines an interactive UI, a compute environment, and auto-structured data.
Conclusion: In general, we saw predictable trends in AI in this YC batch and look forward to more comprehensive solutions. We are thrilled by the depth of talent in this batch and by the milestones these founders are aiming to achieve.
Authors: Hina Dixit & Valentina Escudero
Note & Disclaimer: This article does not reflect the opinions of our employers; it is our personal perspective and reflection. Our observations are based on public information provided on the Y Combinator website about the YC 23 batch, and we don't intend to make any predictions about upcoming markets through this analysis. Nor do we recommend or endorse any of these startups for investment.