Tech Book of the Month

May 2021 - Crossing the Chasm by Geoffrey Moore

This month we take a look at a classic high-tech growth marketing book. Originally published in 1991, Crossing the Chasm became a beloved book within the tech industry, although its glory seems to have faded over the years. While the book is often overly prescriptive in its suggestions, it provides several useful frameworks for addressing growth challenges, primarily early in a company’s history.

Tech Themes

  1. Technology Adoption Life Cycle. The core framework of the book describes how new technologies get adopted. It is an interesting micro-view of the broader phenomenon described in Carlota Perez’s Technological Revolutions. In Moore’s chasm-crossing world, five personas dominate adoption: innovators, early adopters, early majority, late majority, and laggards. Innovators are technologists, happy to accept more challenging user experiences to push the boundaries of their capabilities and knowledge. Early adopters are intuitive buyers who enjoy trying new technologies but want a slightly better experience. The early majority are “wait and see” folks who want others to battle-test the technology before trying it out, but don’t typically wait too long before buying. The late majority want significant reference material and usage before buying a product. Laggards simply don’t want anything to do with new technology. It is interesting to think of this adoption pattern in concert with the big technology migrations of the past twenty years: mainframes to on-premise servers to cloud computing, home phones to cell phones to iPhone/Android, radio to CDs to downloadable music to Spotify, and cash to checks to credit/debit to mobile payments. Each of these massive migration patterns feels very aligned with this adoption model. Everyone knows someone ready to apply the latest tech, and someone who doesn’t want anything to do with it (Warren Buffett!).

  2. Crossing the Chasm. If we accept the above as a general way products are adopted by society (obviously it’s much more of a mishmash in reality), we can posit that the most important step is the jump from the early adopters to the early majority - the spot where the bell curve really opens up. This is what Geoffrey Moore calls Crossing the Chasm. The idea is highly reminiscent of Clay Christensen’s “not good enough” disruption pattern and Gartner’s technology hype cycle. The examples Moore uses (in 1991) are also striking: neural networking software and desktop video conferencing. Moore lamented: “With each of these exciting, functional technologies it has been possible to establish a working system and to get innovators to adopt it. But it has not as yet been possible to carry that success over to the early adopters.” Both of these technologies have clearly crossed into the mainstream with Google’s TensorFlow machine learning library and video conferencing tools like Zoom that make it super easy to speak with anyone over video instantly. So what was the great unlock that made these commercially viable and successfully adopted products? Well, since 1990 there have been major changes in several important underlying technologies: computer storage and data processing capabilities are almost limitless with cloud computing, network bandwidth has grown exponentially while costs have dropped, and the software tooling for building great user experiences has improved dramatically. This is a version of not-good-enough technologies benefiting substantially from changes in underlying inputs. The systems you could deploy in 1990 simply were not comparable to what you can deploy today. The real question is: are there different types of adoption curves for different technologies, and do they really follow a normal distribution as Moore shows here? (A short sketch at the end of this list shows where the canonical segment percentages come from.)

  3. Making Markets & Product Alternatives. Moore positions the book as if you were a marketing executive at a high-tech company and offers several exercises to help you identify a target market, customer, and use case. Chapter six, “Define the Battle,” covers the best way to position a product within a target market. For early markets, competition comes from non-consumption, and the company has to offer a “Whole Product” that enables the user to actually derive benefit from the product. Thus, Moore recommends targeting innovators and early adopters, the technologist visionaries able to see the benefit of the product. This also mirrors Clayton Christensen’s commoditization/de-commoditization framework, where new-market products must offer all of the core components of a system combined into one solution; over time the axis of commoditization shifts toward the underlying components as companies differentiate by using faster and better sub-components. Positioning in these market scenarios should focus on the contrast between your product and legacy ways of performing the task (“use our software instead of pen and paper,” for example). In mainstream markets, companies should position their products within the established buying criteria developed by pragmatist buyers. A market alternative serves as the incumbent, well-known provider, and a product alternative is a fellow upstart competitor that you are clearly beating. What’s odd here is that you are constantly referring to your competitors as alternatives to your product, which seems counterintuitive; but enterprise buyers are, of course, weighing alternatives, and you need to make the case that your solution is the best. Choosing a market alternative lets you tap a budget previously used for a similar solution, and the product alternative helps differentiate your technology relative to other upstarts. Moore’s simple positioning formula has helped hundreds of companies establish their go-to-market message: “For (target customers—beachhead segment only) • Who are dissatisfied with (the current market alternative) • Our product is a (new product category) • That provides (key problem-solving capability). • Unlike (the product alternative), • We have assembled (key whole product features for your specific application).”
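The bell curve behind this framework is easy to make concrete. Below is a minimal, hypothetical sketch (not from the book) that derives the familiar adopter-segment percentages from a normal distribution of “time to adopt,” using the standard-deviation cutoffs usually associated with Rogers’ diffusion model; the chasm sits at the seam between the early adopters and the early majority.

```python
# A rough sketch (not from the book): the conventional adopter segments fall out
# of a normal distribution of "time to adopt," cut at standard deviations from
# the mean -- the same bell curve Moore draws.
from statistics import NormalDist

adoption = NormalDist(mu=0, sigma=1)  # standardized "time to adopt"

segments = {
    "innovators":     adoption.cdf(-2),                     # > 2 sd early (~2.3%)
    "early adopters": adoption.cdf(-1) - adoption.cdf(-2),  # 1-2 sd early (~13.6%)
    "early majority": adoption.cdf(0) - adoption.cdf(-1),   # 0-1 sd early (~34.1%)
    "late majority":  adoption.cdf(1) - adoption.cdf(0),    # 0-1 sd late  (~34.1%)
    "laggards":       1 - adoption.cdf(1),                  # > 1 sd late  (~15.9%)
}

for name, share in segments.items():
    print(f"{name:>15}: {share:5.1%}")

# The "chasm" sits at the boundary between early adopters and early majority --
# roughly the 16th percentile, where the mass of pragmatist buyers begins.
```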

Business Themes

[Image: the Whole Product concept - Philip Kotler’s five product levels]
  1. What happened to these examples? Moore offers a number of examples of crossing the chasm, but what actually happened to these companies after this book was written? Clarify Software was bought in October 1999 by Nortel for $2.1B (a 16x revenue multiple) and then divested by Nortel to Amdocs in October 2001 for $200M - an epic disaster of capital allocation. Documentum was acquired by EMC in 2003 for $1.7B in stock and was later sold to OpenText in 2017 for $1.6B. The 3Com Palm Pilot was a mess of acquisitions and divestitures: Palm was acquired by U.S. Robotics, which was acquired by 3Com in 1997; Palm was then spun out in a 2000 IPO whose stock went on to drop 94%. Palm stopped making PDA devices in 2008, and in 2010 HP acquired Palm for $1.2B in cash. Smartcard maker Gemplus merged with competitor Axalto in a €1.8B deal in 2005, creating Gemalto, which was later acquired by Thales in 2019 for $8.4B. So my three questions are: Did these companies really cross the chasm, or were they just readily available success stories of their time? Do you need to be the company that leads the chasm crossing, or can someone else do it to your benefit? What is the next step in the chasm journey after it’s crossed, and why did so many of these companies fail after a time?

  2. Whole Products. Moore leans into an idea called the Whole Product Concept, which was popularized by Theodore Levitt’s 1983 book The Marketing Imagination and Bill Davidow’s (of early VC Mohr Davidow) 1986 book Marketing High Technology. Moore explains the idea: “The concept is very straightforward: There is a gap between the marketing promise made to the customer—the compelling value proposition—and the ability of the shipped product to fulfill that promise. For that gap to be overcome, the product must be augmented by a variety of services and ancillary products to become the whole product.” There are four different perceptions of the product: “1. Generic product: This is what is shipped in the box and what is covered by the purchasing contract. 2. Expected product: This is the product that the consumer thought she was buying when she bought the generic product. It is the minimum configuration of products and services necessary to have any chance of achieving the buying objective. For example, people who are buying personal computers for the first time expect to get a monitor with their purchase-how else could you use the computer?—but in fact, in most cases, it is not part of the generic product. 3. Augmented product: This is the product fleshed out to provide the maximum chance of achieving the buying objective. In the case of a personal computer, this would include a variety of products, such as software, a hard disk drive, and a printer, as well as a variety of services, such as a customer hotline, advanced training, and readily accessible service centers. 4. Potential product: This represents the product’s room for growth as more and more ancillary products come on the market and as customer-specific enhancements to the system are made.” These are the product features that may be expected later or added over time to drive adoption. Moore makes a subtle point that after a while, investments in the generic/out-of-the-box product functionality drive less and less purchase behavior, in tandem with broader market adoption. Customers want to be wooed by the latest technology, and as products become similar, customers care less about what’s in the product today and more about what’s coming. Moore emphasizes Whole Product Planning, where you map out how you get those additional features into the product over time - but Moore was also operating in an era when product decisions and development processes ran on two-year-plus timelines, not in the DevOps era of today, where product updates are in some cases pushed daily. In the bottoms-up/DevOps era, it’s become clear that finding your niche users, driving strong adoption from them, and integrating their feature ideas as soon as possible can yield a big success.

  3. Distribution Channels. Moore walks through each of the potential ways a company can distribute its solutions: direct sales, two-tier retail, one-tier retail, internet retail, two-tier value-added reselling, national roll-ups, original equipment manufacturers (OEMs), and system integrators. As Moore puts it, “The number-one corporate objective, when crossing the chasm, is to secure a channel into the mainstream market with which the pragmatist customer will be comfortable.” Several of these distribution types are clearly relics of technology distribution in the early 1990s. Great direct sales organizations produced some of the best and biggest technology companies of yesterday, including IBM, Oracle, CA Technologies, SAP, and HP. What’s so fascinating about this framework is that you just need one channel to reach the pragmatist customer, and in the last ten years that channel has become the internet for many technology products. Moore even recognizes that direct sales produced poor customer alignment: “First, wherever vendors have been able to achieve lock-in with customers through proprietary technology, there has been the temptation to exploit the relationship through unfairly expensive maintenance agreements [Oracle did this big time] topped by charging for some new releases as if they were new products. This was one of the main forces behind the open systems rebellion that undermined so many vendors’ account control—which, in turn, decrease predictability of revenues, putting the system further in jeopardy.” So what is the strategy behind the popular open-source, bottoms-up go-to-market motions at companies like GitHub, Hashicorp, Redis, Confluent, and others? It’s straightforward: the internet and simple APIs (normally on GitHub) provide the fastest channel to reach the developer end market while they are coding. When you look at open source scaling, it can take years and years to cross the chasm because most early open source adopters are technology innovators; eventually, however, solutions permeate into massive enterprises and make the jump. With these new internet-driven go-to-market motions coming on board, we’ve seen large companies grow primarily from inbound marketing tactics rather than direct outbound sales. The companies named above, as well as Shopify, Twilio, Monday.com, and others, have done a great job growing to massive scale on the backs of their products (product-led growth) instead of a sales force. What’s important to realize is that distribution is an abstract term and no single motion or strategy is right for every company. The next distribution channel will surprise everyone!

Dig Deeper

  • How the sales team behind Monday is changing the way workplaces collaborate

  • An Overview of the Technology Adoption Lifecycle

  • A Brief History of the Cloud at NDC Conference

  • Frank Slootman (Snowflake) and Geoffrey Moore Discuss Disruptive Innovations and the Future of Tech

  • Growth, Sales, and a New Era of B2B by Martin Casado (GP at Andreessen Horowitz)

  • Strata 2014: Geoffrey Moore, "Crossing the Chasm: What's New, What's Not"

tags: Crossing the Chasm, Github, Hashicorp, Redis, Monday.com, Confluent, Open Source, Snowflake, Shopify, Twilio, Geoffrey Moore, Gartner, TensorFlow, Google, Clayton Christensen, Zoom, Nortel, Amdocs, OpenText, EMC, HP, CA, IBM, Oracle, SAP, Gemalto, DevOps
categories: Non-Fiction
 

February 2021 - Rise of the Data Cloud by Frank Slootman and Steve Hamm

This month we read a new book by the CEO of Snowflake and author of our November 2020 book, Tape Sucks. The book covers Snowflake’s founding, products, strategy, industry-specific solutions, and partnerships. Although the content is somewhat interesting, it reads more like a marketing book than an actually useful guide to cloud data warehousing. Nonetheless, it’s a solid, quick read on the state of the data infrastructure ecosystem.

Tech Themes

  1. The Data Warehouse. A data warehouse is a type of database that is optimized for analytics. These optimizations mainly revolve around complex query performance, the ability to handle multiple data types, the ability to integrate data from different applications, and the ability to run fast queries across large data sets. In contrast to a normal database (like Postgres), a data warehouse is purpose-built for efficient retrieval of large data sets rather than the high-performance read/write transactions of a typical relational database. The industry began in the late 1970s and early ’80s, driven by work done by the “Father of Data Warehousing,” Bill Inmon, and early competitor Ralph Kimball, a former Xerox PARC designer. Kimball launched Red Brick Systems in 1986, and Inmon launched Prism Solutions in 1991, with its leading product, the Prism Warehouse Manager. Prism went public in 1995 and was acquired by Ardent Software in 1998 for $42M, while Red Brick was acquired by Informix for ~$35M in 1998. In the background, a company called Teradata, formed in the late 1970s by researchers at Caltech and employees from Citibank, was going through its own journey to the data warehouse. Teradata would IPO in 1987 and get acquired by NCR in 1991; NCR itself would get acquired by AT&T in 1991; NCR would then spin out of AT&T in 1997, and Teradata would spin out of NCR through an IPO in 2007. What a whirlwind of corporate acquisitions! Around that time, other new data warehouses were popping up on the scene, including Netezza (launched in 1999) and Vertica (2005). Netezza, Vertica, and Teradata were great solutions, but they were physical appliances that ran highly efficient data warehouses on-premise. The issue was that as data grew, it became really difficult to add more hardware boxes and to manage queries optimally across the disparate hardware. Snowflake wanted to leverage the unlimited storage and computing power of the cloud to allow for infinitely scalable data warehouses. This was an absolute game-changer, as early customer Accordant Media described: “In the first five minutes, I was sold. Cloud-based. Storage separate from compute. Virtual warehouses that can go up and down. I said, ‘That’s what we want!’”

  2. Storage + Compute. Snowflake was launched in 2012 by Benoit Dageville (Oracle), Thierry Cruanes (Oracle), and Marcin Żukowski (Vectorwise). Mike Speiser and Sutter Hill Ventures provided the initial capital to fund the formation of the company. After numerous whiteboarding sessions, the technical founders decided to try something crazy: separating data storage from compute (processing power). This allowed Snowflake’s product to scale storage (i.e., add more capacity) independently and put tons of computing power behind very complex queries. What may have been limited by Vertica hardware was now possible with Snowflake. At this point, the cloud had only been around for about five years, and unlike today, there were only a few services offered by the main providers. The team took a huge risk to 1) bet on the long-term success of the public cloud providers and 2) try something that had never successfully been accomplished before. When they got it to work, it felt like magic. “One of the early customers was using a $20 million system to do behavioral analysis of online advertising results. Typically, one big analytics job would take about thirty days to complete. When they tried the same job on an early version of Snowflake’s data warehouse, it took just six minutes. After Mike learned about this, he said to himself: ‘Holy shit, we need to hire a lot of sales people. This product will sell itself.’” This idea was so crazy that not even Amazon (where Snowflake runs) thought of unbundling storage and compute when it built its cloud-native data warehouse, Redshift, in 2013. Funny enough, Amazon also sought to pull people away from Oracle, hence the name Redshift. It would take Amazon almost seven years to redesign its data warehouse to separate storage and compute in Redshift RA3, which launched in 2019. On top of these functional benefits, there is a massive gap between the cost of storage and the cost of compute, and separating the two made Snowflake a significantly more cost-competitive solution than traditional hardware systems.

  3. The Battle for Data Pipelines. A typical data pipeline (shown below) consists of pulling data from many sources, performing ETL/ELT (extract, transform, load - or extract, load, transform), centralizing it in a data warehouse or data lake, and connecting that data to visualization tools like Tableau or Looker. All parts of this data stack are facing intense competition. On the ETL/ELT side you have companies like Fivetran and Matillion, and on the data warehouse/data lake side you have Snowflake and Databricks. Fivetran focuses on the extract and load portions of ETL, providing a data integration tool that connects to all of your operational systems (Salesforce, Zendesk, Workday, etc.) and pulls the data together in Snowflake for comprehensive analysis. Matillion is similar, except it connects to your systems, imports raw data into Snowflake, and then transforms it (checking for NULLs, ensuring matching records, removing blanks) in your Snowflake data warehouse. Matillion thus focuses on the load and transform steps, while Fivetran focuses on extract and load and leverages dbt (data build tool) for transformations. (A toy end-to-end sketch of this extract-load-transform flow follows below.) The data warehouse vs. data lake debate is a complex and highly technical discussion, but it mainly comes down to Databricks vs. Snowflake. Databricks is primarily a machine learning platform that allows you to run Apache Spark (an open-source distributed data processing and machine learning framework) at scale. Databricks’ main product, Delta Lake, allows you to store all data types - structured and unstructured - for real-time and complex analytical processes. As Datagrom points out here, the platforms come down to three differences: data structure, data ownership, and use case versatility. Snowflake requires structured or semi-structured data prior to running a query, while Databricks does not. Similarly, while Snowflake decouples data storage from compute, it does not decouple data ownership, meaning Snowflake maintains all of your data, whereas you can run Databricks on top of any data source you have, whether on-premise or in the cloud. Lastly, Databricks acts more as a processing layer (able to work in code like Python as well as SQL), while Snowflake acts as a query and storage layer (mainly driven by SQL). Snowflake performs best with business intelligence querying, while Databricks performs best with data science and machine learning. Both platforms can be used by the same organizations, and I expect both to be massive companies (Databricks recently raised at a $28B valuation!). All of these tools are blending together and competing against each other - Databricks just launched a new Lakehouse (data lake + data warehouse - I know the name is hilarious) and Snowflake is leaning heavily into its data lake. We will see who wins!
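To make the pipeline concrete, here is a deliberately simplified, hypothetical ELT sketch in Python: extract rows from an operational system, load the raw rows into a warehouse table, then transform them with SQL inside the warehouse (the pattern Fivetran plus dbt popularized). The data, table names, and the in-memory sqlite3 “warehouse” are all stand-ins for illustration; a real pipeline would point a managed connector at Snowflake or Databricks instead.

```python
# A toy ELT pipeline (illustrative only): extract -> load raw -> transform in SQL.
# sqlite3 stands in for the warehouse; the rows and table names are made up.
import sqlite3

def extract_orders():
    """Pretend call to an operational system's API (hypothetical data)."""
    return [
        {"order_id": 1, "customer": "acme", "amount": 120.0},
        {"order_id": 2, "customer": "globex", "amount": None},  # dirty row
        {"order_id": 3, "customer": "acme", "amount": 75.5},
    ]

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE raw_orders (order_id INT, customer TEXT, amount REAL)")

# Load: land the raw, untransformed rows first (the "EL" in ELT).
warehouse.executemany(
    "INSERT INTO raw_orders VALUES (:order_id, :customer, :amount)",
    extract_orders(),
)

# Transform: clean and aggregate inside the warehouse with SQL (the "T").
warehouse.execute("""
    CREATE TABLE orders_by_customer AS
    SELECT customer, SUM(amount) AS revenue
    FROM raw_orders
    WHERE amount IS NOT NULL  -- drop the dirty rows, as described above
    GROUP BY customer
""")

print(warehouse.execute("SELECT * FROM orders_by_customer").fetchall())
# [('acme', 195.5)]
```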

An interesting data platform battle is brewing that will play out over the next 5-10 years: The Data Warehouse vs the Data Lakehouse, and the race to create the data cloud

Who's the biggest threat to @snowflake? I think it's @databricks, not AWS Redshift https://t.co/R2b77XPXB7

— Jamin Ball (@jaminball) January 26, 2021

Business Themes

[Images: a data lakehouse architecture diagram and a data pipeline architecture overview]
  1. Marketing Customers. This book, at its core, is a marketing document. Sure, it gives a nice story of how the company was built, the insights of its founding team, and some obstacles they overcame. But the majority of the book is just an “imagine what you could do with data” exploration across a variety of industries and use cases. That’s not good or bad, but it’s an interesting way of marketing - that’s for sure. It’s frustrating that the book spends so little time on the technology and the actual company building. Our May 2019 book, The Everything Store, about Jeff Bezos and Amazon, was perfect because it covered all of the decision making and challenging moments of building a long-term company. This book just talks about customer and partner use cases over and over. Slootman’s section is only about 20 pages, and five of them cover case studies from Square, Walmart, Capital One, Fair, and Blackboard. I suspect this may be due to the controversial ousting of long-time CEO Bob Muglia in favor of Frank Slootman, co-author of this book. As this Forbes article noted: “Just one problem: No one told Muglia until the day the company announced the coup. Speaking publicly about his departure for the first time, Muglia tells Forbes that it took him months to get over the shock.” One day we will hear the actual unfiltered story of Snowflake, and it will make for an interesting comparison to this book.

  2. Timing & Building. We often forget how important timing is in startups. Being the right investor or company at the right time can do a lot to drive unbelievable returns. Consider Don Valentine at Sequoia in the early 1970s. We know that venture capital fund performance persists, in part due to the incredible branding at firms like Sequoia that has built up over years and years (obviously reinforced by top-notch talent like Mike Moritz and Doug Leone). Don was a great investor who took significant risks on unproven individuals like Steve Jobs (Apple), Nolan Bushnell (Atari), and Trip Hawkins (EA). But he also had unfettered access to the birth of an entirely new ecosystem and knowledge of how that ecosystem would change business, built up from his years at Fairchild Semiconductor. Don was a unique person who capitalized on that incredible knowledge base, effectively creating the VC industry. Sequoia is a top firm because he was in the right place at the right time with the right knowledge. Now let’s cover some companies that weren’t: Cloudera, Hortonworks, and MapR. In 2005, Yahoo engineers Doug Cutting and Mike Cafarella, inspired by the Google File System paper, created Hadoop, a framework for distributed storage and processing of data at a scale never seen before. Hadoop spawned companies like Cloudera, Hortonworks, and MapR that were built to commercialize the open-source Hadoop project. All of them came out of the gate fast with big funding: Cloudera raised $1B at a $4B valuation prior to its 2017 IPO, Hortonworks raised $260M at a $1B valuation prior to its 2014 IPO, and MapR raised $300M before it was acquired by HPE in 2019. The companies all shared one problem, however: they were on-premise and built before the cloud gained traction. That meant it required significant internal expertise and resources to run Cloudera, Hortonworks, and MapR software. In 2018, Cloudera and Hortonworks merged (at a $5B valuation) because competitive pressure from the cloud was eroding both of their businesses. MapR was quietly acquired for less than it raised. Today Cloudera trades at a $5B valuation, meaning no shareholder return since the merger, and the business has only recently become slightly profitable at its current low growth rate. This cautionary case study shows how important timing is and how difficult it is to build a lasting company in the data infrastructure world. As the new analytics stack is built with Fivetran, Matillion, dbt, Snowflake, and Databricks, it will be interesting to see which companies exist 10 years from now. It’s probable that some new technology will come along and hurt every company in the stack, but for now the coast is clear - the scariest time for any of these companies.

  3. Burn Baby Burn. Snowflake burns A LOT of money. In the nine months ended October 31, 2020, Snowflake burned $343M, including $169M in its third quarter alone. Why would Snowflake burn so much money? Because it is growing efficiently! What does efficient growth mean? As we discussed with the last Frank Slootman book, sales and marketing efficiency is a key hallmark for understanding the quality of growth a company is experiencing. According to its filings, Snowflake added ~$230M of revenue and spent $325M on sales and marketing. This is actually not terribly efficient - it implies a dollar invested in sales and marketing yielded roughly $0.70 of incremental revenue (the arithmetic is reproduced in the short sketch below). While you would like this number to be closer to 1x (i.e., $1 in S&M yields $1 in revenue - hence a repeatable go-to-market motion), it is not terrible. ServiceNow (Slootman’s old company) actually operates less efficiently - for every dollar it invests in sales and marketing, it generates only $0.55 of subscription revenue. Crowdstrike, on the other hand, operates a partner-driven go-to-market, which enables it to generate more while spending less - $0.90 for every dollar invested in sales and marketing over the last nine months. However, there is a key thing that distinguishes the data warehouse from these other businesses, and Ben Thompson at Stratechery nails it here: “Think about this in the context of Snowflake’s business: the entire concept of a data warehouse is that it contains nearly all of a company’s data, which (1) it has to be sold to the highest levels of the company, because you will only get the full benefit if everyone in the company is contributing their data and (2) once the data is in the data warehouse it will be exceptionally difficult and expensive to move it somewhere else. Both of these suggest that Snowflake should spend more on sales and marketing, not less. Selling to the executive suite is inherently more expensive than a bottoms-up approach. Data warehouses have inherently large lifetime values given the fact that the data, once imported, isn’t going anywhere.” I hope Snowflake burns more money in the future and builds a sustainable long-term business.
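The efficiency math above is easy to reproduce. Below is a small, hypothetical worked example: the Snowflake figures are the ones quoted in this post, while the ServiceNow and Crowdstrike dollar amounts are placeholder values chosen only to reproduce the quoted ratios.

```python
# Sales & marketing efficiency: incremental revenue added per S&M dollar spent.
# Snowflake figures ($M, nine months) are from the post; the ServiceNow and
# Crowdstrike absolute figures are placeholders that reproduce the quoted ratios.
companies = {
    "Snowflake":   {"revenue_added": 230, "sm_spend": 325},
    "ServiceNow":  {"revenue_added": 55,  "sm_spend": 100},   # placeholder scale
    "Crowdstrike": {"revenue_added": 90,  "sm_spend": 100},   # placeholder scale
}

for name, figures in companies.items():
    efficiency = figures["revenue_added"] / figures["sm_spend"]
    print(f"{name:>11}: ${efficiency:.2f} of new revenue per S&M dollar")

# Snowflake: $0.71, ServiceNow: $0.55, Crowdstrike: $0.90 -- closer to $1.00
# implies a more repeatable go-to-market motion.
```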

Dig Deeper

  • Early Youtube Videos Describing Snowflake’s Architecture and Re-inventing the Data Warehouse

  • NCR’s spinoff of Teradata in 2007

  • Fraser Harris of Fivetran and Tristan Handy of dbt speak at the Modern Data Stack Conference

  • Don Valentine, Sequoia Capital: "Target Big Markets" - A discussion at Stanford

  • The Mike Speiser Incubation Playbook (an essay by Kevin Kwok)

tags: Snowflake, Data Warehouse, Oracle, Vertica, Netezza, IBM, Databricks, Apache Spark, Open Source, Fivetran, Matillion, dbt, Data Lake, Sequoia, ServiceNow, Crowdstrike, Cloudera, Hortonworks, MapR, BigQuery, Frank Slootman, Teradata, Xerox, Informix, NCR, AT&T, Benoit Dageville, Mike Speiser, Sutter Hill Ventures, Redshift, Amazon, ETL, Hadoop, SQL
categories: Non-Fiction
 

October 2020 - Working in Public: The Making and Maintenance of Open Source Software by Nadia Eghbal

This month we covered Nadia Eghbal’s instant classic about open-source software. Open-source software has been around since the late seventies, but only recently has it gained significant public and business attention.

Tech Themes

The four types of open source communities described in Working in Public

  1. Misunderstood Communities. Open source is frequently viewed as an overwhelmingly positive force for good - taking software and making it free for everyone to use. Many think of open source as community-driven, where everyone participates and contributes to making the software better. The theory is that many eyeballs and contributors improve security, improve reliability, and increase distribution. In reality, open-source communities take the shape of the “90-9-1” rule and act more like social media than you might think. According to Wikipedia, the “90-9-1” rule states that for websites where users can both create and edit content, 1% of people create content, 9% edit or modify that content, and 90% view the content without contributing. To show how this applies to open source communities, Eghbal cites a study by North Carolina State researchers: “One study found that in more than 85% of open source projects the research examined on Github, less than 5% of developers were responsible for 95% of code and social interactions.” These creators, contributors, and maintainers are developer influencers: “Each of these developers commands a large audience of people who follow them personally; they have the attention of thousands of developers.” Unlike Instagram and Twitch influencers, who often actively try to build their audiences, open-source developer influencers sometimes find the attention off-putting - they simply published something to help others and suddenly found themselves with actual influence. The challenging truth of open source is that core contributors and maintainers give significant amounts of their time and attention to their communities - often spending hours at a time responding to pull requests (requests for changes / new features) on GitHub. Evan Czaplicki’s insightful talk, “The Hard Parts of Open Source,” speaks to this challenging dynamic. Evan created the open-source project Elm, a functional programming language that compiles to JavaScript, because he wanted to make functional programming more accessible to developers. As one of its core maintainers, he has repeatedly been hit with “Why don’t you just…” requests from non-contributing developers angrily asking why a feature wasn’t included in the latest release. As fastlane creator Felix Krause put it, “The bigger your project becomes, the harder it is to keep the innovation you had in the beginning of your project. Suddenly you have to consider hundreds of different use cases…Once you pass a few thousand active users, you’ll notice that helping your users takes more time than actually working on your project. People submit all kinds of issues, most of them aren’t actually issues, but feature requests or questions.” When you use open-source software, remember who is contributing and maintaining it - and the days and years poured into the project for the sole goal of increasing its utility for the masses.

  2. Git it? Git was created by Linus Torvalds in 2005. We talked about Torvalds last month; he also created the most famous open-source operating system, Linux. Git was born in response to a skirmish with Larry McVoy, the head of the proprietary tool BitKeeper, over the potential misuse of his product. Torvalds went on vacation for a week and hammered out the most dominant version control system today: git. Version control systems allow developers to work simultaneously on projects, committing any changes to a centralized branch of code. They also allow any changes to be rolled back to earlier versions, which can be enormously helpful if a bug is found in the main branch. Git ushered in a new wave of version control, but the open-source tool was somewhat difficult for the untrained developer to use. Enter GitHub and GitLab - two companies built around the idea of making the git version control system easier for developers to use. GitHub came first, in 2007, offering a platform to host and share projects. The GitHub platform was free, but not open source - developers couldn’t build onto the hosting platform, only use it. GitLab started in 2014 to offer an alternative, fully open-source platform that allowed individuals to self-host a GitHub-like program, providing improved security and control. Because of GitHub’s first-mover advantage, however, it has become the dominant platform upon which developers build: “Github is still by far the dominant market player: while it’s hard to find public numbers on GitLab’s adoption, its website claims more than 100,000 organizations use its product, whereas GitHub claims more than 2.9 million organizations.” Developers find GitHub incredibly easy to use, creating an enormous wave of open source projects and code-sharing. The company added 10 million new users in 2019 alone, bringing the total to over 40 million worldwide. This growth prompted Microsoft to buy GitHub in 2018 for $7.5B. We are in the early stages of this development explosion, and it will be interesting to see how increased code accessibility changes the world over the next ten years.

  3. Developing and Maintaining an Ecosystem Forever. Open source communities are unique and complex, with different user and contributor dynamics. Eghbal segments the different types of open source communities into four buckets - federations, clubs, stadiums, and toys - characterized in the two-by-two matrix pictured above, based on contributor growth and user growth (and restated as a tiny classifier in the sketch after this list). Federations are the pinnacle of open source software development - many contributors and many users, creating a vibrant ecosystem of innovative development. Clubs represent more niche and focused communities, including vertical-specific tools like the astronomy package Astropy. Stadiums are highly centralized but large communities - typically only a few contributors but a significant user base. It is up to these core contributors to lead the ecosystem, as opposed to decentralized federations, which have so many contributors they can go in all directions. Lastly, there are toys, which have low user growth and low contributor growth but may still be very useful projects. Interestingly, projects can shift in and out of these community types as they become more or less relevant. For example, developers from Yahoo open-sourced their Hadoop project, based on Google’s File System and MapReduce papers. The project slowly became huge, moving from a stadium to a federation, and spawned subprojects around it, like Apache Spark. What’s interesting is that projects mature and change, and code can remain in production for years after the project’s day in the spotlight is gone. According to Eghbal, “Some of the oldest code ever written is still running in production today. Fortran, which was first developed in 1957 at IBM, is still widely used in aerospace, weather forecasting, and other computational industries.” These ecosystems can exist forever, but their costs (creation, distribution, and maintenance) are often hidden, especially the maintenance side. The cost of creation and distribution has dropped significantly in the past ten years - with many of the world’s developers all working in the same ecosystem on GitHub - but that has also increased the total cost of maintenance, and that maintenance cost can be significant. Bootstrap co-creator Jacob Thornton likens maintenance costs to caring for an old dog: “I’ve created endlessly more and more projects that have now turned [from puppies] into dogs. Almost every project I release will get 2,000, 3,000 watchers, which is enough to have this guilt, which is essentially like ‘I need to maintain this, I need to take care of this dog.’” Communities change from toys to clubs to stadiums to federations, but they may also change back as new tools are developed. Old projects still need to be maintained, and that code and maintenance comes down to committed developers.
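Eghbal’s two-by-two can be captured in a few lines of code. The sketch below is a hypothetical restatement, not code from the book: it buckets a project into federation, club, stadium, or toy based on its user growth and contributor growth, with the “high/low” threshold made up purely for illustration.

```python
# Eghbal's 2x2 of open source communities, restated as a tiny classifier.
# The growth threshold is an arbitrary illustration, not from the book.
def classify_project(user_growth: float, contributor_growth: float,
                     high: float = 0.20) -> str:
    """Bucket a project by annual user growth and contributor growth."""
    many_users = user_growth >= high
    many_contributors = contributor_growth >= high
    if many_users and many_contributors:
        return "federation"  # lots of both, e.g. Hadoop at its peak
    if many_users:
        return "stadium"     # big audience, a handful of core maintainers
    if many_contributors:
        return "club"        # niche but participatory, like Astropy
    return "toy"             # small on both axes, possibly still useful

print(classify_project(user_growth=0.8, contributor_growth=0.05))  # stadium
```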

Business Themes

[Image: the four types of open core business models]
  1. Revenue Model Matching. One of the earliest code-hosting platforms was SourceForge, a company founded in 1999. The company pioneered the idea of code hosting - letting developers publish their code for easy download. It became famous for letting open-source developers use the platform free of charge. SourceForge was created by VA Software, an internet bubble darling that saw its stock price decimated when the bubble finally burst. The challenge with scaling SourceForge was a revenue model mismatch: VA Software made money with paid advertising, which allowed it to offer its tools to developers for free but meant its revenue was highly variable. When the company went public, it was still a small and unproven business, posting $17M in revenue and $31M in costs. The revenue model mismatch is starting to rear its head again, with traditional software-as-a-service (SaaS) recurring subscription models catching some heat. Many cloud service and API companies are pricing by usage rather than a fixed, high-margin subscription fee. This is the classic electric utility model - you only pay for what you use. Snowflake CEO Frank Slootman (who formerly ran SaaS pioneer ServiceNow) commented: “I also did not like SaaS that much as a business model, felt it not equitable for customers.” Snowflake instead charges based on credits, which pay for usage. The issue with usage-based billing has traditionally been price transparency, which can be obscured by customer credit systems and hard-to-predict pricing, as with Amazon Web Services. The revenue model mismatch was just one problem for SourceForge. As git became the dominant version control system, SourceForge was reluctant to support it, opting for its traditional tools instead. Pricing norms change and new technology comes out every day; it’s imperative that businesses have a strong grasp of the value they provide to their customers and align their revenue model with them so that a fair trade-off is created.

  2. Open Core Model. There has been enormous growth in open source businesses in the past few years, and they typically operate on an open core model. The open core model means the company offers a free, normally feature-limited, version of its software alongside a proprietary enterprise version with additional features. Developers might adopt the free version but hit usage limits or feature constraints, causing them to purchase the paid version. The open-source “core” is often just that - freely available for anyone to download and modify; the core’s actual source code is normally published on GitHub, and developers can fork the project or do whatever they wish with that open core. The commercial product is normally closed source and not available for modification, giving the business a product to sell. Joseph Jacks, who runs Open Source Software (OSS) Capital, an investment firm focused on open source, lays out four types of open core business models (pictured above). The business models differ based on how much of the software is open source. GitHub, interestingly, employs the “thick” model of being mostly proprietary, with only 10% of its software truly open-sourced. It’s funny that the site that hosts and facilitates the most open source development is itself proprietary. Jacks nails the most important question in the open core model: “How much stays open vs. How much stays closed?” The consequences can be dire for a business - open source too much and all of a sudden other companies can quickly recreate your tool. Many DevOps tools have experienced the perils of open source, with some companies losing control of the projects they were supposed to facilitate. On the flip side, keeping more of the software closed source goes against the open-source ethos, which can be viewed as selling out. The continuous delivery pipeline project Jenkins has struggled to satiate its growing user base, leading the CEO of CloudBees, the company behind Jenkins, to publish a blog post entitled “Shifting Gears”: “But at the same time, the incremental, autonomous nature of our community made us demonstrably unable to solve certain kinds of problems. And after 10+ years, these unsolved problems are getting more pronounced, and they are taking a toll — segments of users correctly feel that the community doesn’t get them, because we have shown an inability to address some of their greatest difficulties in using Jenkins. And I know some of those problems, such as service instability, matter to all of us.” Striking this balance is incredibly tough, especially in a world of competing projects and finite development time and money in a commercial setting. Furthermore, large companies like AWS are taking open core tools like Elastic and MongoDB and recreating them in proprietary fashion (Elasticsearch Service and DocumentDB), prompting those companies’ CEOs to lash out, understandably. Commercializing open source software is a never-ending battle against proprietary players and yourself.

  3. Compensation for Open Source. Eghbal characterizes two types of funders of open source: institutions (companies, governments, universities) and individuals (usually developers who are direct users). Companies like to fund improved code quality, influence, and access to core projects. The largest groups of contributors to open source projects are corporations like Microsoft, Google, Red Hat, IBM, and Intel. These corporations are big enough and profitable enough to hire individuals and let them strike a comfortable balance between time spent on commercial software and time spent on open source. This also functions as a marketing expense for the big corporations; big companies like having influential developers on payroll to get the company’s name out into the ecosystem. Evan You, who authored the JavaScript framework Vue.js, described company-backed open-source projects: “The thing about company-backed open-source projects is that in a lot of cases… they want to make it sort of an open standard for a certain industry, or sometimes they simply open-source it to serve as some sort of publicity improvement to help with recruiting… If this project no longer serves that purpose, then most companies will probably just cut it, or (in other terms) just give it to the community and let the community drive it.” In contrast to company-funded projects, developer-funded projects are often donation-based. With the rise of online payment tools like Stripe and Patreon, more and more funding is being directed to individual open source developers. Unfortunately, though, it is still hard for many open source developers to fund their work through individual contributions, especially if they work on multiple projects at the same time. Open source developer Sindre Sorhus explains: “It’s a lot harder to attract company sponsors when you maintain a lot of projects of varying sizes instead of just one large popular project like Babel, even if many of those projects are the backbone of the Node.js ecosystem.” Whether working inside a company or as an individual developer, building and maintaining open source software takes significant time and effort and rarely leads to significant monetary compensation.

Dig Deeper

  • List of Commercial Open Source Software Businesses by OSS Capital

  • How to Build an Open Source Business by Peter Levine (General Partner at Andreessen Horowitz)

  • The Mind Behind Linux (a talk by Linus Torvalds)

  • What is open source - a blog post by Red Hat

  • Why Open Source is Hard by PHP Developer Jose Diaz Gonzalez

  • The Complicated Economy of Open Source

tags: Github, Gitlab, Google, Twitch, Instagram, Elm, Javascript, Open Source, Git, Linus Torvalds, Linux, Microsoft, MapReduce, IBM, Fortran, Node, Vue, SourceForge, VA Software, Snowflake, Frank Slootman, ServiceNow, SaaS, AWS, DevOps, CloudBees, Jenkins, Intel, Red Hat
categories: Non-Fiction
 
