
From Spreadsheets to Data Warehouses: Revolutionizing Startup Data Management


On February 23, 2021, Alexandre Laloo, data product manager at sennder, presented a webinar to our community about using data in startups.


Alexandre Laloo and sennder

A graduate of a top business school, Alexandre Laloo joined Everoad in 2017 when the company had just twenty employees. This startup, specialized in the digitalization of road haulage, merged with sennder, its German competitor, in 2020. The company, which kept the sennder brand name, now has over 700 employees and has raised over 260 million euros. Alexandre has witnessed the company’s growth and the evolution of its approach to data.

He started as an operations manager, analyzing the company’s business through the data it collected. In his current role as data product manager, he directs the analytics roadmap, i.e. he ensures that data is properly disseminated throughout the company.

Data processing in startups

1. A company's relationship with data depends on its market

The term “startup” is widely used in the news and covers very different types of company. It is therefore impossible to generalize about the use of data in these organizations, as data is processed to address issues specific to each sector. In some fields, such as gaming or travel tech, data is a priority right from the start, as the company needs it to understand its market and product. In other sectors, the need for Data Analysts or Machine Learning Engineers is less pressing at the outset. For example, sennder didn’t hire a Machine Learning Engineer in its first 3 years.

Each startup’s market thus determines its data needs, and with them the share of the budget allocated to data: on the one hand to recruit experts (Product Managers, Data Analysts, Data Scientists or Data Engineers), and on the other to acquire powerful analysis and processing tools. Adopting an enterprise cloud platform, for instance, represents a substantial investment.

The priority of data for a start-up will depend on several criteria:

  • The nature of the product: mobile application, website, physical product, etc.
  • Market: transport, agri-food, finance, etc.
  • The volume of data available
  • The company’s data culture: even if data becomes indispensable at a certain level of growth, investment will also depend on this factor.

2. Startup financing cycles

In addition to the market in which the company operates, its maturity – which can be assessed using financing cycles – will also determine the way it approaches its data.

Alexandre offers us a table that takes into account the market on the one hand and the maturity of the company on the other, to assess the need for data and the scale of the investment devoted to data.

For example, for an e-commerce site, the need is “intermediate”, as its priority is to invest in digital marketing to get off the ground. Its data strategy will focus on analyzing marketing campaigns and their performance (ROI calculations, for example).
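As a minimal sketch of the kind of ROI calculation mentioned above, the snippet below uses the common definition ROI = (revenue - cost) / cost. The campaign names and figures are hypothetical and do not come from the webinar.

```python
def campaign_roi(revenue: float, cost: float) -> float:
    """Return ROI as a fraction: (revenue - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) / cost


# Hypothetical campaigns: name -> (attributed revenue, spend), in euros.
campaigns = {
    "newsletter": (1500.0, 1000.0),
    "search_ads": (3000.0, 4000.0),
}

roi_by_campaign = {
    name: campaign_roi(revenue, cost)
    for name, (revenue, cost) in campaigns.items()
}

# Rank campaigns from best to worst ROI.
ranking = sorted(roi_by_campaign, key=roi_by_campaign.get, reverse=True)
```

A negative value means the campaign cost more than the revenue attributed to it, which is the kind of signal an early-stage marketing analysis looks for.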

The evolution of tools: from spreadsheets to data warehouses

This section presents the evolution of the issues and tools used by Everoad and sennder from 2017 to the present day. At the start of its activity, the volume of data collected by Everoad was relatively low, and the company’s priority was not to analyze data but to develop its road haulage business.

Today, Machine Learning represents a major investment for sennder, in particular to optimize hauliers’ routes and offer intelligent pricing. This change in approach to data has been gradual, and this section describes the steps that have led the company to its current level of expertise.

 


- Step 0: Google Sheets files

In 2017, Everoad was in the early stages of its business and had not yet raised its Series A. The company used a simple Slack bot to retrieve a CSV file each day with information on the previous day’s activity. There was no visualization and no real-time collection. This information provided a basic understanding of the transport operations carried out, which was sufficient at that stage of growth.

However, this solution did not allow users to work autonomously, and the technical team had to be called in at the slightest need.
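To give an idea of what a basic reading of such a daily export can look like, here is a minimal sketch using only Python’s standard library. The column names and sample rows are assumptions made for illustration; the actual format of Everoad’s file was not described.

```python
import csv
import io
from collections import Counter

# Hypothetical export of the previous day's activity.
DAILY_CSV = """shipment_id,origin,destination,price_eur
1,Paris,Lyon,450.0
2,Paris,Lille,300.0
3,Lyon,Marseille,380.5
"""


def summarize_daily_activity(csv_text: str) -> dict:
    """Count shipments, total their price, and count shipments per origin."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "shipments": len(rows),
        "total_price_eur": sum(float(row["price_eur"]) for row in rows),
        "by_origin": dict(Counter(row["origin"] for row in rows)),
    }


summary = summarize_daily_activity(DAILY_CSV)
```

A summary like this answers "what happened yesterday", but every new question still requires someone to change the script, which is the lack of autonomy described above.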

- Step 1: Activate the data distribution lever

The next objective was to address the distribution of data to reporting tools. This stage relied on Google Sheets, where scripts can automatically send data from one file to another.

At this stage, Everoad had not yet hired any data experts. Alexandre was an operations manager, whose role was to analyze the business and gain an overview of operations. Data did not become a priority until the spreadsheet system could no longer handle the growing volume of data to be processed.

This system was functional and could meet the most basic reporting needs, but it was based on a single data source and only allowed data to be viewed from a single angle of analysis.

- Step 2: Database query tool

The next step was to implement a tool to run queries on different databases and provide different angles of analysis. Everoad opted for Redash (since acquired by Databricks) to run these queries and send data to different Google Sheets.

This tool made it possible to extract value from multiple data sources, run more flexible queries and be more autonomous from the technical team, who nonetheless remained indispensable to the processes in place. Data was automatically updated every 2 to 3 hours.
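The sketch below illustrates the general idea of combining sources in a single query to obtain a new angle of analysis. It uses an in-memory SQLite database with two hypothetical tables standing in for separate sources; it is not Redash itself, and the schema is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (id INTEGER PRIMARY KEY, carrier_id INTEGER, price_eur REAL);
    CREATE TABLE carriers (id INTEGER PRIMARY KEY, country TEXT);
    INSERT INTO carriers VALUES (1, 'FR'), (2, 'DE');
    INSERT INTO shipments VALUES (1, 1, 400.0), (2, 1, 250.0), (3, 2, 500.0);
""")


def revenue_by_country(connection: sqlite3.Connection) -> list:
    """Join shipments with carriers and aggregate per carrier country."""
    query = """
        SELECT c.country, COUNT(*) AS shipments, SUM(s.price_eur) AS revenue_eur
        FROM shipments AS s
        JOIN carriers AS c ON c.id = s.carrier_id
        GROUP BY c.country
        ORDER BY c.country
    """
    return connection.execute(query).fetchall()


report = revenue_by_country(conn)
```

Changing the `GROUP BY` column is enough to look at the same data from another angle, which a single-source spreadsheet could not offer.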

A Data Analyst joined the team and various reports were produced (commercial and operational). This operation lasted 8 months before becoming too fragile in the face of Everoad’s growth.

- Step 3: Data Warehouse

Step 3 involved integrating Airflow and BigQuery to manage the data layers: storage, processing and sharing with users. This step marked a turning point in data processing for Everoad, which recruited three Data Analysts and a Data Engineer, demonstrating the company’s interest in its data.
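The three layers can be pictured as tasks that must run in dependency order, which is the core idea behind an orchestrator. The sketch below is a simplified stand-in written with Python’s standard library (`graphlib`, Python 3.9+); it does not use the Airflow or BigQuery APIs, and the task names and sample records are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def store(context: dict) -> None:
    # Storage layer: land raw records (hard-coded here for illustration).
    context["raw"] = [
        {"route": "Paris-Lyon", "price_eur": "450"},
        {"route": "Paris-Lyon", "price_eur": "350"},
    ]


def process(context: dict) -> None:
    # Processing layer: cast types and aggregate price per route.
    totals = {}
    for record in context["raw"]:
        route = record["route"]
        totals[route] = totals.get(route, 0.0) + float(record["price_eur"])
    context["totals"] = totals


def share(context: dict) -> None:
    # Sharing layer: format the aggregated result for end users.
    context["report"] = [
        f"{route}: {total:.2f} EUR" for route, total in context["totals"].items()
    ]


TASKS = {"store": store, "process": process, "share": share}
# Each task maps to the set of tasks it depends on.
DEPENDENCIES = {"store": set(), "process": {"store"}, "share": {"process"}}


def run_pipeline() -> tuple:
    """Run every task once, in an order that respects the dependencies."""
    context = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](context)
    return order, context


order, context = run_pipeline()
```

Declaring dependencies rather than hard-coding the call order is what lets a real orchestrator schedule, retry and monitor each layer separately.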

The data team could now perform data modeling on its databases. Reporting was automated and product analyses could be produced.

However, no analytical visualization platform had been set up, and business intelligence tools were fragile, with the company still using spreadsheets and Google Data Studio. The teams were not yet dealing with governance issues, i.e. the management of data access and distribution within the company. This stage lasted two years, until the merger with sennder.

- Step 4: Governance, Data mart and Data Library

After the merger with sennder, the company recruited four data engineers and its tool stack evolved considerably. Data governance has been put in place with Looker, and the data flow arrives in real time and can be exploited directly. Databases are automatically updated every 30 minutes.

The data team covers 100% of the business: all departments receive analytical reports on their activities with the KPIs that concern them, and 600 users consume data within the company.

The company has a data mart (or data store) that defines who needs what data and how to distribute it. Training modules have also been set up to enable users to improve their skills.

Data processes to be followed have been defined, with “best practices” and a data library to document what is done with data, always with a view to serving three types of user: business users, product users and data team users.
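The idea of defining who needs what data can be sketched as a simple mapping from user type to readable datasets. The three user types come from the text above; the dataset names and the rules themselves are hypothetical, and this is not how Looker or sennder’s data mart is actually configured.

```python
# Hypothetical mapping: user type -> datasets that type may read.
DATA_MART_ACCESS = {
    "business": {"sales_kpis", "operations_kpis"},
    "product": {"product_usage", "operations_kpis"},
    "data_team": {"sales_kpis", "operations_kpis", "product_usage", "raw_events"},
}


def can_read(user_type: str, dataset: str) -> bool:
    """Return True if the given user type may read the dataset."""
    return dataset in DATA_MART_ACCESS.get(user_type, set())


def datasets_for(user_type: str) -> list:
    """List the datasets available to a user type, in alphabetical order."""
    return sorted(DATA_MART_ACCESS.get(user_type, set()))
```

For example, `can_read("business", "raw_events")` returns `False` under these assumed rules, while the data team can read every dataset. Keeping such rules in one documented place is what makes governance auditable as the number of users grows.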

Conclusion

Exploiting the data collected is now an essential part of any company’s business, and represents a considerable growth lever. However, the level of priority given to data processing depends on both the business sector and the maturity of the company.

So, depending on the startup, interest in data comes into play at an earlier or later stage in the company’s growth, as illustrated by the stages described by Alexandre for Everoad and sennder.

In this webinar, Alexandre Laloo showed us that it’s possible to take an interest in data, start integrating databases and provide the first basic analyses without a budget, using free or freemium tools.

The process that has led sennder to its current status has been gradual, and has always responded to the needs generated by its growth.

This is an encouraging message for startups that cannot yet budget for data, as even simple processing of their data can help them optimize their products and services without requiring a lot of resources.

DataScientest would like to thank Alexandre Laloo for his time and the quality of his presentation during this webinar, which showed us that the use of data science now concerns all sectors of activity, from finance to agri-food to road haulage.

