Fake news! This term has been in fashion for some years now. It's all about the reliability of information. And that's precisely why data sources are so important. While in the world of journalism, this refers to the origin of the information, in the world of data expertise, data sources correspond to storage locations that bring together large quantities of information. So what is a data source? What is it used for? How does it work? Find the answers here.
What is a data source?
A data source is the physical or digital location where data is stored, in various forms. In short, it is where the data comes from.
The data source can be either the place where the data was originally created or the place where it was later added. For example, as part of a digital transformation, many companies are digitising their data. The place where it is stored electronically then becomes the source of this data.
In the same spirit, data sources can be digital (for the most part) or paper-based.
Whatever the case, the idea is to enable users to access and exploit the data from this source.
And they can do this in a variety of ways, since the data source can take different forms, such as a database, a flat file, an inventory table, web scraping, streaming data, physical archives, etc.
With the development of Big Data and new technologies, these different formats are constantly evolving, making data sources ever more complex. The challenge for organisations is to simplify them as much as possible.
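To make the variety of forms concrete, here is a minimal Python sketch that reads the same records from two of the forms listed above: a flat file (CSV) and a database. The sample data, the `customers` table and the function names are invented for illustration, and both sources are kept in memory so that the example runs on its own.

```python
import csv
import io
import sqlite3

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"


def read_flat_file(text):
    """Read rows from a flat file (here, a CSV held in a string)."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["id"]), row["name"]) for row in reader]


def make_sample_database():
    """Build an in-memory SQLite database holding the same records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
    )
    return conn


def read_database(conn):
    """Read rows from a database table."""
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()


flat_rows = read_flat_file(CSV_TEXT)
db_rows = read_database(make_sample_database())
```

Both functions return the same list of `(id, name)` tuples, even though the underlying sources are of different kinds. Hiding the form of the source behind a small function like this is one simple way of keeping things manageable for the user.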
Several data sources depending on the context
As we saw earlier, data sources can take different forms. But this depends above all on the context.
Data sources and databases are often confused. Both refer to the place where information is stored. But the database is only one form of data source (admittedly the most widespread).
Depending on the context, a data source can also be understood as a data provider, a self-service data tool such as Excel, Tableau or Power BI, a type of computer storage, accounting records, an economic indicator, etc.
In the same spirit, data source and DSN (data source name) should not be confused. DSNs describe a connection to a data source.
In some cases, the DSN is the same as the corresponding database or file, but this is not automatic. It may also be an address or a label allowing the data to be accessed more easily from its source.
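The difference between a DSN and the source itself can be sketched in a few lines of Python. This is a simplified illustration, not how any particular driver manager works: the `DSN_REGISTRY` dictionary, the `sales` label and the `connect` function are all hypothetical, and an in-memory SQLite database stands in for the real source.

```python
import sqlite3

# Hypothetical registry: each DSN is only a label that points to
# the details needed to reach the actual data source.
DSN_REGISTRY = {
    "sales": {"driver": "sqlite", "database": ":memory:"},
}


def connect(dsn_name):
    """Resolve a DSN label to a live connection to its data source."""
    try:
        settings = DSN_REGISTRY[dsn_name]
    except KeyError:
        raise LookupError(f"Unknown DSN: {dsn_name!r}") from None
    if settings["driver"] != "sqlite":
        raise ValueError(f"Unsupported driver: {settings['driver']!r}")
    return sqlite3.connect(settings["database"])


conn = connect("sales")
answer = conn.execute("SELECT 1 + 1").fetchone()[0]
```

The user only needs to know the label `sales`. Where the data actually lives can change in the registry without the user's code having to change.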
Whatever the format and context, the idea of the data source is to define where the data comes from and to describe the connections between the information.
What are data sources used for?
The aim of data sources is to enable users to access the information they need, and if necessary to move or modify it.
To achieve this, data experts need to bring all the information together in one place to make it easier to use and understand.
Above all, data sources must be designed with the user in mind, to make it easier to process the data. The information must be stored consistently, both in terms of location and format.
This consistency makes it easier to connect pieces of information to one another, and therefore simplifies access to the data and makes it understandable to as many people as possible.
How do data sources work?
As we saw earlier, there are many different sources of external data. For companies, the challenge is to integrate the data with an internal source to facilitate data processing and analysis.
A wide variety of solutions can be used to achieve this. For example, data can be integrated at source using network protocols (such as FTP or HTTP), APIs (application programming interfaces), file-sharing protocols such as NFS and SMB, or web-based interfaces such as SOAP, REST and WebDAV.
Whatever integration tools are used, data experts need to make the data source as comprehensible as possible to the user. To do this, they need to identify the connections between the data and smooth out any differences in format or structure.
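As an illustration of smoothing out differences in format and structure, here is a minimal Python sketch. It assumes two hypothetical sources, a CRM export and an online shop, that describe the same customer with different field names and date formats. All names and values are invented for the example.

```python
from datetime import datetime

# Hypothetical records from two sources describing the same customer.
crm_rows = [{"CustomerID": "17", "SignupDate": "03/02/2024"}]  # day/month/year
shop_rows = [{"customer_id": 17, "signup": "2024-02-03"}]  # ISO 8601 date


def normalise_crm(row):
    """Map a CRM record to the common schema."""
    return {
        "customer_id": int(row["CustomerID"]),
        "signup_date": datetime.strptime(row["SignupDate"], "%d/%m/%Y").date(),
    }


def normalise_shop(row):
    """Map a shop record to the common schema."""
    return {
        "customer_id": int(row["customer_id"]),
        "signup_date": datetime.strptime(row["signup"], "%Y-%m-%d").date(),
    }


# Once both sources share one schema, their records can be compared or joined.
unified = [normalise_crm(r) for r in crm_rows] + [normalise_shop(r) for r in shop_rows]
```

After normalisation, the two records have identical keys and value types, so the connection between them (the same `customer_id`) becomes visible to the user.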
Now that you know all about data sources, you want to become a data expert. We invite you to find out more about DataScientest’s Data Analyst course.