
Arithmetic and Data Science


Arithmetic is a fundamental branch of mathematics, dealing with the basic properties of numbers and elementary operations. Often perceived as a mere calculation tool, it actually plays a central role in Data Science and AI. Discover everything you need to know!

With roots reaching back more than 20,000 years, long before the first civilizations, arithmetic is often considered the most elementary branch of mathematics.

Yet, it is also a fundamental pillar of many scientific and technical disciplines. This is particularly true for Data Science, where it plays a major and often underestimated role…

So, what is its utility for data science, and how do Data Scientists exploit it daily? That’s what you will discover in the following article!

What is Arithmetic?

The earliest known traces of arithmetic are tally marks on notched bones dating back to the Upper Paleolithic period.

Over the centuries, numeration systems and calculation methods developed in various cultures, from ancient Egypt to Mesopotamia, through ancient Greece and India.

The term “arithmetic” itself comes from the Greek “arithmos,” meaning “number”. The ancient Greeks, notably Pythagoras and his followers, significantly contributed to the advancement of this discipline.

Arithmetic is based on four fundamental operations: addition, subtraction, multiplication, and division. These operations form the basis of all more advanced mathematical calculations.

Arithmetic also deals with the intrinsic properties of numbers. For instance, it distinguishes between even and odd numbers, prime numbers (integers greater than 1 that are divisible only by 1 and themselves), composite numbers (integers with more than two divisors), and rational and irrational numbers.

Another notion is that of the fractional and decimal representations of numbers, which allow expressing non-integer quantities and performing more precise calculations.

However, don’t think that arithmetic is limited to basic mathematics! Although often associated with it, it extends to more advanced areas like number theory.

This branch explores the deep properties of integers and their relationships, addressing complex problems such as Goldbach’s conjecture or Fermat’s Last Theorem.

Modular arithmetic, or the arithmetic of congruences, is a system where numbers “wrap around” after reaching a certain value (the modulus). This branch is particularly important in cryptography and computer science.
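To make the “wrap around” idea concrete, here is a minimal Python sketch. The function names `wrap_hour` and `bucket_for` are hypothetical, chosen only for illustration: the first treats a 24-hour clock as arithmetic modulo 24, the second shows the same operation assigning keys to a fixed number of buckets, as hash tables typically do.

```python
def wrap_hour(hour: int, offset: int, modulus: int = 24) -> int:
    """Add an offset to an hour and wrap around the modulus."""
    # In Python, % with a positive modulus always returns a value in [0, modulus).
    return (hour + offset) % modulus


def bucket_for(key: int, n_buckets: int) -> int:
    """Map an integer key to one of n_buckets slots (illustrative only)."""
    return key % n_buckets


print(wrap_hour(22, 5))     # 3  (22:00 plus 5 hours is 03:00)
print(wrap_hour(1, -3))     # 22 (01:00 minus 3 hours is 22:00)
print(bucket_for(103, 10))  # 3
```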

Thus, far from being limited to simple calculations taught in elementary school, arithmetic constitutes a rich and complex field and forms the basis of many branches of mathematics and their practical applications… notably Data Science!

The Role of Arithmetic in Data Science

At every stage of the Data Science process, arithmetic proves indispensable. Right from the start, during the initial data processing, it is of paramount importance.

1. Data Cleaning and Processing

Data cleansing, or Data Cleaning, often involves basic operations. For example, one can identify and replace outliers using arithmetically calculated thresholds.
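As an illustration of such thresholds, the sketch below uses the interquartile range (IQR) rule with a multiplier of 1.5 and replaces flagged values with the median. Both choices are assumptions made for this example, not the only way to do it, and `replace_outliers` is a hypothetical helper name.

```python
import statistics


def replace_outliers(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the median."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    median = statistics.median(values)
    return [v if low <= v <= high else median for v in values]


readings = [10, 12, 11, 13, 12, 95, 11, 12]
print(replace_outliers(readings))  # the value 95 is replaced by the median
```

Whether an outlier should be replaced, removed, or kept depends on the dataset; the arithmetic only tells you which values are unusual.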

Data can also be normalized by subtracting the mean and dividing by the standard deviation, and missing values can be imputed with the mean or the median of the observed values.

Similarly, descriptive statistics largely rely on arithmetic. The mean is the sum of the values divided by their count, the median is the central value after sorting, and the (population) standard deviation is the square root of the average of the squared deviations from the mean.
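Imputation can be sketched in a few lines. This example assumes missing values are encoded as `None` and uses a hypothetical helper named `impute`; real pipelines usually rely on a dataframe library, which is not shown here.

```python
import statistics


def impute(values, strategy="mean"):
    """Fill None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


ages = [1, None, 3, 8]
print(impute(ages, "mean"))    # [1, 4, 3, 8]
print(impute(ages, "median"))  # [1, 3, 3, 8]
```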
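These three definitions translate almost word for word into code. The sketch below writes them out by hand to show the arithmetic; in practice Python's `statistics` module already provides equivalents. The standard deviation here is the population version (dividing by the number of values), which is an assumption about which variant is meant.

```python
import math


def mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Central value after sorting (average of the two middle values if even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def population_std(values):
    """Square root of the average squared deviation from the mean."""
    mu = mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))            # 5.0
print(median(data))          # 4.5
print(population_std(data))  # 2.0
```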

2. Data Analysis

Subsequently, during data analysis, Key Performance Indicators (KPIs) in business intelligence also rely on arithmetic calculations.

An example is the conversion rate, which is the number of conversions divided by the total number of visitors.

Percentage growth is measured by subtracting the old value from the new one, then dividing by the old one and multiplying by 100.
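Both indicators fit in a couple of lines of Python. The function names and the sample figures below are made up for illustration; the guards against a zero denominator are an added precaution, since both formulas are undefined in that case.

```python
def conversion_rate(conversions, visitors):
    """Conversions divided by total visitors, as a fraction between 0 and 1."""
    if visitors == 0:
        raise ValueError("conversion rate is undefined with zero visitors")
    return conversions / visitors


def percentage_growth(old, new):
    """(new - old) / old * 100."""
    if old == 0:
        raise ValueError("growth is undefined when the old value is zero")
    return (new - old) / old * 100


print(conversion_rate(50, 2000))    # 0.025, i.e. 2.5 %
print(percentage_growth(200, 250))  # 25.0
```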

Other techniques use arithmetic to make data comparable. This is the case for Min-Max normalization, (x - min) / (max - min), and Z-score standardization, (x - mean) / standard deviation.
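A minimal sketch of the two rescaling formulas follows, using only the standard library. The helper names are hypothetical, and the Z-score uses the population standard deviation (`statistics.pstdev`), which is one of two common conventions.

```python
import statistics


def min_max_scale(values):
    """Rescale values to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values with (x - mean) / standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9])[0])  # -1.5
```

After Min-Max scaling every value lies between 0 and 1; after Z-score standardization the values have mean 0 and standard deviation 1.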

3. Modeling

The next step is modeling, and once again, many Machine Learning algorithms are actually based on arithmetic operations.

Linear regression computes its coefficients using the least squares method, while k-means clustering iteratively recomputes each centroid as the arithmetic mean of the points assigned to it.
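The sketch below shows how little is needed beyond the four basic operations. It covers the simplest case only: a least squares fit with a single input variable, and one centroid update for a single cluster. Both function names are hypothetical, and a full k-means implementation would add the assignment step and the iteration loop.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def centroid(points):
    """Arithmetic mean of a list of points, coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
print(centroid([(0, 0), (2, 0), (1, 3)]))    # (1.0, 1.0)
```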

For model performance evaluation, metrics also use arithmetic. Precision is measured using the formula “true positives / (true positives + false positives)”.

Recall relies on the formula “true positives / (true positives + false negatives),” and the F1-score is the harmonic mean of precision and recall: “2 × precision × recall / (precision + recall)”.
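These metrics can be computed directly from the counts of a confusion matrix. In the sketch below, the counts are invented for illustration. Note that the F1-score is conventionally defined as the harmonic mean of precision and recall, which is lower than their arithmetic mean whenever the two differ.

```python
def precision(tp, fp):
    """True positives / (true positives + false positives)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """True positives / (true positives + false negatives)."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p = precision(8, 2)  # 0.8
r = recall(8, 8)     # 0.5
print(p, r, f1_score(p, r))
```

With these numbers the arithmetic mean would be 0.65, while the F1-score is 8/13, about 0.615: the harmonic mean penalizes the weaker of the two metrics.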

4. Data Visualization (DataViz)

As you may know, after data analysis, it is crucial to present the results in clear and intuitive visualizations so that non-technical stakeholders can understand them.

However, creating visualizations requires arithmetic calculations. Histograms involve calculating intervals and counting occurrences, while pie charts rely on calculating angles proportional to frequencies.
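Plotting libraries perform these calculations behind the scenes. As a rough sketch of what happens, the hypothetical helpers below split a value range into equal-width intervals and count occurrences, then convert frequencies into pie chart angles. Equal-width bins are an assumption; libraries offer other binning rules.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot build intervals")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


def pie_angles(frequencies):
    """Angle in degrees for each slice, proportional to its frequency."""
    total = sum(frequencies)
    return [360 * f / total for f in frequencies]


print(histogram_counts([1, 2, 2, 3, 5, 8, 9, 10], 3))  # [4, 1, 3]
print(pie_angles([1, 1, 2]))                           # [90.0, 90.0, 180.0]
```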

Arithmetic is also used for scaling axes to determine appropriate scales. Calculating minimum and maximum values determines the axis limits. It also involves determining regular intervals for the graduations.
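As a simplified illustration of axis scaling, the hypothetical `axis_ticks` helper below rounds the data minimum down and the maximum up to a multiple of a chosen step, then lists the graduations in between. Real plotting libraries also choose the step automatically, which this sketch does not attempt.

```python
import math


def axis_ticks(values, step):
    """Axis graduations covering the data, spaced by `step` (an integer here)."""
    lower = math.floor(min(values) / step) * step  # round the minimum down
    upper = math.ceil(max(values) / step) * step   # round the maximum up
    n_intervals = round((upper - lower) / step)
    return [lower + i * step for i in range(n_intervals + 1)]


print(axis_ticks([3, 17, 42], 10))  # [0, 10, 20, 30, 40, 50]
```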

An Invaluable Asset for Data Scientists

A solid understanding of arithmetic allows Data Scientists to develop new approaches based on its principles, intuitively grasp the workings of algorithms, and correctly interpret statistical results.

This is also very useful for debugging and result verification. Arithmetic allows manually verifying an algorithm’s calculations, identifying errors in the code by comparing expected and obtained results, and performing consistency tests on the data and results.

Moreover, it helps in performance optimization by simplifying complex calculations to improve code efficiency and choosing appropriate data structures based on arithmetic complexity.

It is also a way to implement efficient numerical approximations when exact calculations are too costly. For all these reasons, arithmetic is a valuable ally for the Data Scientist!

Expertise Required to Meet Challenges

The application of arithmetic in Data Science can be more complex than it seems. First, remember that computers use a binary representation of numbers, which can lead to precision issues.

For example, floating-point numbers cannot exactly represent all decimal values. This can lead to rounding errors.

Very large or very small numbers can also exceed representation limits, causing calculation errors. And subtracting two very close numbers can result in significant precision loss.
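Each of these pitfalls can be reproduced in a few lines of Python, whose `float` type is a 64-bit binary floating-point number on standard platforms. The variable names are chosen for illustration only.

```python
import math

# Rounding: 0.1, 0.2 and 0.3 have no exact binary representation.
rounding_equal = (0.1 + 0.2 == 0.3)            # False
close_enough = math.isclose(0.1 + 0.2, 0.3)    # True: compare with a tolerance

# Overflow: the product exceeds the largest representable float.
overflowed = 1e308 * 10                        # inf

# Underflow: the product is too small to represent and collapses to zero.
underflowed = 1e-200 * 1e-200                  # 0.0

# Cancellation: 1e-16 is lost when added to 1.0, so the difference is 0.0
# instead of the mathematically expected 1e-16.
cancelled = (1.0 + 1e-16) - 1.0                # 0.0

print(rounding_equal, close_enough, overflowed, underflowed, cancelled)
```

A practical consequence is that floating-point results should be compared with a tolerance, for example with `math.isclose`, rather than with `==`.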

Common remedies include arbitrary-precision calculation libraries and numerical techniques designed to limit error accumulation, such as compensated summation.
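Python's standard library offers several such tools. The sketch below shows three of them as examples rather than an exhaustive list: `decimal.Decimal` for exact decimal arithmetic, `fractions.Fraction` for exact rational arithmetic, and `math.fsum` for a summation that tracks the rounding error of intermediate results.

```python
import math
from decimal import Decimal
from fractions import Fraction

# Exact decimal arithmetic (note the string arguments, which avoid
# inheriting the rounding error of a float literal).
decimal_ok = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")       # True

# Exact rational arithmetic.
fraction_ok = Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)   # True

# Naive summation accumulates rounding errors; math.fsum compensates for them.
naive_ok = sum([0.1] * 10) == 1.0            # False
fsum_ok = math.fsum([0.1] * 10) == 1.0       # True

print(decimal_ok, fraction_ok, naive_ok, fsum_ok)
```

The trade-off is speed: `Decimal` and `Fraction` are considerably slower than native floats, so they are usually reserved for the parts of a computation where exactness matters.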

Another problem: traditional arithmetic can be challenged by Big Data. Even the simplest operations can become time-consuming when repeated billions of times.

Storing intermediate results can also overload the available memory. Moreover, some operations are difficult to parallelize effectively.

Common ways to work around these obstacles include sampling techniques, approximate algorithms, and distributed computing.
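Two of these ideas can be sketched on a small scale. In the first, each chunk of data is reduced to a (sum, count) pair that can be computed independently, for instance on different machines, and the pairs are then combined into an exact mean without holding all the data in one place. In the second, the mean is estimated from a random sample. All function names are hypothetical, and the list of 1,000 integers stands in for a much larger dataset.

```python
import random


def chunk_summary(chunk):
    """Reduce one chunk to the two numbers needed for a mean."""
    return sum(chunk), len(chunk)


def combine(summaries):
    """Combine per-chunk (sum, count) pairs into a global mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


def sampled_mean(values, sample_size, seed=0):
    """Estimate the mean from a random sample instead of the full data."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(values, sample_size)
    return sum(sample) / sample_size


data = list(range(1, 1001))
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

print(combine([chunk_summary(c) for c in chunks]))  # 500.5, the exact mean
print(sampled_mean(data, 100))                      # an estimate near 500.5
```

The chunked version gives the exact result because sums and counts can be added in any order; the sampled version trades some accuracy for a much smaller amount of work.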

Some Machine Learning and AI algorithms also go beyond simple arithmetic. This is the case for deep neural networks that use operations on multidimensional tensors.

Many learning algorithms also use complex optimization techniques that do not rely solely on elementary arithmetic. The same goes for some Bayesian or probabilistic models requiring much more complex calculations.

This is why a thorough training in advanced mathematics and scientific computing can be indispensable for Data Scientists.

Conclusion: Arithmetic, an Essential Mathematical Foundation in Data Science

As you have seen in this article, arithmetic forms the basis of data processing and analysis operations and allows for the evaluation and interpretation of results. It is also at the heart of many Machine Learning and statistical algorithms, making its mastery indispensable for any Data Scientist.

To acquire all the skills and knowledge required for this profession, you can turn to DataScientest.

Our training programs will allow you to learn Python programming, DataViz, Machine Learning, Deep Learning, Data Analysis or Data Engineering, and MLOps.

At the end of the program, you will have all the tools to work as a Data Science professional and will receive a highly recognized diploma and certification.

All our training programs are conducted online, in bootcamp, continuous, or alternating modes, and our institution is eligible for CPF funding. Discover DataScientest!

You now know everything about arithmetic. For more information on the same topic, discover our article on algorithms and our article entirely dedicated to Machine Learning!
