Amazon EMR: A cluster management tool managed by AWS

Amazon EMR (Elastic MapReduce) is a data processing service managed by Amazon Web Services (AWS). It can process very large volumes of data, up to the petabyte scale, using popular tools such as Apache Hadoop, Hive, Spark and HBase, to name but a few. Amazon EMR has been designed to offer great flexibility and scalability, […]

AWS Glue: What is it? What’s it for?

AWS Glue is a fully managed, scalable data processing service that lets users run serverless ETL (Extract, Transform, Load) workflows without having to manage the underlying infrastructure. A reminder about ETL processes: ETL is a process designed to ensure data quality and availability. It is divided into 3 phases: Extraction: retrieval […]
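
To make the three ETL phases concrete, here is a minimal sketch in plain Python. It does not use AWS Glue's API: the CSV input, the `orders` table and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the target data store.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in a real pipeline the source could be files, a database or an API.
RAW_CSV = """order_id,amount,country
1,19.99,FR
2,,DE
3,5.50,fr
"""


def extract(raw: str) -> list[dict]:
    """Extract: read the raw records from the source."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop incomplete rows, cast types and normalise country codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip records with a missing amount
        cleaned.append((int(row["order_id"]), float(row["amount"]), row["country"].upper()))
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 19.99, 'FR'), (3, 5.5, 'FR')]
```

Keeping each phase in its own function mirrors the Extract, Transform, Load split, so any one phase can be swapped (for example a different source or target) without touching the others.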

SBT Scala: a tool to organize your Scala or Java projects

Explore the organizational strengths of SBT Scala, a versatile build tool for Scala and Java projects. Dive into the essentials of the Simple Build Tool (SBT) and discover how it simplifies project management, dependency handling and builds, making the organization of your Scala or Java projects more streamlined and efficient.

In this article, we introduce a development tool for your Scala projects: SBT, which stands for “Simple Build Tool”. It’s an open-source build tool that makes it easy to manage your projects: it lets you manage your dependencies, compile and run your code, and package and distribute your JAR files. Let’s take […]

Sequences and series: understanding these two mathematical concepts

Delve into the world of mathematical sequences and series. Gain a clear understanding of these fundamental concepts by exploring their definitions, properties and practical applications. Whether you are uncovering the patterns in sequences or computing the sum of a series, this guide covers the essential building blocks of mathematical analysis.

In this article, we take a look at two key mathematical concepts: sequences and series. You need some basic mathematical knowledge to understand them fully. Sequences are widely used in mathematics, and their terms can be many kinds of mathematical objects, such as numbers, polynomials, sets or functions. Here, we’re only interested in numerical sequences, […]
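
As an illustration, assume the geometric sequence u_n = (1/2)^n, starting at n = 0 (a hypothetical example, not taken from the article). The sequence is the ordered list of its terms, while the associated series is built from its partial sums S_n = u_0 + u_1 + ... + u_n. Using exact fractions, this sketch shows the partial sums approaching 2, consistent with the closed form S_n = 2 - (1/2)^n.

```python
from fractions import Fraction


def term(n: int, first: Fraction = Fraction(1), ratio: Fraction = Fraction(1, 2)) -> Fraction:
    """n-th term u_n = first * ratio**n of a geometric sequence (indices start at 0)."""
    return first * ratio ** n


def partial_sum(n: int) -> Fraction:
    """Partial sum S_n = u_0 + u_1 + ... + u_n of the associated series."""
    return sum((term(k) for k in range(n + 1)), Fraction(0))


terms = [term(n) for n in range(5)]
sums = [partial_sum(n) for n in range(5)]
print([str(t) for t in terms])  # ['1', '1/2', '1/4', '1/8', '1/16']
print([str(s) for s in sums])   # ['1', '3/2', '7/4', '15/8', '31/16']

# The partial sums match the closed form S_n = 2 - (1/2)**n, so the series converges to 2.
assert all(partial_sum(n) == 2 - Fraction(1, 2) ** n for n in range(20))
```

Exact fractions are used instead of floats so that the comparison with the closed form holds exactly rather than up to rounding error.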

Unsupervised learning: principle and use

Many hopes are pinned on unsupervised learning for the future, for example to improve cyber security or the identification of various diseases. Unlike supervised learning, unsupervised learning requires the algorithm to work on unannotated examples. In this case, the learning is entirely autonomous. The machine is […]
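
Clustering is a typical unsupervised task: the algorithm receives points without labels and has to group them by itself. Below is a minimal k-means sketch in pure Python, assuming 2-D points and a simple farthest-first initialisation (libraries such as scikit-learn default to k-means++ initialisation instead). The data set is hypothetical.

```python
import math


def kmeans(points, k, n_iter=100):
    """Cluster unlabelled points into k groups with Lloyd's algorithm (k-means)."""
    # Farthest-first initialisation: start from the first point, then repeatedly
    # pick the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# No labels are given: the algorithm has to discover the two groups on its own.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters k is still a choice made by the user; only the grouping itself is learned from the unannotated data.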

Style transfer with CycleGAN

Dive into the captivating world of style transfer using CycleGAN. Discover the principles and applications of Cycle-Consistent Adversarial Networks (CycleGANs) for artistic transformations, which can transfer the style of one set of images onto another. Unleash your creativity and explore the new possibilities CycleGAN opens up in image synthesis and style adaptation.

Neural Style Transfer (NST) refers to a set of models and methods for transferring the visual style of one image onto another, for both still images and videos. In this article, we take a closer look at a particular model called CycleGAN. Today, the most successful NST algorithms are Deep Learning algorithms built on convolutional layers. Artistically, […]

Streamlit, the tool for presenting your Machine Learning work

In Machine Learning, an important step in processing data is representing it graphically, so that it can be visualized and its behavior better understood. Data Scientists, for example, are regularly called upon to interpret and visualize data for other teams in their company, and Streamlit is one tool for doing so. In Machine Learning, […]

AttGAN: A tool for modifying facial attributes

Transform facial attributes with AttGAN, a Generative Adversarial Network designed to modify and enhance facial features. Explore the capabilities of AttGAN and learn how it changes the way facial attributes can be manipulated, opening up new possibilities in image synthesis and facial transformation.

Facial Attribute Editing refers to the set of methods used to modify one or more attributes of a given face. Before the advent of Deep Learning, this task was tedious, as it was carried out by hand, pixel by pixel. Recently, however, new algorithms have been developed to automate this task. Here, we take a […]

CatBoost: An essential Machine Learning tool

Take your machine learning projects further with CatBoost, a gradient boosting library that can improve model performance. Discover the strengths of this algorithm: native support for categorical features, fast training and strong predictive accuracy. Explore how CatBoost can become a key part of your machine learning toolkit.

Since 2017, CatBoost has been part of the range of available machine learning tools. Fast, efficient and accurate, CatBoost is one of the leading technologies in the field of gradient boosting. In this article, we explain everything you need to know about it: its applications, its benefits and how it works. What is CatBoost? CatBoost is an open source […]

train_test_split: Tutorial on how to use this function

Master data splitting with our tutorial on train_test_split. Learn how to use this scikit-learn function in Python to create training and test datasets, evaluate machine learning models reliably and make your predictive models more robust.

A Machine Learning model is capable of learning autonomously from one dataset, with the aim of predicting behavior on another dataset. To do this, it finds underlying relationships between independent explanatory variables and a target variable in the initial dataset. It then uses these patterns to predict or classify new data. How do I define […]
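
The principle behind splitting a dataset can be sketched without any library: shuffle the sample indices, then hold out a fraction of them as the test set. The helper `simple_train_test_split` below is hypothetical and only mimics the idea of scikit-learn's `train_test_split`; it assumes `X` and `y` are Python lists of the same length.

```python
import math
import random


def simple_train_test_split(X, y, test_size=0.25, seed=None):
    """Hypothetical helper: shuffle the samples, then hold out test_size of them for testing."""
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # shuffle so the split does not follow the original order
    n_test = math.ceil(test_size * len(X))  # round the test-set size up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # The same indices are used for X and y, so each sample keeps its target value.
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i, i * 2] for i in range(8)]  # 8 samples, 2 explanatory variables
y = [i % 2 for i in range(8)]       # target variable
X_train, X_test, y_train, y_test = simple_train_test_split(X, y, test_size=0.25, seed=42)
print(len(X_train), len(X_test))  # 6 2
```

With scikit-learn installed, the equivalent call is `train_test_split(X, y, test_size=0.25, random_state=42)`, imported from `sklearn.model_selection`. It returns the four sets in the same order, but the samples it selects will differ from this sketch because it relies on its own random number generator.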

Brownian motion: principle and practical uses

Consider a micrometer-sized particle immersed in a fluid. This particle moves randomly because it is constantly struck by the much smaller particles of the fluid. This is the principle of Brownian motion, whose mathematical model is also known as the Wiener process. Historically, it was in 1827 that the botanist Robert Brown discovered Brownian motion. He observed […]
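
A standard Wiener process W can be simulated by starting at W(0) = 0 and adding independent Gaussian increments with mean 0 and variance dt at each time step. The sketch below assumes one dimension and a uniform time grid; the function name `wiener_path` is hypothetical. For a particle moving in a plane, one could simulate two independent paths, one per coordinate.

```python
import math
import random


def wiener_path(n_steps: int, T: float = 1.0, seed=None) -> list[float]:
    """Simulate one path of a standard Wiener process W on [0, T].

    W(0) = 0 and each increment W(t + dt) - W(t) is drawn independently
    from a normal distribution with mean 0 and variance dt.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    path = [0.0]
    for _ in range(n_steps):
        # random.gauss expects the standard deviation, hence sqrt(dt).
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


path = wiener_path(n_steps=1000, T=1.0, seed=1)
print(len(path), path[0])  # 1001 0.0
```

Because the increments are independent with variance dt, the final value W(T) has variance T; averaging over many simulated paths is a quick way to check this.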

Time series: Daniel, can you tell us about them?

Here’s a new session with Daniel, DataScientest’s technical support and the data science expert who accompanies learners throughout their training courses. Today, he talks to us about time series. Time series are one of the most widely studied topics in data science. In this article, you’ll discover the main components of a time series. What is […]