Amazon EMR: A cluster management tool managed by AWS
Amazon EMR (Elastic MapReduce) is a data processing service managed by Amazon Web Services (AWS). It enables the processing of large amounts of data, in the petabyte range, using popular tools such as Apache Hadoop, Hive, Spark and HBase, to name but a few. Amazon EMR has been designed to offer great flexibility and scalability, […]
AWS Glue: What is it? What’s it for?
AWS Glue is a fully managed, scalable data processing service that enables users to run serverless ETL (Extract, Transform, Load) workflows, freeing them from the need to manage the underlying infrastructure. A reminder about ETL processes: ETL is a process designed to guarantee data quality and availability. It is divided into 3 phases: Extraction: recovery […]
SBT Scala: a tool to organize your Scala or Java projects
In this article, we’re going to introduce you to a development tool for your Scala projects. SBT stands for “Simple Build Tool”. Let’s look at SBT Scala… It’s an open-source build tool that makes it easy to manage your projects. It lets you manage your dependencies, compile, execute and distribute your JAR files. Let’s take […]
Sequences and series: understanding these two mathematical concepts
In this article, we take a look at two key mathematical concepts: sequences and series. You need some basic mathematical knowledge to understand them fully. Sequences are widely used in mathematics, and can be used to define sequences of mathematical objects such as polynomials, numbers, sets, functions, etc. Here, we’re only interested in numerical sequences, […]
Unsupervised learning: principle and use
Many hopes are pinned on unsupervised learning to improve cyber security or the identification of various diseases. Unlike supervised learning, unsupervised learning requires the algorithm to operate on unannotated examples. In this case, machine learning is entirely independent. The machine is […]
Style transfer with CycleGAN
Neural Style Transfer (NST) is a set of models and methods for transferring the visual style of one image to another, using images or videos. In this article, we’ll be taking a closer look at a particular model called CycleGAN. Today, the most successful NST algorithms are adapted Deep Learning algorithms using convolution layers. Artistically, […]
Streamlit, the tool for presenting your Machine Learning work
In Machine Learning, an important step in the processing of data is its graphical representation, so that it can be visualized and its behavior better understood. People in professions such as Data Scientist are regularly called upon to interpret and visualize data for other teams in their company, for example with Streamlit. In Machine Learning, […]
AttGAN: A tool for modifying facial attributes
Facial Attribute Editing refers to the set of methods used to modify one or more attributes of a given face. Before the advent of Deep Learning, this task was tedious, as it was carried out by hand, pixel by pixel. Recently, however, new algorithms have been developed to automate this task. Here, we take a […]
CatBoost: An essential Machine Learning tool
Since 2017, CatBoost has rounded out the range of existing machine learning tools. Fast, efficient and accurate, CatBoost is one of the leading technologies in the field of gradient boosting. In this article, we explain everything you need to know about this technology: applications, benefits, how it works. What is CatBoost? CatBoost is an open source […]
train_test_split: Tutorial on how to use this function
A Machine Learning model is capable of learning autonomously from one dataset, with the aim of predicting behavior on another dataset. To do this, it finds underlying relationships between independent explanatory variables and a target variable in the initial dataset. It then uses these patterns to predict or classify new data. How do I define […]
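The idea of learning on one dataset and predicting on another can be sketched in a few lines. The helper below, `simple_train_test_split`, is a hypothetical standard-library illustration written for this listing, not scikit-learn's `train_test_split` (which lives in `sklearn.model_selection` and offers more options); the toy dataset and the 25% test share are assumptions as well.

```python
import random


def simple_train_test_split(rows, test_size=0.25, seed=0):
    """Shuffle rows and split them into a training part and a test part."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_size))
    # The first n_test shuffled rows are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy dataset: (explanatory variable, target) pairs.
data = [(x, 2 * x + 1) for x in range(20)]
train, test = simple_train_test_split(data, test_size=0.25, seed=42)
print(len(train), len(test))
```

A model would be fitted on `train` only and then evaluated on `test`, so that its score reflects data it has never seen.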
Brownian motion: principle and practical uses
Let’s take a micrometer-sized particle immersed in a fluid. This particle will have a random motion due to the impact of other small particles on this “big” particle. This is the principle of Brownian motion, also known as the Wiener process. Historically, it was in 1827 that botanist Robert Brown discovered Brownian motion. He observed […]
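The random motion described above can be simulated as a minimal sketch, assuming the standard discretisation of a one-dimensional Wiener process: the path starts at 0 and each step adds an independent Gaussian increment with mean 0 and variance `dt`. The function name `simulate_wiener`, the step count, and the time step are illustrative choices, not taken from the article.

```python
import math
import random


def simulate_wiener(n_steps, dt, seed=0):
    """Return a sampled 1-D Wiener path with n_steps + 1 points, starting at 0."""
    rng = random.Random(seed)  # seeded for reproducibility
    path = [0.0]
    for _ in range(n_steps):
        # Independent increment: normal with mean 0 and standard deviation sqrt(dt).
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


path = simulate_wiener(n_steps=1000, dt=0.001, seed=1)
print(len(path), path[0])
```

Plotting `path` against time gives the jagged trajectory typical of Brownian motion; running the function with different seeds gives different trajectories.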
Time series: Daniel, can you tell us about them?
Another session with Daniel, technical support at DataScientest and the data science expert who accompanies learners throughout their training courses. Today, he talks to us about time series. Time series is one of the most widely studied topics in data science. In this article, you’ll discover the main components of a time series. What is […]