How can you combine the flexibility of SQL with the simplicity and efficiency of the Python language? That's the ambition of SQLAlchemy. Discover everything you need to know about SQLAlchemy, a tool based on the object-relational mapping (ORM) principle.
What is object-relational mapping?
SQLAlchemy is a tool based on the object-relational mapping (ORM) principle.
ORM is a programming technique that maps between the schema of a relational database (commonly known as a SQL database) and the classes of an object-oriented programming language (in this case, Python).
It creates a virtual layer between us and the database, which we can then use to query the database directly from a Python script.
SQLAlchemy simplifies the connection between Python and SQL databases by automatically converting Python class calls into SQL statements. This makes it possible to query relational databases in a Pythonic way.
SQLAlchemy can be divided into several parts:
- The engine, created with create_engine(), which connects to the database and transmits our queries. Typically, you pass a string argument indicating the database dialect/driver and the location of the database (local path or URL), along with connection arguments (which can include credentials). In the following example, we use SQLite with a local database:
engine = create_engine("sqlite:///base1.db", echo=False)
However, you could also use PostgreSQL, in which case the URL includes the user, password and host, for example:
engine = create_engine("postgresql://user:password@localhost/base1", echo=False)
DBAPI stands for “Python Database API Specification.” SQLAlchemy, like other database connection packages, is built on this global interface and adds its specific services on top of it.
SQLAlchemy also allows you to manage the Data Manipulation Language (DML) part of SQL, which includes the basic table modification operations known as CRUD operations (Create, Read, Update, Delete).
SQLAlchemy maps SQL INSERT, UPDATE, and DELETE statements to its own insert(), update(), and delete() constructs, thanks to the ORM mechanism. This allows you to modify tables in an object-oriented way, as shown in the following example.
We connect to the database via .connect(), then pass SQL statements (INSERT to create a row, DELETE to delete it, SELECT for a simple read) to .execute(), which sends them to the database.
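The steps above can be sketched as follows. This is a minimal sketch assuming SQLAlchemy 1.4+, with an in-memory SQLite database standing in for base1.db; the users table and its columns are hypothetical:

```python
from sqlalchemy import create_engine, text

# In-memory SQLite database (stands in for base1.db)
engine = create_engine("sqlite:///:memory:", echo=False)

# engine.begin() opens a connection and wraps our statements in a
# transaction that is committed automatically on exit
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))
    # INSERT to create a row (bound parameters avoid string concatenation)
    conn.execute(text("INSERT INTO users (name) VALUES (:name)"), {"name": "Alice"})
    # SELECT for a simple read
    rows = conn.execute(text("SELECT name FROM users")).fetchall()
    # DELETE to remove the row
    conn.execute(text("DELETE FROM users WHERE name = :name"), {"name": "Alice"})
```

With a plain .connect() instead of .begin(), the changes would have to be committed explicitly.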
- Metadata and a strict table schema are common features of relational databases.
- SQLAlchemy provides predefined classes to facilitate the Data Definition Language (DDL), i.e. the part of SQL concerned with the creation of relational database schemas (types, keys, etc.).
- For example, the Column class represents a field in the table, Integer specifies that the field is of type INT, and String that it is a VARCHAR.
- The ForeignKey class specifies that a column is a foreign key. Note that the tables built from these classes are all registered in a global MetaData collection object.
- In this second example, we create a new Locations table containing locations in new regions, and perform CRUD operations on it.
- To this end, SQLAlchemy provides us with predefined classes such as Column or Table.
- Note that Python methods such as insert(), update(), where() or delete() are used instead of raw SQL.
- However, the syntax of elementary functions such as create_engine(), .connect() and .execute() may put some people off, as you still have to provide your own SQL code to interact with your database.
- A session allows us to perform the same operations (reading, writing and deleting) using more easily manipulated Python objects, which are then mapped to the database.
- Sessions are therefore used to perform object-relational mapping (ORM), and they also keep track of objects that have been modified during the session.
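Before moving to sessions, the DDL classes and Core methods described in the bullets above can be sketched as follows. This is a minimal sketch assuming SQLAlchemy 1.4+; the exact columns of the regions and locations tables are hypothetical:

```python
from sqlalchemy import (create_engine, MetaData, Table, Column, Integer,
                        String, ForeignKey, insert, select, update, delete)

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()  # collects the table definitions

# Hypothetical schema: regions and the locations that belong to them
regions = Table(
    "regions", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
locations = Table(
    "locations", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
    Column("region_id", Integer, ForeignKey("regions.id")),
)
metadata.create_all(engine)  # emits the CREATE TABLE statements

# CRUD operations expressed with Python methods instead of raw SQL
with engine.begin() as conn:
    conn.execute(insert(regions), {"id": 1, "name": "Kanto"})
    conn.execute(insert(locations), {"id": 1, "name": "Pallet Town", "region_id": 1})
    conn.execute(update(locations).where(locations.c.id == 1).values(name="Viridian City"))
    rows = conn.execute(select(locations.c.name)).fetchall()
    conn.execute(delete(locations).where(locations.c.id == 1))
```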
An example to illustrate this more practical interface:
We carry out the same operations as before, but the syntax is much lighter: sessionmaker defines our session, which lets us send instructions to the database using only Python methods on a Python class, in this case Pokemon, which is mapped to the pokedex table via its __tablename__ attribute.
We can then retrieve information from the database via the query() method, filter the results with filter_by() and then delete the object from the session with the delete() method.
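A minimal sketch of what such a session-based example might look like, assuming SQLAlchemy 1.4+; the Pokemon class and pokedex table follow the description above, while the columns are hypothetical:

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Pokemon(Base):
    __tablename__ = "pokedex"  # the class is mapped onto the pokedex table
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)  # create the pokedex table

Session = sessionmaker(bind=engine)
session = Session()

# Write: add an object to the session, then commit it to the database
session.add(Pokemon(id=25, name="Pikachu"))
session.commit()

# Read: query() retrieves mapped objects, filter_by() narrows the results
pika = session.query(Pokemon).filter_by(name="Pikachu").first()
name = pika.name

# Delete: remove the object through the session
session.delete(pika)
session.commit()
remaining = session.query(Pokemon).count()
```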
Finally, let’s illustrate the notion of upsert with one last example.
We want to add several Pokémon to our database, but we don’t know whether some of them are already in it.
To avoid adding duplicates, we’ll use the merge() method, which updates the information for Pokémon already in the database (update) or inserts a new row into the table (insert).
This combination of update and insert is called upsert.
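A sketch of this upsert behaviour, under the same assumptions as the previous sketch (hypothetical columns; merge() matches objects on their primary key):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Pokemon(Base):
    __tablename__ = "pokedex"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# The database already contains one Pokémon
session.add(Pokemon(id=25, name="Pikachu"))
session.commit()

# merge() matches each object on its primary key: id=25 already exists,
# so its row is updated; id=26 does not, so a new row is inserted
for poke in [Pokemon(id=25, name="Raichu"), Pokemon(id=26, name="Eevee")]:
    session.merge(poke)
session.commit()

names = sorted(p.name for p in session.query(Pokemon))
```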
What are SQLAlchemy's uses?
ORM, and more generally SQLAlchemy, is suitable for the following situations:
- basic use of CRUD operations,
- writing better tests and data fixtures for data consistency,
- SQLAlchemy is database-agnostic. This means that our Python code stays the same across SQL environments and dialects such as SQLite, PostgreSQL or Oracle.
- This improves interoperability with the rest of your application, so you can change database systems without having to change your code.
- Instead of juggling different SQL dialects, the open-source SQLAlchemy toolkit allows us to streamline our workflow and process our data efficiently from within the Python language.
- Writing table schema constraints directly from our Python script.
- Representing relational databases in Python classes not only improves code quality and avoids duplication, making maintenance easier, but also enables objects to be stored in a memory cache.
- This speeds up data access by reducing the number of queries, and increases performance.
- If you don’t have a cloud-hosted database such as a data warehouse, you can use the open-source PostgreSQL database via SQLAlchemy.
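As an illustration of writing table schema constraints directly from Python, here is a minimal sketch; the products table, its columns and its constraints are all hypothetical:

```python
from sqlalchemy import (create_engine, MetaData, Table, Column, Integer,
                        String, CheckConstraint, UniqueConstraint, insert)
from sqlalchemy.exc import IntegrityError

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()

# Constraints are declared alongside the columns, in plain Python
products = Table(
    "products", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50), nullable=False),
    Column("price", Integer),
    UniqueConstraint("name", name="uq_product_name"),
    CheckConstraint("price >= 0", name="ck_price_positive"),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(insert(products), {"name": "potion", "price": 300})

# Violating the CHECK constraint raises an IntegrityError
try:
    with engine.begin() as conn:
        conn.execute(insert(products), {"name": "ether", "price": -1})
    violated = False
except IntegrityError:
    violated = True
```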
It should be noted, however, that ORM is ill-suited to processing large volumes of data, as well as to ETL processes. In the latter case, dbt (packages, macros, tests) would be more appropriate.
Are you passionate about SQL, Python and the world of data, and want to discover essential tools like dbt, Spark or SQLAlchemy?
Join our Data Analyst, Data Scientist or Analytics Engineer training courses at DataScientest to perfect your knowledge of the subject via the Big Data sprint courses.