DuckDB: An Introduction – Analytics Vidhya

Prologue

The duck was chosen as the mascot for this database management system (DBMS) because it is a very versatile animal that can fly, walk, and swim. In the same way, DuckDB is designed to be a versatile DBMS for local analysis. Now let’s see what features it offers.

Easy installation
Embedded: no server to manage
Processing and storing tabular datasets (such as CSV and Parquet files)
Single-file storage format
High-speed analytical processing
Fast data transfer between R/Python and the database
Interactive data analysis, e.g. joining and aggregating multiple large tables
Simultaneous large changes to multiple large tables, such as appending rows or adding, deleting, and updating columns

So let’s see what we can learn in this article.

What you will learn

In this article, you’ll learn how to install DuckDB, how to import, read, and write CSV and Parquet files, how to run meta queries, and how to use the DBeaver SQL IDE. In short:

Getting started
Importing and exporting CSV and Parquet files
Meta queries
DBeaver SQL IDE

Get started with DuckDB

Enough talk. Now let’s get our hands dirty with the code 😉

Install DuckDB on your system.

Import the database and connect.

OK, but I don’t always want to create data by hand. Usually I need to read (or write) files, and the most famous of them all is the CSV, isn’t it?

In the next section you will learn how to read and write CSV files, as well as Parquet, a very convenient and fast file format.

File import and export

Let’s start with the data scientist’s best friend: the CSV file.

Read a CSV file with the read_csv_auto function. Calling .df() on the result returns it as a pandas DataFrame.

con.execute("SELECT * FROM read_csv_auto('my_local_file.csv')").df()

Create a new table from the CSV file.

con.execute("CREATE TABLE tbl AS SELECT * FROM read_csv_auto('my_local_file.csv')")

Export data from a table to a CSV file.

con.execute("COPY tbl TO 'my_export_file.csv' (HEADER, DELIMITER ',')")

If you haven’t been introduced to Parquet yet, it’s time to meet it. Parquet files are essential when working with large amounts of data. I won’t go into detail about the format’s benefits in this article, but you will see how simple it is to work with it in DuckDB.

Read a Parquet file with the read_parquet function. Again, .df() returns the result as a pandas DataFrame.

con.execute("SELECT * FROM read_parquet('my_local_file.parquet')").df()

Create a new table from the Parquet file.

con.execute("CREATE TABLE tbl5 AS SELECT * FROM read_parquet('my_local_file.parquet')")

Export data from a table to a Parquet file.

con.execute("COPY tbl TO 'my_export_file.parquet' (FORMAT PARQUET)")

OK! But what about huge tables? Let’s talk about it in the next section!

Meta queries

After creating a few tables, it’s very convenient to list them all. You can do this with SHOW TABLES.

con.execute("SHOW TABLES").df()

To see the schema of a table, use DESCRIBE followed by the table name.

import pandas as pd

df = pd.DataFrame({'Col1': [100, 90, 30], 'Col2': [1, 5, 8]})

con.execute("CREATE TABLE tbl_df AS SELECT * FROM df")

con.execute("DESCRIBE tbl_df").df()

In my opinion, the most useful command is SUMMARIZE. For each column, it returns the column name, column type, minimum, maximum, approximate number of unique values, mean, standard deviation, quartiles, number of values, and percentage of null values. That’s a lot of information from a single command!

con.execute("SUMMARIZE tbl").df()

In practice, this is often more information than you need, and it can be hard to read. In that case, you can combine SUMMARIZE with a SELECT query to summarize only the columns you want.

con.execute("SUMMARIZE SELECT Col1 FROM tbl_df").df()

If you like SQL and Python, you should give DuckDB a try.

DBeaver SQL IDE

“DBeaver is a powerful and popular desktop SQL editor and integrated development environment (IDE). It is available in both open source and enterprise editions. It lets you visually explore the tables available in DuckDB and build complex queries. DuckDB’s JDBC connector allows DBeaver to query DuckDB files and, by extension, any other files that DuckDB can access (like Parquet files).”

If you like DuckDB, you’ll love this IDE. It’s easy to use and lightweight. In this section we will install DBeaver, connect it to DuckDB, create an in-memory database, and then create, read, and save Parquet files!

Let’s go! This tutorial shows how to install the Windows version, which is available from the DBeaver download page.