Change in behavior with fsync()

Starting with PostgreSQL 1.2, 10.7, 9.6.12, 9.5.16, and 9.4.21 the fsync() behavior has changed. Here is what the PostgreSQL group had to say:

When available in an operating system and enabled in the configuration file (which it is by default), PostgreSQL uses the kernel function fsync() to help ensure that data is written to a disk. In some operating systems that provide fsync(), when the kernel is unable to write out the data, it returns a failure and flushes the data that was supposed to be written from its data buffers.

This flushing operation has an unfortunate side-effect for PostgreSQL: if PostgreSQL tries again to write the data to disk by again calling fsync(), fsync() will report back that it succeeded, but the data that PostgreSQL believed to be saved to the disk would not actually be written. This presents a possible data corruption scenario.

This update modifies how PostgreSQL handles a fsync() failure: PostgreSQL will no longer retry calling fsync() but instead will panic. In this case, PostgreSQL can then replay the data from the write-ahead log (WAL) to help ensure the data is written. While this may appear to be a suboptimal solution, there are presently few alternatives and, based on reports, the problem case occurs extremely rarely.

A new server parameter data_sync_retry has been added to manage this behavior. If you are certain that your kernel does not discard dirty data buffers in such scenarios, you can set data_sync_retry to on to restore the old behavior.

Microsoft recently acquired Citus Data. Citus is an innovative open-source extension to PostgreSQL that transforms PostgreSQL into a distributed database, dramatically increasing performance and scale for application developers. Citus has a few different products. Citus Community which is an open source extension to PostgreSQL. Citus Cloud is a fully-managed database as a service. Last but not least is Citus Enterprise which allows you to distribute your database across multiple nodes.

I personally have used the Citus extension cstore_fdw for a while now and it's a great extension for PostgreSQL. One of the databases I manage is a 2TB data warehouse type database for a telecommunications company. I pull in various records related to phone calls. After an ELT process, I store the data are tables that are partitioned daily. Today's data is stored in a normal ProgreSQL table, but all tables (partitions) for prior days are converted to a columnar store using the Citus extension cstore_fdw. It's fast and the data takes up considerably less space on disk compared to traditional PostgreSQL table storage. We are stroring around 450MM rows per month and able to scan this amount of data in 30-45 seconds.

Microsoft is really pushing into the non-Windows related products. They also appear to have a desire to really support the open source community. Microsoft started to support a managed database service for PostgreSQL on Azure in March 2018. It's pretty clear that Citus Cloud will move from AWS to Azure sometime in the future.

Welcome to TalkPostgreSQL.  Here you will find tips, tutorials, new, and other articles all related to the world's best open source database, PostgreSQL.  If you are interested in topics that I have not yet written about feel free to leave a comment and let me know what you would like to see.

PostgreSQL 10.5 and 11 Beta 3 Released! The PostgreSQL Global Development Group has released an update to all supported versions of our database system, including 10.5, 9.6.10, 9.5.14, 9.4.19, 9.3.24, with fixes for two security issues as well as bugs reported over the last three months. The third beta release of PostgreSQL 11 is also available for download.

Talk PostgreSQL is a site designed to help you understand PostgreSQL, get PostgreSQL news, other helpful information. 
© Copyright 2019 - TalkPostgreSQL - All Rights Reserved
linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram