Scaling PostgreSQL Queries with Stado

Talk Type: 45 Minute Talk
Track: Data warehousing
Technical Level: Intermediate
License: Creative Commons - Attribution Only

Stado, formerly known as GridSQL, provides a powerful and flexible analytical environment for processing large amounts of data using a shared-nothing, massively parallel processing (MPP) architecture built on PostgreSQL and PostGIS. Data is automatically partitioned across multiple nodes, and each node processes its own subset of the data, so queries are distributed across the cluster and run in parallel. This fully open-source architecture allows database performance to scale linearly as servers are added to the cluster, while the cluster still appears as a single database to applications.
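
As a rough illustration of the shared-nothing idea described above, the following Python sketch hash-partitions rows across a few in-memory "nodes", has each node compute a partial aggregate over its own subset in parallel, and then combines the partial results. This is not Stado's implementation or API: the names (NUM_NODES, node_for, partition, local_aggregate, distributed_avg), the use of a CRC32 hash, and the row layout are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size, chosen for the example


def node_for(key, num_nodes=NUM_NODES):
    """Map a partitioning key to a node index with a stable hash."""
    return crc32(str(key).encode("utf-8")) % num_nodes


def partition(rows, num_nodes=NUM_NODES):
    """Distribute rows across nodes by hashing the (assumed) 'id' column."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row["id"], num_nodes)].append(row)
    return nodes


def local_aggregate(node_rows):
    """Per-node step: partial COUNT and SUM over that node's rows only."""
    return len(node_rows), sum(r["length_km"] for r in node_rows)


def distributed_avg(nodes):
    """Run the per-node step in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_aggregate, nodes))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count if total_count else None


if __name__ == "__main__":
    rows = [{"id": i, "length_km": float(i)} for i in range(1, 101)]
    nodes = partition(rows)
    print([len(n) for n in nodes])   # rows held by each node
    print(distributed_avg(nodes))    # same answer a single node would give
```

The point of the sketch is that an average cannot simply be averaged across nodes; each node returns a count and a sum, and the final step combines them. A real MPP system would make the same kind of split between per-node work and a final combining step, with the per-node work running inside each PostgreSQL instance.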

This presentation will demonstrate the 10-20x scalability and performance gains of PostgreSQL queries running in a Stado environment compared to a single PostgreSQL instance. Some intensive PostGIS queries will be the basis for the discussion. Using the TIGER data set as an example, we will dig into how Stado plans a query that spans multiple PostgreSQL servers and executes it across those nodes.