East 2008 Talks

A Flock of PLs

David Fetter
Talk Type: 45-Minute Talk

Experience Postgres's ever-growing group of internal programming languages!

Postgres is more than just an excellent database server. It's also a programming environment, and you can write programs that run inside in your favorite language, at least if your favorite language is C, Java, Javascript, LOLCODE, Perl, PHP, Python, R, Ruby, Scheme, Shell, SQL or Tcl. The list keeps growing. We will go over some examples in some of the programming languages showcasing Postgres's unique capabilities.

A Simple Database API for Queuing Background Tasks

Michael Bryzek
Talk Type: 45-Minute Talk

One strategy for scalability involves offloading as much processing as possible from live systems to background systems. This presentation will discuss a simple API to a complex record queuing system implement in the database.

We'll present production quality software used to support data denormalization, data warehousing, and full text search indexing, including an overview of when background processing applies and the types of problems it is particularly good at solving.

The talk will highlight the use of database constructs (triggers and plpgsql functions) to implement this lightweight queuing system.

Best Practices of PostgreSQL on Solaris

Jignesh Shah
Talk Type: 45-Minute Talk

If this is your first time of using PostgreSQL on Solaris, then the presentation helps you to start with a proven way of setting up PostgreSQL to avoid general pitfalls commonly observed with PostgreSQL on Solaris deployments. We will also discuss why it helps to go with this approach. We will also discuss monitoring options of identifying bottlenecks with PostgreSQL on Solaris using PostgreSQL and Solaris tools.

Big, Bad, Broken, PostgreSQL

Robert Treat
Talk Type: 45-Minute Talk

Early in 2008, we crashed our TB PostgreSQL instance. We'll go over the timeline of crash, discussing "the perfect storm" that got us there, how we got the server running again, and what was involved in getting back to a production state, including some really wierd things we saw along the way.

Build a Better PostgreSQL Driver: Ruby-pg

Jeff Davis
Talk Type: 45-Minute Talk

This talk is for application developers who need a richer interface, or community members who want to contribute PostgreSQL and their language of choice at the same time.

Using ruby-pg as an example, I'll discuss: interacting with the application developer community, programming style, how to offer the incredibly rich feature set of libpq in a matching style, making application developers want to use PostgreSQL, and getting the code distributed.

Cloning an elephant

Greg Sabino Mullane
Talk Type: 45-Minute Talk

Postgres by design has no "built-in" replication solution. Some consider this a good thing, as replication is means many things to many people. Learn just what replication can mean, whether you should consider using it, and all of the various free and commercial options available today.

Replication solutions discussed will include pg_dump, DRBD, log shipping/warm standby, hot standby, Skytools, Slony, pgpool-II PGCluster, Bucardo, Mammoth Replicator, uni/cluster, Daffodil Replicator, and

Community Round Table - Video

Joshua Drake, Magnus Hagander, Bruce Momjian
Talk Type: 90 Minute Round Table

An ad-hoc round table where attendees of the conference were able to ask any non-technical question. The round table was a success in understanding how the general community sees our development process and where the process needs to be improved.

Deploying PostgreSQL in a Windows Enterprise - Video

Magnus Hagander
Talk Type: 45-Minute Talk

This talk will outline some of the challenges to overcome when installing PostgreSQL in a Windows enterprise environment. Examples include mass-deployment, Active Directory integration for authentication and authorization, data exchange with existing systems. Both deploying the Windows version of the PostgreSQL server and integrating a Linux/Unix PostgreSQL in the existing infrastructures will be covered.

Great Steps in PostgreSQL History

Bruce Momjian
Talk Type: 45-Minute Talk

This talk explores the challenges faced by the PostgreSQL community over the years. The challenges often seemed insurmountable, but everything came out well in the end.

Inside the PostgreSQL Shared Buffer Cache

Greg Smith
Talk Type: 45-Minute Talk

Properly utilizing the PostgreSQL shared buffer cache is vital for optimizing database performance, but most recommendations don't go beyond just giving a rough guidelines for how big it should be. Starting with an introduction to how the cache is structured and operates, this talk goes deep into using the pg_buffercache tool to monitor what in your database your applications are using the cache for. It finishes off with suggestions for using this information for tuning how big the cache is and the related background writer parameters. This talk is mainly aimed at UNIX-like platforms.

Interprocess Coordination and Communication -- Postgres to the Rescue - Video

Damjan Pelemis
Talk Type: 45-Minute Talk

In addition to using postgres for data storage and retrieval needs, we used it to direct and coordinate a set of disparate workers. These workers represent collection of different programs that are parts of problem solution, operating cooperatively on the same large data set that is stored in the database. These programs are executed on different computer systems and rely on the DB to signal the direction and current state of progress. This approach allowed us to quickly adapt the system to changing requirements without the need to transform the programs and more importantly apply these changes with virtually no down time. Instead of the traditional approach where database plays a side-role in the blackboard architecture pattern, in our solution postgres is the key component. It provides a communication channel through notification events as well as a means to coordinate work. Each work component goes through the process of selecting an available worker, requesting its attention, instructing it to execute a part of work, and ultimately writing back the part of the solution, or the solution itself. This approach in combination with postgres as its engine allowed us to build up and scale our solution to a parallel system running in redundant modes with a self-correcting scheduler.

IPC techniques used will include communicating through data--leveraging Postgres atomicity features--as well through the LISTEN/NOTIFY mechanism.

The State of the Community - Video

Joshua D. Drake
Talk Type: 45-Minute Talk

Explores the current status of the community, growth over the last few years, and specific strategies to keep the momentum going. Other topics to be discussed is the rise of the Regional groups and how Education is key to community survival.

Large table joins: how big and how slow

Barry Fox
Talk Type: 45-Minute Talk

When dealing with joins, table size can become critical regarding speed. Postgresql provides a set of variables that can be modified (depending on system resources) to help the performance.

The goal of this talk is to provide some initial benchmarks given the default configuration over a set of joins (each using an increasing number of rows), and contrasting those results with a series of tuned parameters (using the same queries on the same hardware).

This will help provide some information as to what impact can be hoped for, as well as provide some information as to when a table becomes 'too big'.

Logic and Databases - Video

Jeff Davis
Talk Type: 45-Minute Talk

The relational model, on which all SQL DBMSs are based, is founded on both logic and sets. Relational operators like JOIN and UNION have a direct logical meaning, and you can use those operators to answer complex questions from your data unambiguously. Not only can you provide unambiguous answers, but you can translate SQL queries into the language of logic, which can be understood by people outside of IT.

Equally important is the utility of logic working backwards to help find meaning in disorganized (or perhaps undocumented) data sets. Nearly every inherited data set is less than perfectly documented, and often inconsistent. Changes to (or replacement of) the application require this data to be migrated. By iteratively making assertions and then testing them using the rules of logic and the convenience of sets, you can find exceptions and contradictions that help you refine the meaning of data, and possibly correct inconsistencies.

Monitoring PostgreSQL with ptop

Selena Deckelmann
Talk Type: 45-Minute Talk

Ever wanted a command-line tool for monitoring PostgreSQL? ptop is the answer! Derived from the UNIX utility top, ptop automates some of the most common troubleshooting queries. This talk will provide an introduction to PostgreSQL monitoring, supported features in ptop, and plans for future development.

Features include:

* List of currently running queries
* Viewing query plans
* Viewing locks
* View current user table and index statistics
* Deltas on statistics at configurable intervals

Development is active, and the tool is currently supported on *BSD and Linux, with a Mac OS X port on the way.

Portable Scale-Out Benchmarks for PostgreSQL

Robert Hodges
Talk Type: 90-Minute Mini-Tutorial

The last few PostgreSQL releases have made great strides in scaling up PostgreSQL performance on single hosts. However, you can also increase performance by scaling out horizontally across multiple copies of data on different hosts. This talk digs into scale-out designs and shows benchmark results using a set of portable scale-out benchmarks. Based on the data we can draw some conclusions about performance trade-offs of the different approaches.

The talk consists of the following parts:

* Introduction. Discussion of scale-out approaches for PostgreSQL including SLONY, pgpool, and middleware replication using Sequoia. Overview of key measures of scale-out performance: read and write scaling, query latency (for proxy approaches), large vs. small transactions, and effects of datatypes like text and BLOBs.

* Introduction to the Bristlecone performance test tools. Evaluator creates mixed loads that stress system resources like CPU. Benchmark runs focused benchmarks with systematic variation of test input parameters. Each tool includes "portable benchmarks" that can be run within a few minutes of downloading, plus you can write more.

* Off to the Races. Results of running the portable benchmarks against a few of our favorite scale-out configurations. For fun we'll throw in some MySQL results.

* Summary. Discussion of the trade-offs between different database scaling approaches and when scale-out makes sense versus scale-up.

PostgreSQL and Benchmarks

Jignesh Shah
Talk Type: 45-Minute Talk

Benchmarks are often used to help new customers to figure out the following * How a particular database stacks up compared to other databases available * Helps size up configurations required to do particular tasks * Helps identify "configurations" which can help deliver the metrics required by customers * High water mark benchmarks helps identify problems before actual customers hit into those issues

We will discuss what we found as strengths and limitations with current versions of PostgreSQL while using them with such benchmarks. These often reflect real life scenarios of what customers face while using PostgreSQL as their primary database platform.

PostgreSql and Java

Brent Friedman
Talk Type: 90-Minute Mini-Tutorial

Overview of Java/JDBC versions, and how they impact usage with PostgreSql. Code samples and interactive demonstration of basic connectivity (jdbc) and advanced connectivity (connection pool). Discussion of the ability to throttle or control speed of query execution for J2EE applications, via a 'gatekeeper' java bean, based on current system load and database performance. Walk-through tutorial on connection gotchas, configuration steps, how to verify connections, accessing stored functions (procedures), etc.

PostgreSQL Operations: looking under the hood on Solaris

Theo Schlossnagle
Talk Type: 90-Minute Mini-Tutorial

Running PostgreSQL on a production server is nothing new. However, PostgreSQL lacks many of the detailed performance analytics exposed by its commercial counterparts. In this talk we'll explore how to perform systemic analysis of performance issues using DTrace. We'll also tour some clever hacks to bend PostgreSQL to your need in large scale environments.

Postmodern PostgreSQL Application Development

David Sankel
Talk Type: 90-Minute Mini-Tutorial

This transformative tutorial debunks old ideas of database application programming and replaces them with a framework that includes some of the best Open Source tools available.

Goals Set. We'll start with an overview of the problems we wish to solve and the goals we wish to achieve. Among them are platform independence, completely open-source tools, good looking graphical interfaces, and simple deployment.

Choices Made. Several choices need to be made among the vast sea of open source tools. We'll explain how we came to decide on PostgreSQL for our database back-end, Python for our programming language, SQLAlchemy as our PostgreSQL library, GNU Make as our development environment, and PyQt as our GUI.

The Journey. The journey will be the creation of an example application using the outlined framework. Roughly it will follow these stages:

* Design of the Makefile.
* Creation of the database tables using PostgreSQL statements.
* Sample Data made using SQL.
* Design of the user interface.
* Implementation of the UI with SQLAlchemy.
* Creation of the installer makefile rules (this will include installation of PostgreSQL server from within another installer).
* Making of a pretty SVG Icon quickly.
* (if there's extra time) Using JSON as a file format for partial PostgreSQL database dumps.

The sample application built will be a simple customer management application, but the same techniques are applicable to other software including web applications.

Query Execution Techniques in PostgreSQL

Neil Conway
Talk Type: 90-Minute Mini-Tutorial

Specific topics will likely include:

* a brief summary of the lifecycle of a query in PostgreSQL and the role of the query optimizer
* the "iterator model", which is the interface used to communicate data and exchange control flow in the query executor
* scan evaluation
o sequential scans and index scans
o bitmap scans
o predicate evaluation
o materialization
o evaluating subqueries and set-returning functions
* join evaluation
o nested loops, hash joins, sort-merge joins
o outer joins
* aggregate evaluation
o grouping via sorting
o grouping via hashing
* how to understand the output of EXPLAIN

Rapid development as a denial of service threat

Andrew Sullivan
Talk Type: 45-Minute Talk

Users of PostgreSQL often choose it because of the serious way it treats the data it receives. At the same time, many PostgreSQL users are under tremendous pressure to deliver desired features to the users very quickly. Development periods are short, and testing periods are often close to non-existent.

This talk presents some lessons the author learned in an environment of too-short deadlines where software was to be presented to the world. Some of these remarks are PostgreSQL specific, and some are just tips for how to play nicely with those whose SQL skill you think needs some help.

SQL/XML For Developers

Lewis Cunningham
Talk Type: 90-Minute Mini-Tutorial

Whether we like it or not, XML has become a part of our development lives. There are very few non-trivial applications that don't require some kind of XML interface. Even though Postgres has supplied XML functionality for many years, as an add on, there are still many developers who don't know the XML basics. Full SQL/XML support is now natively supported and Postgres 8.3 also supports a native XML data type.

This presentation will explain, in developer terms, what XML really is, what XML really is NOT and how to use SQL/XML to generate XML. Topics discussed will be:

Quick Overview of XML
The XML data type
SQL/XML Functions
When XML makes sense
When XML does NOT make sense

Slony-I: Evolution of a Replication System

Christopher Browne
Talk Type: 45-Minute Talk

This lecture will discuss:

* Things Slony-I inherited from eRServer
* Things Slony-I improved on over eRServer
* Subsequent enhancements
* Aspects of the software lifecycle for this system
* Observations on aspects that may be worth improving on in future replication systems

Strategies and Compromises in PostgreSQL-powered Application Development
John Rogelstad and Gil Moskowitz

Talk Type: 45-Minute Talk

This talk will discuss the various development decisions made by xTuple (formerly OpenMFG) over the past six years with its PostgreSQL-centric applications, and look to the next-generation architecture currently in the planning stages. We will discuss:

* Putting business logic into the database via PL/PGSQL and constraints
* Where/how best to handle errors
* Designing updatable views for the API and data import
* To normalize or denormalize?
* Why build Yet Another Report Writer?
* Tools to check for data problems before upgrading
* GUI or Web: does PostgreSQL care?
* Scaling with PostgreSQL: where's the ceiling?
* Future-safe scripting that lives in the database
* A more extendable, modular future

The Magic of MVCC - Video

Greg Sabino Mullane
Talk Type: 90-Minute Mini-Tutorial

MVCC is one of the strengths of Postgres, but can be difficult to fully understand. We'll learn exactly what MVCC is and how Postgres implements it, and give you a clear understanding of snapshots, two-part commits, transaction ids, isolation levels, the problem with indexes, and what vacuum really does, among other things. This will be light on the math and computer science, and heavy on analogies and real-world examples.

The Need for Speed: Case Study of Table Partitioning

Vivek Khera
Talk Type: 45-Minute Talk

One of postgres' mechanisms for dealing withe extremely large database tables is to partition them into smaller sub-tables which appear logically to your application as a single table. The naÃ&hibar;ve approach without modifying your application at all may result in significant performance degradation.

The most commonly written about technique for table partitioning is to split data using a date field. This turned out to not be beneficial at all to our use case. By splitting our data based on an ID number field, we were able to significantly improve the performance of the database and thus our application.

In this presentation, we will present a case study of the steps we took to increase the speed of a critical portion of our application by a factor of two. We will cover:

0) Identifying the application bottleneck 1) The analysis of the application's use of the large table 2) Determining the best partition scheme for this usage 3) How to ignore consultants who want to do what is easy for them 4) Testing the partition performance
5) Planning the migration with minimal (under 5 minutes) application downtime 6) Deploying the application changes and migrating the data

Along the way, we will cover tips and tricks we learned about partitioning tables which are not obvious from reading the manual, and which make migration easier than one might expect.

The Zen of Postgres, or How to become a Happy Hacker

Andrew Dunstan
Talk Type: 45-Minute Talk

The process of contributing can be intimidating for people not used to it. The talk is intended to provide helpful advice for those who wish to become successful PostgreSQL hackers. The focus will be more on social and community aspects than technical tools, although technical apects (CVS, make, debuggers etc.) will also be mentioned.

Using PostgreSQL In Web 2.0 Applications

Nikolay Samokhvalov
Talk Type: 45-Minute Talk

The talk addresses the problems that arise in modern web applications (so-called Web 2.0) development. Today, most developers choose Open Source to build new Internet startups. Being the most powerful Open Source database system, PostgreSQL provides a wide range of excellent capabilities to meets web developers needs, which continue to grow due to various reasons. Among such reasons the following ones can be distinguished:

* growing number of competitors on the market;
* extreme level of business requirements inconstancy;
* increasing level of quality requirements;
* larger volumes of data;
* exponential style of major numbers growing (registered users, online users, pageviews, TPS).