Beating 1 billion rows per second
A billion rows is a nice chunk of data in any database, and it is even more interesting in a relational one, but as Hans-Jürgen Schönig from Cybertec explained, it is possible to process it in one second. It all began when Oracle published an article saying they could do it, and Hans decided he could match it with PostgreSQL as well.

From the very beginning it was clear that it had to be done with parallelism. Traditional single-core execution only helps when running several queries at a time, while pl/proxy can handle parallelism by sharding at the procedure level; rolling your own parallelism from scratch is simply not feasible these days.
Hans began with a simple scenario with 3 columns, but this can be scaled to real data by adding more memory and more processors (or more machines), theoretically to any level, so the simplification is acceptable. So, what can one server offer?
With parallelism in 9.6, which is not complete yet, the optimizer determines the number of cores that will process a query; small tables are processed by a single core only. The parallel_workers option can be set per table to overrule the guessing and raise the number of workers.
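For illustration, a minimal sketch of steering 9.6 parallelism by hand; the table name t is made up and the exact values depend on the hardware:

    SET max_parallel_workers_per_gather = 8;   -- allow up to 8 workers per Gather node
    ALTER TABLE t SET (parallel_workers = 8);  -- override the planner's own estimate for t
    EXPLAIN (ANALYZE) SELECT count(*) FROM t;  -- the plan should now contain a Gather node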
The benchmark must be done in memory only, otherwise it turns into a disk benchmark. It turned out that one server scales linearly up to 16 cores, even when running on a 32-core box, because that box is really a 32-thread box and hyper-threading effectively degrades it to 16 cores.
So at this point we have 40M rows. The next step is partitioning tables, and this can be extended as far as needed. In the end, 32 billion rows were processed in less than 3 s, which means the goal was reached. Only a little patching on top of 9.6 was needed, and much of it will already be in version 10.
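To illustrate the partitioning step, here is a minimal sketch using the declarative partitioning syntax arriving in version 10 (the talk itself ran on patched 9.6; the table and column names are made up):

    CREATE TABLE readings (ts timestamptz, value int) PARTITION BY RANGE (ts);
    CREATE TABLE readings_2017_01 PARTITION OF readings
        FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');
    CREATE TABLE readings_2017_02 PARTITION OF readings
        FOR VALUES FROM ('2017-02-01') TO ('2017-03-01');
    -- each partition can then be scanned by its own set of parallel workers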
PostgreSQL for OLAP, or how parallelism works in PostgreSQL
Tomas continued with an overview of processes and connections. Traditionally, PostgreSQL starts no extra threads and no extra forks while processing a query. That is ideal for OLTP applications, where clients send many small chunks of work. Sharing backends via connection pooling is one way to utilize a single process more, but it does nothing to make the processing of a single query more effective.

OLAP applications usually use fewer connections, so some real multi-processing is needed much more there. A Gather process can launch several workers in parallel, collect data from them, and each worker can do its part of the work on its own. There are quite a lot of restrictions so far; for example, because of the locking involved, data-modifying queries cannot run in parallel, and there is only one level of parallelism. Still, for SELECTs it covers a pretty nice chunk of use cases.
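A hypothetical and heavily abbreviated plan (the table name big_table is made up) shows where the Gather node sits: each worker runs its own copy of the plan below Gather, and the leader collects and combines their partial results:

    EXPLAIN SELECT count(*) FROM big_table;

             QUERY PLAN (abbreviated)
    -------------------------------------------------
     Finalize Aggregate
       ->  Gather
             Workers Planned: 4
             ->  Partial Aggregate
                   ->  Parallel Seq Scan on big_table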
In version 10 there will be improvements in bitmap scans, hash joins and aggregations. Users will need to flag whether a PL function is parallel safe, though. Using TPC-H, a standardized benchmark, many of the tests ran at least twice as fast as their single-process counterparts.
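As a sketch of that parallel-safety flag: a function has to be explicitly declared safe before the planner will consider it inside a parallel plan (the function itself is just an example):

    CREATE OR REPLACE FUNCTION taxed(amount numeric)
    RETURNS numeric AS $$
        SELECT amount * 1.21;
    $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;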
The effectiveness of parallelism (how far it is from the theoretical limit) depends on the number of cores; we saw it go from 100% with 1 core down to about 80% with 32 cores.
Currently, refreshing materialized views is not parallelized, although it would make great sense. The same goes for parallel sort, although Gather Merge might make it into 10.
David Turoň from LinuxBox began with a bit of a joke about how miners in the Ostrava region are starting to program instead of mining. He then talked about partman_to_cstore, a plugin currently living on GitHub, so users need to build it themselves first. It is a good fit for tables with rotation (timestamp/timestamptz), sensor data and event logs; generally big tables that do not change. The nice part is that in most cases partman_to_cstore can be used without changes in the application: cron simply runs a function that stores the data to cstore.
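I have not tried partman_to_cstore myself, but the underlying idea can be sketched with plain cstore_fdw: a closed, no-longer-changing partition is copied into a columnar foreign table, typically from a function scheduled by cron (all names below are made up):

    CREATE EXTENSION cstore_fdw;
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

    CREATE FOREIGN TABLE events_2017_01_cstore (ts timestamptz, payload jsonb)
        SERVER cstore_server
        OPTIONS (compression 'pglz');

    INSERT INTO events_2017_01_cstore
        SELECT ts, payload FROM events_2017_01;  -- the closed partition
    DROP TABLE events_2017_01;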
Roman Fišer from nangu.tv continued by sharing the problems they face when delivering a lot of multimedia content to many users. From the tools they use, let's name Consul, Patroni and HAProxy. As for the approaches they chose, they need to be tolerant of latencies between nodes, and so on. For data collection they use collectd and store the data in RRD files, with Collection3/Cacti as the front-end and InfluxDB/Grafana as the back-end, and they are thinking about Prometheus. Integrating a new PostgreSQL version takes them approximately 6 months.
Containers everywhere
My talk then focused on containers in distributions, giving all of the examples from the PostgreSQL world, of course. By describing mostly basic things I wanted to bring containers a bit closer to users who typically don't like them much. Besides traditional Docker I mentioned OpenShift, Flatpak and systemd-nspawn, so people could start trying things on their own.

After the presentation there were quite a few questions:
- Networking: could iptables (a firewall) run inside the container?
- Data in volumes is shared in copy-on-write (COW) mode, so containers can share the common part on disk; could volume-mounted data live on ZFS?
- Several questions about user namespaces: what they are good for and whether they work the way people expect.
- Sharing libraries in memory was an interesting question we couldn't answer with full certainty, but after a further look-up: any such memory sharing would go against the containers' concept of isolating processes, so it will probably never be implemented.
Machine learning using PostgreSQL
The lightning talk by Hans-Jürgen about machine learning focused on detecting fraudulent logins on a machine. A Support Vector Machine is used to separate good users from bad ones; it is basically about building a model that is then used for the decision. A PL/Python function loads the data and uses the historical data for the learning itself. In the end the decision is also made in SQL, and all of that, of course, inside PostgreSQL.

Pavel Stehule talked about the \g, \gset and \gexec commands in the second and also last lightning talk. These commands were added to psql recently, and all of them help to work faster in the psql console.
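As a sketch of the idea (not the speaker's actual code): an SVM from scikit-learn can be trained inside a PL/Python function, here on a hypothetical login_history table, so the final decision stays in SQL. It assumes plpython3u and scikit-learn are installed; a real setup would train the model once rather than on every call:

    CREATE EXTENSION IF NOT EXISTS plpython3u;

    CREATE OR REPLACE FUNCTION is_fraudulent(hour int, failures int)
    RETURNS boolean AS $$
        from sklearn import svm
        # learn from historical data (hypothetical table and columns)
        rows = plpy.execute("SELECT hour, failures, fraudulent FROM login_history")
        X = [[r["hour"], r["failures"]] for r in rows]
        y = [r["fraudulent"] for r in rows]
        model = svm.SVC()
        model.fit(X, y)
        # decide about the current login
        return bool(model.predict([[hour, failures]])[0])
    $$ LANGUAGE plpython3u;

    SELECT is_fraudulent(3, 42);

And a tiny illustration of the psql commands from the second talk: \gset stores the result of the last query into psql variables, while \gexec executes each returned value as a new statement:

    SELECT count(*) AS tables FROM pg_class WHERE relkind = 'r' \gset
    \echo :tables
    SELECT format('ANALYZE %I', relname)
      FROM pg_class WHERE relkind = 'r' \gexec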
What does it mean to hack PostgreSQL?
Pavel Stehule closed the conference with a talk about his experience implementing XMLTABLE for parsing XML, which had been missing in PostgreSQL. It was possible to use the XML functions before, but with XMLTABLE the query is much more obvious, which is the reason why the SQL looks the way it looks. It was an interesting excursion into the world of a PostgreSQL hacker.

When somebody reads ANSI SQL, it is not necessarily a pleasant read: details are missing and existing implementations differ slightly from the standard, so what to do? In PostgreSQL, which implements XMLTABLE later than DB2 and Oracle, the author has to take into account the functionality that already exists elsewhere. Generally, when implementing some new functionality, we can either patch the code or write an extension. If the feature can be done as an extension, it is much easier; it can live anywhere and people are happy and prosper. When patching PostgreSQL is needed, it takes time and the developer will spend a lot of energy. It is especially tricky when using a library like libxml2, which, just like PostgreSQL, has its own memory management and its own data types, so conversions have to be done.
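To give an idea of the user-facing side, a minimal XMLTABLE sketch with a made-up document (syntax as committed for PostgreSQL 10): the XML is turned into ordinary rows and columns instead of a series of xpath() calls:

    SELECT x.*
      FROM (VALUES ('<people>
              <person id="1"><name>Anna</name></person>
              <person id="2"><name>Borek</name></person>
            </people>'::xml)) AS d(doc),
           XMLTABLE('/people/person' PASSING d.doc
                    COLUMNS id   int  PATH '@id',
                            name text PATH 'name') AS x;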
In the case of the XMLTABLE implementation, code changes had to happen in the parser and the executor. Here it comes in handy to know well how the execution pipeline works in PostgreSQL -- parser, rewriter, planner, optimizer, executor. Also, SQL needs to be translated into some other language, so changing behaviour basically means changing the translator; on the other hand, this SQL translation will not be the bottleneck. Patching the parser (bison) needs to be clean: no conflict warnings are acceptable during the review. The implementation is then either an SRF function or an executor patch.
When writing a patch, the first iteration in the community is to write something dirty, to show that it is possible, so that the conversation about benefits can begin. The next phase is a complete patch, with documentation and regression tests. The last part is the commitfest patch, which is used for cleaning up the code, polishing the documentation, extending the tests, ...
The whole patch-writing process is very hard; one learns assertiveness and respect in the first place. Some features take as much as 10 years from the first patch to a released commit. hstore, for example, helped Ruby on Rails, and that in turn helped PostgreSQL grow. Without a use case it is not very probable that a change will be accepted.
So that was the 10th p2d2; I'm already looking forward to the next year!