Saturday, February 08, 2020

Report from Prague PostgreSQL Developer Day 2020

Another year is behind us and the next P2D2 conference of PostgreSQL users and developers was ahead of us. Well, by now it is behind us as well. Slides should soon be available at https://p2d2.cz/rocnik-2020/prednasky, but let me share what I personally considered interesting.

What's new in PostgreSQL 12 (and 13)

The first talk is traditionally given by Tomas Vondra and it covers news in the latest version of PostgreSQL, this time v12 and the upcoming v13. Among the things that seem interesting to me are improvements in GSSAPI support, which now covers new authentication mechanisms, and new progress monitors for vacuuming and index creation that let admins watch those resource-demanding operations. Performance can also be improved by reindexing concurrently (REINDEX CONCURRENTLY). With PostgreSQL v12, users can also turn data checksums on when they see disk troubles (or turn them off in case of performance issues).
The pluggable table access method API will allow implementing and using different storage engines, similar to what MySQL offers. So, we can slowly start looking forward to columnar storage or other innovative solutions.
Generated columns might be useful when developers need to store calculated values.
The experience of working with JSON is improved as well; developers can now use new functions that work with JSON paths. And many more enhancements are coming in v13.
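A minimal sketch of two of those v12 features -- a generated column and an SQL/JSON path query (the table and columns are made up):
psql <<'SQL'
CREATE TABLE orders (
    id    bigserial PRIMARY KEY,
    net   numeric NOT NULL,
    vat   numeric NOT NULL,
    -- generated column: computed and stored by the server, can be indexed
    total numeric GENERATED ALWAYS AS (net + vat) STORED,
    doc   jsonb NOT NULL
);
-- one of the new jsonpath functions
SELECT jsonb_path_query(doc, '$.items[*].sku') FROM orders;
SQL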

JavaScript inside PostgreSQL

Simone Sanfratello then talked about why and how to use JavaScript inside the PostgreSQL server, using plv8. Some examples of reasonable use are sanitizing values and doing more advanced string operations like URL encoding and decoding. It can be extra useful in connection with JSON, which is more and more common in the PostgreSQL world. What was a bit surprising: compared to plpython3, plv8 was faster in the testing scenarios and a bit closer to the native PostgreSQL implementation.
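A tiny illustration of the idea, assuming the plv8 extension is installed (the function and example are mine, not from the talk):
psql <<'SQL'
CREATE EXTENSION IF NOT EXISTS plv8;
-- URL decoding implemented in JavaScript, callable from plain SQL
CREATE FUNCTION url_decode(input text) RETURNS text AS $$
    return decodeURIComponent(input);
$$ LANGUAGE plv8 IMMUTABLE STRICT;
SELECT url_decode('Caf%C3%A9%20Praha');   -- Café Praha
SQL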

Is JSONB a Silver Bullet?

Angus Dippenaar continued on the JSON and JSONB topic, first comparing the benefits of JSON with the cases where JSONB is better. Thanks to parsing and indexing, JSONB is better when we need to work with JSON data effectively and make PostgreSQL understand it. On the other hand, JSON is more efficient when we only need to store and load it as a string, because it is only validated for correct syntax.
The most interesting part of the talk was the slightly more practical part, where Angus showed mostly performance data from working with Twitter data stored in PostgreSQL as JSON or JSONB. The comparison of both storing and searching speed, as well as the comparison of table sizes, was very interesting.
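For context, the usual reason JSONB wins on the querying side is that it can be indexed; a minimal sketch (the table is made up):
psql <<'SQL'
CREATE TABLE tweets (id bigserial PRIMARY KEY, doc jsonb NOT NULL);
CREATE INDEX tweets_doc_gin ON tweets USING gin (doc);
-- containment queries like this one can use the GIN index
SELECT count(*) FROM tweets WHERE doc @> '{"lang": "en"}';
SQL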

PostGIS in practice

Michal Páleník started after lunch, but he didn't let the audience sleep long, because his demonstration of map projections on a mandarin was really fun. Since he was talking about PostGIS, the necessary geography basics were handy, but most people probably appreciated the queries that showed how to solve several tasks with PostGIS -- for example which Prague pub is the closest or the most remote, pretty useful. He connected that with the other side of the web app: using simple JavaScript it is possible to display geographic data quite easily. I also was not aware that besides OpenStreetMap there is also http://project-osrm.org, a project that offers an open source routing machine.
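A "closest pub" query typically boils down to a KNN ordering like this sketch (the pubs table and the Prague coordinates are only an example):
psql <<'SQL'
SELECT name
FROM pubs
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(14.4208, 50.0880), 4326)
LIMIT 1;
SQL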

Experiences gained by optimizing an SQL query and the consequences for testing the application

Aleš Zelený, as he likes to do, shared his real experiences from practice -- to be concrete, experiences from disasters. He explained how a very simple task, at least that was his understanding in the beginning, turned into a real horror story: invalid data, real-world issues, weird behaviour of the PostgreSQL connector driver for R, and a non-trivial amount of data (dozens of GB). He obviously learned a lot on this task, and it was very interesting to hear the experiences he shared. One for all -- write meaningful comments, not ones that describe the code, but ones that describe the thinking of the author. And never trust data just because a sample looks OK. What also helped was decomposing the big query and analyzing and optimizing it part by part. The fact that the final SQL is big does not actually mean it is wrong; it is just not readable and understandable. It is also handy to know the latest PostgreSQL features, as covering indexes improved the overall performance as well. Overall, a very interesting talk.
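Covering indexes here means the INCLUDE clause from PostgreSQL 11; a sketch with made-up table and column names:
psql <<'SQL'
-- the INCLUDE columns are stored in the index leaf pages, so the query
-- below can be answered by an index-only scan without touching the heap
CREATE INDEX orders_customer_covering ON orders (customer_id) INCLUDE (order_date, total);
SELECT order_date, total FROM orders WHERE customer_id = 42;
SQL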

Professional PostgreSQL scheduling made easy

Pavlo Golub then talked about their solution, aka yet another scheduler for PostgreSQL. This one has a Go backend and looked pretty interesting.

Monitoring PostgreSQL

Roman Fišer talked about monitoring in general and what makes sense to monitor -- basically a handy 101 for beginning PostgreSQL admins. Then he showed Prometheus, which seems to be one of the leading implementations in this field.

PL/pgSQL profiling tools

Pavel Stehule also shared some experiences from the field that he gathered on the customers' side. For example, customers sometimes do not use the tools that already exist. Some of the tools are available by default, with no need to install anything extra, so it's good to know about them -- for example the EXPLAIN ANALYZE statement or the auto_explain module. These are later used by external profiling tools that can present fairly simple results readable even by less skilled admins. At the end, Pavel gave some extra tips on why queries might be too slow -- for example, that PostgreSQL doesn't have statistics for functions like MSSQL or Oracle have, or that it doesn't have a plan cache. For complex queries, we can tell PostgreSQL to spend more time on optimizing, when it makes sense. There is also a statistics view that shows how heavily indexes are used, which might either be a sign of a wrong query or of a really unneeded index that just slows down data-manipulation queries.
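A quick sketch of those built-in options (the table is made up; auto_explain is loaded here only for the session):
psql <<'SQL'
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM orders WHERE created_at > now() - interval '1 day';

-- log the plan of anything slower than 250 ms
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';

-- indexes that are (almost) never scanned are candidates for dropping
SELECT relname, indexrelname, idx_scan FROM pg_stat_user_indexes ORDER BY idx_scan;
SQL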

So, as every year, a lot of food for thought. A really great one-day conference.

Friday, April 07, 2017

Fedora Loves Containers and You Can Now Build Them Officially

Docker containers have been around for some time already. Red Hat not only ships dozens of them in the Container Catalog, but the popularity of OpenShift proves that containers are the future, and not only for DevOps use cases. Recently, containers finally started to be a main topic in the Fedora distribution as well, which is proven by the new Container Guidelines that are still being developed.

So what do you do to create your own container and make it available to other community users in Fedora? First, you need to know how to prepare the Dockerfile and supporting files. For that, I recommend following the Best Practices and of course the Fedora Container Guidelines mentioned above. The container should do something sane on start; it should not be just RPMs installed in a container. For example, containers for daemons like databases should run the daemon, and they should also initialize the data directory, etc.

On the other hand, we can expect different behaviour from containers like Python or NodeJS, which will probably be used only as a base for other container images. Especially for those kinds of containers I really recommend using the standardized tool called source-to-image, which takes a builder image and an application source as input and combines them into a new image as output. Although the source-to-image concept was originally designed for containers like Python or NodeJS (basically runtimes of various languages), it may be used for daemon images as well -- in the case of an HTTPD server the application is composed of static files, in the case of databases it can be configuration or initial data. In all those cases the end user will probably need to make some changes of their own, which requires creating an adjusted container (ideally a thin layer on top of what the distribution provides).
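The source-to-image workflow itself is a single command; a sketch with example names (the builder image and source repository are just placeholders):
# s2i build <application-source> <builder-image> <resulting-image>
s2i build https://github.com/sclorg/django-ex centos/python-35-centos7 my-django-app
docker run -p 8080:8080 my-django-app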

OK, let's say we have a Dockerfile prepared -- what next when we want to build the container in Fedora? We can continue similarly as with RPMs -- we need to create a Container Review bug in Bugzilla, similar to those that already exist.

After another packager sets the review flag, which says the container follows the guidelines, you can build the container in Koji. Yes, the Koji build tool supports building Docker containers now, and the tool to make this happen is fedpkg, the same tool that is used to build RPMs. We only need to use the container-build command instead of just build. After that, the container will be available in the Fedora registry at registry.fedoraproject.org and the image can obviously be pulled using the docker client, as usual.
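The whole flow then looks roughly like this (the component name is hypothetical and the exact clone syntax and registry tag may differ):
fedpkg clone docker/my-cool-app        # container components live in the docker namespace
cd my-cool-app
fedpkg container-build                 # the container counterpart of plain 'fedpkg build'
docker pull registry.fedoraproject.org/my-cool-app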

Listing all the components under the docker namespace in the Fedora dist-git gives some more insight into which containers are already available.

That's all for the quick update about containers in Fedora. So, what container are you already working on?

Tuesday, March 21, 2017

Software Collections Daemons Made System-wide

This post is about an idea that has been around for some time already, but I'd like to get some more feedback, especially from Software Collections users. So, if you're familiar with this term, read on.

 

Familiar with Software Collections concept?

In short, the Software Collections (SCL) concept is a really cool technology that is especially useful on RPM-based Linux distributions that deliver one version of a component and support it for a long time, like Red Hat Enterprise Linux or CentOS. SCLs allow you to install a newer (or just different) version of a component without affecting the rest of the system (which is crucial).

That is all done by installing files into separate directories that don't collide with the base system. For the Software Collection called rh-mariadb101 (which delivers the MariaDB 10.1 server), the files are located at:
  • /opt/rh/rh-mariadb101/root/usr/bin (binaries)
  • /etc/opt/rh/rh-mariadb101/my.cnf.d (config files)
  • /var/opt/rh/rh-mariadb101/lib/mysql (data)
  • /var/opt/rh/rh-mariadb101/log/mariadb (error log file)

Why is this not enough?

This has all worked very well for several years and people love SCLs. But even the beloved ones have some issues. From what we hear from users, the issues with the Software Collections concept currently boil down to these:
  • we need to use scl enable, which changes $PATH and other environment variables so that the binaries placed in the alternative location are visible to the system
  • scripts originally written for "normal" MySQL use full paths (like /usr/bin/mysql), which do not work when we only have the Software Collection installed
  • the data directory, config files and log files are in different locations than is common

So, we looked at the issues above and tried to come up with a solution. An easy take would be not using the Software Collections concept at all, but that would effectively mean breaking something in the base system (either we would update some dependency, or we would force users to remove the client library because some files are shared with the server, and probably many more things that I'm not even aware of). That means using Software Collections is still valid; we just need to remove the obstacles introduced by the alternative paths.

 

What to do with that?

The approach I prefer is based on two changes:
  • have symlinks in the usual places in /etc and /var to make it easier for users to find the stuff they're looking for
  • have a wrapper for every binary; the wrapper is located in /usr/bin, so no scl enable is needed -- it runs scl enable internally and then executes the binary located under /opt/rh/rh-mariadb101/root/usr/bin

The wrapper is quite simple and can be shared (the symlinks then point to the wrapper); it can be located in /opt/rh/rh-mariadb101/root/usr/libexec/scl_wrapper:
#!/usr/bin/bash
# Enable the collection (@SCL@ is substituted at build time); this prepends the collection's bin directory to $PATH.
source scl_source enable @SCL@
# Re-exec the binary of the same name, which now resolves to the one under /opt/rh/<collection>/root/usr/bin.
exec "$(basename "$0")" "$@"
The mysql tool, for example, is then available through a link like this:

/usr/bin/mysql -> /opt/rh/rh-mariadb101/root/usr/libexec/scl_wrapper

Both the wrappers and the symlinks are shipped in a separate sub-package, so just having them available does not change anything for currently working setups, which is honestly what I really like about this approach.

It means only users who would like to use the Software Collection packages as normal components will install the rh-mariadb101-mariadb-system-wide sub-package (cool name, huh?). Those users will need to accept the fact that they cannot install the mariadb and mariadb-server packages from the base system any more, because those would obviously conflict with the rh-mariadb101-mariadb-system-wide package.

 

Does it really work?

A working solution is also available in my testing Copr repository:

https://copr.fedorainfracloud.org/coprs/hhorak/mariadb-wrapper/

For dependencies, you either need the Red Hat Software Collections channel enabled on RHEL 7, or you can use the packages from the SCLo SIG group on CentOS 7. See more info at:
https://www.softwarecollections.org/en/scls/rhscl/rh-mariadb101/

That Copr will show you that after installing the rh-mariadb101-mariadb-system-wide package, you can start doing things as if the MariaDB 10.1 Software Collection were a normal package:

#> service mariadb start
#> service mariadb status
#> echo "select @@version;" | mysql
#> mysql_upgrade --force
#> less /etc/rh-mariadb101-my.cnf
#> less /etc/rh-mariadb101-my.cnf.d/mariadb-server.cnf
#> ls /var/lib/rh-mariadb101/

Feedback required!

Now, the most important thing -- the reason I shared this post here: I want to hear feedback from current Software Collections users, on whether this idea helps overcome the Software Collections design issues and whether there are more ideas out there.

For feedback, ideally use our mailing list sclorg@redhat.com.

Thursday, February 16, 2017

Notes from Prague PostgreSQL Developers Day 2017

A Thursday in the middle of February means another year of the P2D2 conference. This year it was already the 10th P2D2 meet-up of PostgreSQL users. Although that might seem like a milestone, it's nothing special -- the 16th will be a better opportunity to celebrate.

Beating 1 billion rows per second

A billion rows in any database is a nice chunk of data, and it is even more interesting in a relational database, but as Hans-Jürgen Schönig from Cybertec explained, it is possible to process them in one second. It all began when Oracle published an article saying they can do it; Hans decided he could match it with PostgreSQL as well.

From the very beginning it was clear that it had to be done with parallelism. Traditional one-core execution only helps when running more queries at a time, while pl/proxy could handle parallelism using sharding at the procedure level. Implementing parallelism from scratch is simply not feasible these days.

Hans began with a simple scenario with 3 columns, but this can be scaled to real data using more memory and more processors (or more machines), theoretically to any level, so we can take this simplification as acceptable. So, what can one server offer?

With 9.6 parallelism, which is not complete yet, the optimizer determines the number of cores that will process a query; small tables are processed by one core only. The parallel_workers option can be set to override the guessing and raise the number of workers.
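Roughly, forcing more workers for a (hypothetical) big table looks like this:
psql <<'SQL'
ALTER TABLE measurements SET (parallel_workers = 8);
SET max_parallel_workers_per_gather = 8;
EXPLAIN ANALYZE SELECT count(*) FROM measurements;
SQL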

The benchmark must be done in memory only, so as not to benchmark the disk instead. It turned out that one server scales linearly up to 16 cores, even when running on a 32-core box -- that box is actually a 32-thread box, and hyper-threading effectively degrades it to 16 cores.

So, at this point we have 40M rows. The next step is partitioned tables, and we can do this to whatever extent is needed. In the end, 32 billion rows were processed in less than 3 s, which means the goal was reached; only a few patches were used on top of 9.6, and many of them will already be in 10.

PostgreSQL for OLAP, alias how parallelism works in PostgreSQL

Tomas continued with an overview of processes and connections. Traditionally, no threads and no forks are used in PostgreSQL while processing a query. That is ideal for OLTP applications, where apps do many small chunks of work. Sharing processes via connection pooling is one way to utilize a process more, but it does not help make the processing of a single query more effective.

OLAP applications usually use fewer connections, so some real multi-processing is more needed there. A Gather node can run more worker processes in parallel and collect data from them, and every worker can do something on its own. There are quite a lot of restrictions so far -- for example, locking means that data-modifying queries cannot run in parallel. There is also only one level of parallelism, but for selects it might cover a pretty nice chunk of use cases.

In version 10 there will be some improvements in bitmap scans, hash joins and aggregations. Users will need to flag whether a PL function is parallel safe, though. Using TPC-H, a standardized benchmark, many tests were at least twice as fast compared to single-process queries.

The effectiveness of parallelism (the difference from the theoretical limit) depends on the number of cores; we saw it go from 100% (1 core) to 80% for 32 cores.

Currently, refreshing materialized views is not parallelized, but it would make great sense. The same goes for parallel sort, although a gather sort might be done in 10.

David Turoň from LinuxBox began with a bit of a joke, explaining how miners in the Ostrava region are starting to program instead of mining. Anyway, he talked about partman_to_cstore, which is a plugin currently living on GitHub, so the user needs to build it first. It might be good for tables with rotation (timestamp/timestamptz), sensor data or event logs -- generally big tables that do not change. It's nice that in most cases partman_to_cstore can be used without changes to the application. Cron may run a function that stores the data to cstore.

Roman Fišer from nangu.tv continued by sharing their problems when delivering a lot of multimedia content to many users. Among the tools they use, let's name Consul, Patroni or HAProxy. As for the approaches they chose, they need to be tolerant of latencies between nodes, etc. For data collection they use collectd, store the data in RRD files, use Collection3/Cacti as a frontend and InfluxDB/Grafana for the back-end, and they are thinking about Prometheus. Integrating a new PostgreSQL version takes them approximately 6 months.

Containers everywhere

My talk then focused on containers in distributions, of course giving all of the examples from the PostgreSQL world. By describing mostly basic stuff I wanted to bring containers a bit closer to users who typically don't like them much. Besides traditional Docker, I mentioned OpenShift, Flatpak and systemd-nspawn, so people could start trying them on their own.

After the presentation there were quite a few questions:
  • Networking -- could iptables (a firewall) run inside the container?
  • Data in volumes is shared in COW mode, so containers can share the common part on disk -- could volume-mounted data live on ZFS?
  • Several questions about user namespaces, what they are good for and whether they work as people expect.
  • Sharing libraries in memory was an interesting question we couldn't answer with 100% certainty, but from a further look-up, any memory sharing would go against the containers' concept of isolating processes, so nobody will probably implement it.

Machine learning using PostgreSQL

A lightning talk by Hans-Jürgen about machine learning focused on detecting fraudulent logins on a machine. A Support Vector Machine is used to separate good users from bad ones; it's basically about creating a model that is then used for the decision. A PL/Python function loads the data and uses the historical data for the learning itself. In the end the decision is also made in SQL -- and all of that, of course, in PostgreSQL.

Pavel Stehule talked about the \g, \gset and \gexec commands in the second, and also last, lightning talk. Those commands were added to psql recently and all of them help you work faster in the psql console.
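A small sketch of what those backslash commands do (the VACUUM example is a common \gexec pattern, not from the talk):
psql <<'SQL'
-- \gset stores the result columns into psql variables
SELECT current_database() AS db \gset
\echo now connected to :db
-- \gexec runs every cell of the result as a new SQL statement
SELECT format('VACUUM ANALYZE %I', relname) FROM pg_stat_user_tables
\gexec
SQL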

What does it mean to hack PostgreSQL?

Pavel Stehule closed the conference with a talk about his experience implementing XMLTABLE for parsing XML, which was missing in PostgreSQL. It was possible to use the XML functions before, but with XMLTABLE it is much more obvious, which is the reason why the SQL looks the way it looks. It was an interesting excursion into the world of a PostgreSQL hacker.

Reading ANSI SQL is not necessarily nice reading; details are missing and existing implementations differ slightly from the standard, so what to do? In PostgreSQL, which implements XMLTABLE later than DB2 and Oracle, the author needs to follow the functionality that already exists elsewhere. Generally, when implementing some new functionality, we might either consider patching the code or writing an extension. If the feature can be done as an extension, it's much easier -- it can live anywhere and people are happy and prosper. In case patching PostgreSQL is needed, it takes time and the developer will spend a lot of energy. It is especially tricky when using a library like libxml2, which, same as PostgreSQL, uses its own memory management and different data types, so conversions need to be done.

In the case of the XMLTABLE implementation, code changes need to happen in the parser and the executor. Here it comes in handy to know well how the execution pipeline works in PostgreSQL -- parser, rewriter, planner, optimizer, executor. Also, SQL needs to be translated to some other language, so changing behaviour basically means changing the translator. On the other hand, this SQL translation won't be the bottleneck. Patching the parser (bison) needs to be clean; no conflict warnings are acceptable during the review. The implementation is either an SRF function or an executor patch.

When writing a patch, the first iteration in the community is to write something dirty, to show that it is possible, so the conversation about the benefits can begin. The next phase is a complete patch, with documentation and regression tests. The last part is the commitfest patch, which is used for cleaning up the code and documentation, extending the tests, and so on.

The whole patch-writing process is very hard; one will learn assertiveness and respect in the first place. Some features take as long as 10 years from the first patch to a released commit. hstore, for example, helped Ruby on Rails, and that helped PostgreSQL grow. Without a use case it is not very probable that a change will be accepted.

So that was the 10th P2D2 -- I'm already looking forward to next year!

Tuesday, February 07, 2017

FOSDEM 2017 - personal notes from Sunday

The first day was exhausting, so, as it happens, during the second day of FOSDEM I tend to write fewer notes, which means this day's report will be a bit shorter.

Arduino in Go and Go in Arduino

We started day 2 in the Golang room, where Ron showed how to program an Arduino in Go. They provide a framework that can be used not only for simple but also for somewhat more complicated apps. He showed a few practical demos, from a blinking LED, through button-controlled lights, to a Linux (busybox) based drone simulating a solar system repair. I must admit it was really cool to see a Linux-controlled drone flying above our heads, doing a flip, all while communicating with a base via Wi-Fi.
More info: https://fosdem.org/2017/schedule/event/go_gobot/

Hacking cars to make them drive alone

A similar talk, this time no longer in the Golang room, was the Open Source Car Control introduction by Josh Hartung -- they buy a cheap Kia and put an Arduino inside. They spoof the steering-wheel assistant rather than creating new, complicated controls, so they basically do a slight man-in-the-middle attack. The reason is that when anybody wants to do something with a car, they need to hack it, and the only standard is OBD-II, which does not offer what is needed.
Safety can be handled in different ways; the safest so far seems to be a strict policy that touching the steering wheel or the brake hard-resets the system, which puts the car back to stock settings.

To assure good enough product quality, they use coding standards, SW diagnostics, a code review process, and unit and integration testing.
Autonomous car racing looks like senior citizens driving (the speed is just a few km/h), but it looked interesting, given these are not cars produced with millions put into research. And what is best, anybody can join and hack their own car -- the cheaper the better.
More info: https://fosdem.org/2017/schedule/event/open_source_car_control/

Rest of the day in Python devroom

Brian Bouterse explained how CPython (the runtime) can be debugged using GDB to find issues in the runtime itself. GDB can also work with the Python code and a core file in a smarter way, so users only see the Python stack frames, for example. And why should we use a debugger for Python? For example, if you have a problem that is hard to debug or you cannot share a reproducer with upstream. We can make Python better by fixing such corner cases. It's a powerful technique that every Python developer should know about.
More info: https://fosdem.org/2017/schedule/event/python_gdb/

"Threads are the goto of our generation," mentioned Ewoud Van Craeynest in his talk Asynchronous programming with Coroutines in Python. That does not mean we should not use them; we should just use them less. Or use a framework that forces us to do it more correctly, like asyncio.
asyncio allows calling functions asynchronously by using decorators, can create an event loop, and can call functions with a delay or at some point in time.
No callbacks, easy to use. Usable for SSH, file handling, IRC, HTTPD, RPC, testing and other use cases.
In the end Ewoud mentioned that reimplementations of parts of asyncio also exist, because not everybody liked the way upstream implemented the async framework.
More info: https://fosdem.org/2017/schedule/event/python_coroutines/

Ben Nuttall talked about IoT in Python. GPIO Zero helps implement Raspberry Pi apps using pre-prepared objects where the user only specifies a pin number and calls methods to work with the sensor.
A quite complicated and large device hierarchy demonstrated the wide range of objects available to users.
Remote GPIO (pigpio) allows running the same code while controlling a remote device.
Picamera is able to capture 1080p30, 720p60 and VGA90 video, so you can create your own mini camera using a few lines of code.
Energenie allows remotely controlling real 230 V plugs; an example of a remotely controlled tortoise lamp showed one possible usage. I guess there are more useful use cases, though.
Sense HAT is then a complex module that returns many values like temperature, humidity, etc.
More info: https://fosdem.org/2017/schedule/event/python_raspberry_pi/

The last talk of this year's FOSDEM for me was about prompt_toolkit by Jonathan Slenders. This library brings a nice and very advanced prompt to command-line Python apps.
Features included today: autocompletion (using Tab), syntax highlighting, colors, multi-line edits and many more.
Tools that already use the prompt_toolkit library prove its readiness: wharfee (a Docker client), pgcli, mycli, ipython; there is also screen splitting, etc.
Since the library includes a lot around text editing, pyvim was then quite easy to write, at least to show that it is not that painful to implement a real editor in pure Python. Jonathan plans to continue refactoring the library, so the API can change a bit in the future.
More info: https://fosdem.org/2017/schedule/event/python_prompt_toolkit/

I'll be back!

So that was it -- exhausting but still great this year, same as the Belgian beer. And we tried Uber for the first time, since it was cheaper than traveling by local transport (with 4 people). It worked perfectly!

FOSDEM 2017 - personal notes from Saturday

As always, do not expect the following text to be nice reading; it's more a sack of notes for further reference. And yeah, there was also the CentOS Dojo one day before and the second day of FOSDEM the day after.

Toooo big FOSDEM

The first weekend in February means that several thousand free and open source fans get to the Belgian capital, Brussels. A visitor's first impression of FOSDEM is that it gets more and more crowded year after year. Talks in the smaller rooms are almost always full and one needs to sneak inside in advance to get into an interesting talk. On the other hand, the number of interesting talks is not decreasing, so it is still worth traveling there, especially when you meet friends from last year or from other conferences. For the newcomers: at least it was 15 degrees warmer than in Brno at that time. Everything about the event is available at https://fosdem.org/2017.

Optimizing MySQL, MariaDB already in compiler

The 'Optimizing SQL without touching my.cnf' talk by Maxim Bublis, a Dropbox engineer, opened the MySQL devroom. He spoke about billions of files stored daily, PBs of metadata and EBs of raw data.
They use sysbench to test performance and Bazel for building their own flavor of packages, because profile-guided optimization (PGO) requires a rebuild.
It's not a good idea to use unit tests as the benchmarking workload, since those exercise corner cases.

Maxim mentioned the concrete GCC options they use for PGO: -fprofile-generate and -fprofile-correction.
Clang with PGO is a bit more successful; it was interesting that GCC 4.9 was worse than 4.6, and 5.4 even worse still, but on the other hand very good with PGO builds.
Link-time optimization, done by the linker instead of the compiler, is not supported in MySQL so far. However, they have started working on it.
In total they achieved a 20% improvement, and they have many further ideas for the future.
More info: https://fosdem.org/2017/schedule/event/opti_mysql/
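For reference, a two-pass PGO build in general looks roughly like this (the flags are real GCC options; the build commands and the workload script are just placeholders):
make CFLAGS="-O2 -fprofile-generate"                    # instrumented build
./run-representative-workload.sh                        # hypothetical script producing .gcda profiles
make clean
make CFLAGS="-O2 -fprofile-use -fprofile-correction"    # optimized rebuild using the profiles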

The sysbench talk by Alexey Kopytov started with history: it was originally a simple tool, then became more complicated in order to be usable for general use cases, so Lua was chosen as the scripting language. But then a real surprise came -- Alexey was happy to announce the first 1.0 release after 10 years of development.
The --mysql-dry-run option can measure TPS in MySQL without needing to know the server structure. Thanks to LuaJIT and xoroshiro128+ the tool is several times faster. Performance also benefits from having no mutexes and no shared counters -- Concurrency Kit is used to avoid those, so stats are counted per thread.

Sysbench can be used in shebangs as well. Exposing all command-line options as global variables was troublesome; version 1.0 can validate them better.
Arbitrary C functions can be called from Lua scripts, so no binary execution is needed. It's possible to use more connections in one thread. In the new version we can also see what was previously hidden from the user -- latency histograms. Users may also find it useful to ignore particular errors, especially in a distributed environment. After long consideration, Windows support was dropped. Live long and prosper, sysbench!
More info: https://fosdem.org/2017/schedule/event/sysbench/
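A typical sysbench 1.0 session with one of the bundled Lua workloads looks roughly like this (database name, table count and sizes are arbitrary examples):
sysbench oltp_read_write --mysql-db=sbtest --tables=8 --table-size=100000 prepare
sysbench oltp_read_write --mysql-db=sbtest --tables=8 --table-size=100000 \
         --threads=16 --time=60 --histogram run
sysbench oltp_read_write --mysql-db=sbtest --tables=8 cleanup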

When one instance is not enough

Spark for multi-processing, explained by Sveta Smirnova from Oracle, gave an overview of how Spark differs from MySQL and why parallelism can make a full table scan even better than indexing, which is missing in Spark.
More info: https://fosdem.org/2017/schedule/event/mysql_spark/

Group Replication is a plugin that was added to MySQL 5.7 pretty recently. It allows automating failover in single-primary mode, provides fault tolerance and multi-master updates on any node, enables group reconfiguration, etc. So, it's a pretty big deal to be added in a minor version, and it certainly deserves a closer look.

The principle is simple, as Alfranio explained -- the plugin checks for conflicts; on conflict the transaction rolls back, otherwise it gets committed.
Then he explained the solution to the concurrency problem -- how multiple entities can agree on some fact -- using Paxos, which is used in Group Replication.
More info: https://fosdem.org/2017/schedule/event/mysql_gr_journey/

Which proxy for MariaDB?

Colin Charles went through existing proxy solutions we have today for MySQL and clones.
MySQL Proxy is the middle part between client and server, with a Lua interpreter to rewrite queries, add statements, filter results, etc. It is not very active now, though.

MaxScale is similar: a fully pluggable architecture operating on layer 7, allowing logging or working with other back-ends. Schema-based sharding, a binlog server and query rewriting are also supported in MaxScale. It is very usable for a Galera cluster, and a Kafka backend also exists. MaxScale got its first forks from Booking.com and Airbnb (connection pooling), because they use Ruby and it is then necessary for them to pool connections. Colin also mentioned the license change, when MaxScale 2.0 was released under the BSL.

MySQL Router -- fully GPLv2, allows transparent routing, has a plugin interface via MySQL Harness, and can do failover and load balancing (at this point Colin suggested watching https://www.youtube.com/watch?v=JWy7ZLXxtZ4). MySQL Router does not work with the MariaDB server, though.

ProxySQL -- stable, brings HA to the DB topology; connection pooling and multiplexing supported, R/W split and sharding, seamless failover, query caching, rewriting, ...
SSL encryption is not as good as in MaxScale, and there is no binlog router. Maxwell's daemon allows connecting ProxySQL with Kafka.
http://vitess.io/ is also interesting; it allows scaling MySQL as well.
http://proxysql.com/compare shows a comparison with other tools.
More info: https://fosdem.org/2017/schedule/event/mysql_proxy_war/

RocksDB rocks even in MariaDB

The last two talks were dedicated to the new engine for MySQL/MariaDB, built on top of RocksDB and called MyRocks. It provides better write efficiency, good enough reads, the best space efficiency, and it is effective with SSDs. In Facebook's testing, for example, it uses 50% of the space used by compressed InnoDB, or 25% of the space used by uncompressed InnoDB, and only 10% of the disk operations.
Performance problems are still not fully fixed, but the goal is to be as good as InnoDB. As for stability, what better proof that it is already stable than the fact that Facebook already uses it in production for storing user data?
The goal is to integrate it into the MariaDB and Percona upstreams and expand the features of the new engine.
This is all most visible when using small rows, because the per-entry overhead is very small compared to InnoDB.
A Bloom filter helps when non-existent IDs are read (the index tree does not need to be read).

Currently it is hard to debug and has issues with long range scans, large transactions and group commit.
For performance checks, LinkBench is used; transaction time went from 6 ms to 1 ms for writes, and got a bit worse for reads, but by saving time during writes we have more left for reads.
More info: https://fosdem.org/2017/schedule/event/myrocksdb/, https://fosdem.org/2017/schedule/event/myrocks_at_fb/
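For the curious, in builds that ship the MyRocks plugin (for example later MariaDB releases), enabling and using the engine looks roughly like this:
mysql <<'SQL'
INSTALL SONAME 'ha_rocksdb';
CREATE TABLE links (
    id      BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    payload VARBINARY(255)
) ENGINE = ROCKSDB;
SQL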

Really big data in many dimensions

Peter Baumann, a database expert from Jacobs University, talked about datacubes on steroids with ISO Array SQL. Their long-term research helped add multidimensional arrays to the SQL standard. Big Data can be structured in many ways: sets, trees, graphs or arrays. The idea of arrays in SQL is similar to other languages -- a column can simply be composed of multiple values. That should get into the standard later this year, and it not only allows accessing particular values, but also doing other complicated operations, like matrix multiplication, histograms, and other operations that can be composed from those. Yeah, a real algebra in relational databases.
The research can be seen in practice at planetserver.eu, which includes 20 TB of data. Generally we must expect the data to be bigger than memory if we speak about Big Data.
Distribution of data gets crazy, since they not only fetch data from datacenters far away, but also want to put a database on a satellite and query it from Earth.
They don't use Hadoop, because it doesn't know about arrays, which might be terabytes long (which is the case when working with big bitmaps, for example).
Comparing their rasdaman, Spark and Hive when fetching more than one pixel, it didn't look pretty for Hive and especially not for Spark.
As for the number of dimensions, they look at several types of data -- it began with 1D from sensors, and they went up to 4D when modelling the atmosphere.
More info: https://fosdem.org/2017/schedule/event/datacubes/

Live Patching of Xen and CI in Ubuntu

Ross Lagerwall from Citrix XenServer talked about Xen and live patching, which is needed because live migration is too slow and makes the host unusable during that time. Xen checks whether a payload can be applied at a given moment, i.e. when none of the patched code is on the stack. What they do then is put a jump instruction at the beginning of the function while IRQs and write protection are stopped.
The payloads are created by the kpatch tool from a fully built Xen by running a kind of diff tool and picking just the changed functions. When applying, some non-static data can be handled, while touching initialized data is prevented. Definitely worth watching.
More info: https://fosdem.org/2017/schedule/event/iaas_livepatxen/

Martin Pitt talked about the Ubuntu testing infrastructure. Developers must be responsible for writing tests if the testing is to be done in a meaningful way.
QE should be responsible for keeping the infrastructure working and for consultancy.
In the end, where the testing infrastructure really helps is upstream testing, where test results are added to every pull request (systemd was shown as an example).
More info: https://fosdem.org/2017/schedule/event/distribution_ci/

So that was the first day of the Fosdem 2017.

Few notes from pre-Fosdem CentOS Dojo 2017

As is already a tradition, on the Friday before FOSDEM there is a meet-up of CentOS folks in Brussels. This time it was funny because we were in the same venue where PgDay was held. Anyway, it was again great to talk to people and see progress in the CentOS ecosystem (I really cannot call it just a system any more).

How it was with CentOS and Red Hat

The day, with two parallel tracks, was started by Karanbir, the Mr. CentOS. He talked about recent history, when the first conversations with people from Red Hat happened and they talked about joining forces for the first time. That gave a great insight into how the relationship looks now, which I think was really important, since active contributors especially need to understand this unique relationship.

And what is the relationship? Obviously, CentOS wouldn't be here without RHEL, but on the other hand, RHEL benefits from CentOS, which is not just a downstream any more. Many projects like Gluster, OpenStack or Software Collections provide a place for the wider community to develop tools that are simply not a good fit for Fedora. We can call them upstreams or just community projects; the important thing is that it works.

Then he gave several numbers, and the guesses from the audience were mostly wrong (too low) -- the numbers for CentOS base image downloads (tens of millions) or the current size of the mirror (254 GB, of which 150 GB is content other than the rebuilt RHEL packages) are simply astonishing.

On the negative side of the present state, the CentOS folks (the official guys paid by Red Hat to work full-time on CentOS) lack resources, especially people, but with a big community it still works amazingly well, thanks to collaboration.

Now, what next? It's good to understand that the CentOS project can still make everything you need to do with CentOS easier -- be it modularity related, OpenStack related or NodeJS related. For example, what is happening now is that the NodeJS master branch is tested in the CentOS CI to see whether it works correctly on CentOS.

Long term, they want to have a CI that is available to everybody in open source, and generally to focus beyond the CentOS ecosystem.

What an EL6 user should pay attention to in EL7, CI, and really big data at CERN

The next talk, by Bert Van Vreckem, was targeted more at users of EL6 who recently moved, or decided to move soon, to EL7. He presented several commands any admin should know about. Exploring his GitHub repo I also found a nice collection of articles about writing readable and reasonably safe Bash scripts, so I recommend checking https://t.co/F1JkYaOMQV.

Brian Stinson, the main CentOS CI guy, shared some basic info about the current CI capabilities; it's more than awesome that many open source projects, even outside the CentOS ecosystem, can leverage this cool thing. It's especially cool because we are not speaking about VMs, but about bare-metal machines that one can connect to and get a unique testing infrastructure for free. In the future we might see more architectures -- currently there is quite a big demand for ppc64le and 64-bit ARM, which seems to be the future of the cloud.

The last talk of the day I attended was by David Abdurachmanov from CERN, who shared crazy numbers from their practice. They need to process huge amounts of data that are often created in less than a second. 350 PB of data on disk, 800 PB carried over the network, 650k Intel cores and a 300 Gbps network -- that is the reality that helps humans explore the smallest particles in the Universe.

I hope at least the slides will soon be available at https://wiki.centos.org/Events/Dojo/Brussels2017, but if not, I guess the speakers will be more than happy to share them if you contact them. See you next year!

Tuesday, February 23, 2016

Notes from Prague PostgreSQL Developer Day

What a surprise that I didn't hear about containers at all during the whole conference -- really something special these days; database experts are probably quite reluctant to adopt such cool hype technologies. Anyway, it was a very interesting one-day conference around PostgreSQL, so I'm sharing my notes, since they might be interesting for others as a starting point for further study. Most of the slides are here: http://p2d2.cz/rocnik-2016/prednasky

Main news in PostgreSQL 9.5

The first presentation, by Pavel Stehule (a last-minute replacement for Tomas Vondra), was about news in PostgreSQL 9.5, and even without slides it was a very interesting summary with enough detail at the same time. PostgreSQL 9.5 turns out to be wanted by quite a lot of attendees.
Most of the main PostgreSQL upstream developers do it as a full-time job, which means it is professional development, even though it is still true open source and always will be. On the other hand, as the project gets more and more mature, some features stay in development for as long as 3 years.
As for version 9.5, many users are interested in UPSERT; performance is not that different from previous versions. UPSERT is kind of like MERGE from ANSI SQL, but MERGE was not implemented 100%, so it is called differently.
Grouping sets kind of allow creating multidimensional views of the data.
Locks were reworked a lot, and together with arrays in PL/pgSQL this may improve performance a lot in some complicated scenarios. Usually, sorting with the C locale is much quicker than with a non-C locale; other locales simply sort slowly, and in 9.5 even locale-aware sorting is much better.
Bitmap indexes for columns with a lot of duplicates do not exist, unlike in Oracle. Something new was created though, originally called minmax and now the BRIN index, which is much smaller than other indexes and is primarily used for append-only data.
The pg_rewind tool avoids cloning the master; during fail-over we can turn the old master, even if its data is a bit stale, back into a slave.
Two areas where PostgreSQL is behind commercial DBMSs are utilizing multiple processors and partitioning. Today the HDD is not the bottleneck any more; with RAM disks and many CPUs it is the speed of one CPU. In 9.6 there are already a couple of commits for parallel queries, which is so great that we might even call 9.6 a 10.0. There are also plans to have logical or BDR replication, plus some more patches for making the planner's guesses better using correlations -- especially useful for users who do not understand query execution and cannot use EXPLAIN.
Another feature is RLS (row-level security), especially interesting for the banking sector, for example.
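Two of those 9.5 features in a nutshell (the tables are made up):
psql <<'SQL'
-- UPSERT: insert a row, or bump the counter if the key already exists
INSERT INTO counters (name, hits) VALUES ('homepage', 1)
ON CONFLICT (name) DO UPDATE SET hits = counters.hits + 1;

-- BRIN: a tiny index well suited for append-only, naturally ordered data
CREATE INDEX events_created_brin ON events USING brin (created_at);
SQL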

IO in Postgres - Architecture, Tuning, Problems

Andres from Citus Data talked about tuning, giving an overview of the memory architecture and explaining why there is shared memory and private per-process memory. One cache is handled by PostgreSQL, and another cache sits in the operating system when reading files. A clock-sweep algorithm invalidates parts of the buffer so we know which buffers we can replace -- and that might be expensive in some cases.
The reason why WAL is efficient is that it is written sequentially. A checkpoint then writes a bunch of data at some point, and that can be expensive. We can extend the checkpoint timeout; for large amounts of written data, checkpoint_segments should be tuned up. To reduce startup time we can do a checkpoint before shutdown.
Checkpoints explain why there is sometimes more and sometimes less load in benchmark graphs. Andres also showed how tuning the buffer size and checkpoint values influences the graphs.
Interestingly, turning off caching in the OS may do a better job, although we need to account for the consequences.
Tuning buffers is basically about understanding the workload; no advice can be good enough without that. Dropping an index and re-indexing can work better with lower shared_buffers settings.
Batch runs may use different settings. Slides are available at anarazel.de/talks/pgdevday-prague-2016-02-18/io.pdf
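As a rough illustration of the knobs discussed above (the values are made up and must be tuned to the workload; shared_buffers needs a restart to take effect):
psql <<'SQL'
ALTER SYSTEM SET shared_buffers = '8GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;  -- spread checkpoint I/O over time
SQL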

SQL Tabs

Sasha from Shards.io spoke about his cool project SQL Tabs.
Sometimes the psql console is not enough, even though people love it. So he started a new SQL client with the requirements of being non-blocking, having black and white themes, connection switching...
React.js, Node.js and other technologies are used to create a web-based system that allows browsing, datatype exploration, a context advisor, function listings, query history, ... It uses libpq in the backend, so the output is as usual. It can also generate simple graphs by adding some semantics in SQL comments, selecting the proper type of chart and writing the comments in Markdown. It also allows combining more queries into a simple report with text, charts and tables.
It understands some datatypes, so time series are nicely formatted as one would expect.
More info at www.sqltabs.com.

How we started running PostgreSQL (and what we learned along the way)

Aleš Zelený from Česká spořitelna shared his experiences with PostgreSQL. He advised:
  • to try backups before going to production
  • to start with a simple app for lunches
  • not to make an Oracle out of PostgreSQL
  • to think about what to log, because auditing is missing in PostgreSQL
  • to create a schema for the app instead of using the public schema
  • to separate app modules into their own schemas
  • sometimes even into separate databases
  • tablespaces on their own clusters, to avoid influencing other databases when the character of one app changes rapidly
  • monitoring is important but sometimes troublesome
  • use check_postgres.pl and autovacuum, configured well for the workload
  • default privileges help create fine-grained permissions
  • logging can supplement auditing
  • backup and recovery with Barman or pgBackRest, Pacemaker from RH for recovery, streaming replication with delay
  • testing data recovery scenarios
  • for monitoring they use Telegraf + InfluxDB + Grafana
  • visualization helps to understand where the problem is
  • configure autovacuum so that it runs often; then it does a little work each time, so it finishes quickly

Open source clustering solutions for PostgreSQL

Petr Jelínek from 2ndQuadrant talked about general scaling concepts (vertical, horizontal), focusing on OLTP, OLAP, ...
For vertical scaling and OLTP we already have a couple of features in 9.5 (sampling, BRIN indexes, sort optimizations), and in 9.6 there will be a first implementation of parallel query execution.
Hot standby was briefly introduced -- an integrated solution for scaling, but it solves only read scaling, since the slaves are read-only.
PL/Proxy -- by now almost legacy, developed at Skype, no maintenance; a map-reduce implementation in PostgreSQL.
Greenplum by Pivotal (a fork of PostgreSQL), stable, open source since autumn 2015 (Apache license), is a kind of MPP (massively parallel processing) database and diverges quite a lot from vanilla PostgreSQL. It has its own tooling and tables optimized for inserts.
Citus Data is going to release CitusDB very soon as an extension only; right now it is a fork of PostgreSQL plus extensions. It will be open source soon as well; for now only pg_shard is open-sourced. It has logical servers, so it duplicates data in distributed tables.
Postgres-XL, coming from Postgres-XC and StormDB, is now open source (MPL), taken over by the community after the company crashed. It will be on 9.5 soon. It is an MPP solution that supports both OLTP and OLAP, but it is more complex to administer and install.
BDR from 2ndQuadrant is also a fork of PostgreSQL (9.4), but its goal is to be integrated back one day. It has already pushed a couple of features into vanilla PostgreSQL (background workers, event triggers, ...). It is an asynchronous multi-master solution (all data on all servers) that uses optimistic conflict detection and resolution (in contrast to MVCC in vanilla PostgreSQL). So consistency becomes true eventually, after conflict resolution (after commit). Its primary focus is on latency optimization (avoiding global writes across the universe).
There were questions about eventual consistency and sequences; the answer was that nobody can expect a normal application to work the same as before when switching to multi-master.

Monitoring PostgreSQL in practice

Pavel Stehule talked about administration and monitoring, starting with the interesting thought that moving from "it may work" to "I know it works" is quite a small step. Architecture is always key. Client-server works for one server, a cluster scales... A wrong choice in the beginning will affect everything.
Already during user interface design one should think about the database -- so that the database only has to return a limited set of rows. The important values are the parameters of RAM and IO -- while we fit in RAM it is fine, then comes performance degradation, and it does not go linearly, it degrades in jumps.
Configuring the database and optimizing queries does not help when the architecture and app design are wrong.
Sometimes users add indexes without thinking. Sometimes users deploy replication and a cluster when it is not necessary; more RAM may help much more, and aggregated data in the DB is also not a bad idea.
work_mem, shared_buffers, effective_cache_size and max_connections are important -- roughly, work_mem * max_connections * 2 + shared_buffers + filesystem cache + OS overhead must stay below the available RAM.
Database servers can never swap.
A shared_buffers value that is too high (over 20 GB) could make finding a free buffer too slow, but it always depends on the cache working characteristics.
effective_cache_size says how big a portion of the index size is actively used.
The 80/20 rule applies strongly: 80% of the workload is generated by 20% of the queries. Sometimes even modules that are not used eat 50% of the workload.
People cannot work with memory today; it is better to read the data once, work with it, and write to the DB from time to time, not connect to the DB for every little piece.
Monitoring is important for seeking out regressions.
Good query results should be verified, so we know they are not an accident.
A disk under load has very different characteristics than on a machine without load. pgbench is the other extreme; it does not simulate any real workload.
Some overview can be found in the pg_stat views -- many rollbacks are bad, many temp files are bad. Changes in the transaction count are suspicious, like a firewall killing connections by mistake.
Kill very long-running queries before they kill the server (see the snippet after these notes).
Especially in the beginning of an app's life it is important to monitor queries, because a database is a living organism.
Queries around 50 ms cannot get much faster; if we need more, we need a cache and should not use the DB at all.
Monitoring regressions in CPU utilization using profiling tools works well on Linux; the fix is usually to distribute the workload across more processors.
Autovacuum monitoring is also important.
And finally, benchmarking is needed, and it should be done early enough and ideally in the production environment.
Finding out whether something is efficient is hard and takes a long time. Finding out whether something behaves the same as yesterday is simple and quick. Graphs are a quick way to get from hoping to knowing.
Too wide tables are usually wrong; in OLTP we should respect normal forms.
How large is a typical OLTP database today? Usually tens of GB. Today people often store bitmaps in the DB, which makes it larger.
RAM should be bigger than 1/10 of the database size.
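Regarding killing runaway queries mentioned above, one common approach looks like this (the limits are examples only):
psql <<'SQL'
-- cap statements for the current session or role
SET statement_timeout = '5min';

-- or terminate anything that has been running for more than 30 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
  AND now() - query_start > interval '30 minutes';
SQL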

pg_paxos: Table replication through distributed consensus

pg_paxos was introduced by Marco Slot, and we learned it is a framework for distributed problems. Servers sometimes need to agree on something, and that is what the Paxos algorithm is about.
There are two phases: nodes agree on participation, then a proposal asks for acceptance, and a majority must agree.
If anything fails, we start over.
A Paxos state machine (Multi-Paxos) helps replicate a log across multiple servers.
pg_paxos is an extension with low throughput and high latency, so it is no alternative to streaming or logical replication, and not a magic distributed Postgres either, but it is a useful building block for distributed systems. It is somewhat experimental and written in PL/pgSQL, available on GitHub under the citusdata organization.
The demo showed how to use three Paxos servers in the cloud to acquire a lock; then communication can proceed normally and we know that only one server works with the data.

That's all folks.

Saturday, February 13, 2016

Notes and thoughts from Developer Conference 2016

This post is mostly intended to remind my future self what DevConf 2016 looked like for me, but maybe somebody else will find it useful as well.
Many of the talks and workshops are now uploaded to YouTube, so feel free to find the full recordings: https://www.youtube.com/playlist?list=PLjT7F8YwQhr--08z5smEEJ-m6-0aFjgOU

The keynote

Tim Burke started DevConf 2016 with a keynote full of tips on how to become a rock star in the open source world: from "not being a troll", through "not doing what you don't like", to being a real team player, because a rock star is not an individual. Passion was identified as the way to enjoy your work. See the full keynote at https://www.youtube.com/watch?v=Jjuoj2Hz03A.

Docker versus systemd

Dan Walsh talked about systemd and Docker integration. He mentioned pitfalls they have already solved and ones they still need to solve, with Dan himself standing in the middle between the Docker upstream and the systemd guys, who both don't like accepting compromises. He mentioned these issues:
  • sd_notify, which should eventually help with notifying about app readiness
  • logs being grabbed by journald, so the logs are available after the container is removed
  • running systemd in a Docker container -- all PRs closed by Docker
  • with Docker 1.10 it will work with some options specified during container start
  • machinectl, which works with all VMs running on the machine
  • eventually libcontainer will hopefully be replaced by its clone runc, coming from the OCI spec

All Flavors of Bundling

Vit talked about practical examples of bundling he has met during Ruby maintenance work, mentioning Bundler not only because it helps with bundling stuff, but also because it bundles a lot of stuff itself. He mentioned that sometimes there are reasons to bundle, sometimes not. All that was triggered by the recent change in the Fedora guidelines, which started to allow bundling. Vit also went through various specifics of different languages, like JavaScript, C++, Go, ... There was an interesting point about generators: we basically bundle the generated code, so if the generator makes a mistake, you get the problem.
Q: any example of bundling that succeeded? The Bundler project learned that bundling is the correct way, and the only way.

Open source distributed systems at Uber by Marek Brysa:

HA is really crucial, because Uber handles payment transactions and trip tracking.
They offer more services than transporting people -- food, stuff and even kittens.
Technologies used: Ubuntu, Debian, Docker, Python, Node.js, Go, Kafka, Redis, Cassandra, Hadoop.
Every project is meant to be open-sourced by default, with some exceptions.
They also contribute to existing projects.
The number of micro-services grew from 0 to 700 in the last two years.
Ringpop:
  • consistent hashing for sharding; membership via the SWIM protocol, using direct and indirect pings to get the state of an instance and to be resilient against random network issues
  • a mechanism called gossiping adds information about other instances when sending a message
  • infection-style dissemination; currently 1k instances, 2.5k tested, in the future maybe 10k?
  • app-level middleware
TChannel:
  • an SOA-oriented replacement for HTTP, which turned out to be slow in some cases
  • multiplexing, performance, diagnostics, HA forwarding
  • JSON and Thrift for serialization
Hyperbahn:
  • Service discovery and request forwarding
  • give clients one Hyperbahn instance and bootstrapping starts automatically
  • services are connected to a ring of routers; every service is connected to a few routers

CI/CD with OpenShift and Jenkins:

It is not only about tools
Containers are like cows: we replace one if it dies.
Containers make people think about what happens when a container dies.
The OpenShift CI/CD setup wants to be generalized into a pipeline that may be used by other projects.
An example of running Jenkins in OpenShift was shown.
S2I is used as the configuration tool.
https://github.com/arilivigni/openshift-ci-pipeline -- 3.3 OpenShift release roadmap

Is it hard to build a docker image?

Tomas asked, and also had the answer: it is.
Squashing, caching, secrets, adding only metadata, usage messages -- and the tooling evolves rapidly.
The conclusion is that Docker is still young.

Remi's PHP 7:

The reason for skipping version 6 was the existence of books about the never-released development version 6.
The new API brings binary incompatibility for binary extensions.
The change to size_t and int64 matters only on Windows.
Abstract syntax tree, native TLS, expectations (assert() is finally usable), the Throwable interface.
The porting of extensions is still in progress; some won't be ported at all -- for instance, the new mongodb extension replaces the old mongo one.
Performance roughly doubled for common applications, comparing the number of pages served.
Scalar types can now be declared in function signatures, and the strict_types option turns PHP into a strongly typed language.
We can now catch parse and type errors, while backward compatibility of exceptions keeps working.
Some extensions were removed, and there is a change in how expressions containing variable variables are evaluated.
Fedora will eventually drop the incompatible extensions.
We need SCLs in Fedora; that would be the best thing for PHP 7.

Security: Everything is on fire!

Josh Bressers talked about security and whether the situation is really that desperate. It is not yet, but there is work to be done. What is happening is that people earn real money on security issues, and the press makes money on selling news, so they make things sound more interesting than they are.
Where do we start? Communication is key, and security people should listen.
Security is not here to solve problems; it is part of the problem.

Centos pipeline meet-up:

A couple of people in one room had an initial chat about existing components that should be usable for the CentOS Container Pipeline and decided to use OpenShift, which sounds like a good way to go because it already includes the missing pieces.

Fedora and RHEL:

Denise spoke about how important Fedora is for Red Hat.
Matt then presented a lot of graphs with Fedora download statistics from various angles. The impression was that it is not that bad.

Changing the releng landscape

Dennis Gilmore about releng in Fedora:
Koji 2.0 is still at the beginning; it should somehow be built with a Copr backend, to allow more flexible builds.
ET/Bodhi alignment.
Rpmdiff and license scanning, currently done internally, should be done in Fedora as well.

Re-thinking Linux Distributions by Langdon:

Rings concept did not work
It was too complicated when working out the details
Modularity should work better
Think about applications, not package sets.
We need to minimize dependencies on other applications and on the OS.
Provide a separate channel with metadata -- that's what RPMs were invented for.
Atomic App, Nulecule, rolekit and xdg-app were mentioned as the way forward.
E&S is where the definition should take place, not necessarily the place to code it.
Q: Won't 10 versions of a library make a mess in our systems? A: Let's make computers track that for the users.

Q&A with council:

The session included the question that cannot be missed in any similar session -- Fedora and proprietary drivers. Josh mentioned that the problem is not only getting the drivers installed, but also not breaking the whole video stack once the kernel is updated. Everybody understands that the policy cannot be easily changed, but at least the problem with breaking video might get better soon. Another question challenged Matt's graphs, and there was a question about possibly using Kerberos instead of generating certificates on the server, where, by the way, a private key is stored that doesn't belong there. Generally the session was very positive.

Closing quiz:

The last session, the quiz, in which the full room participated, was a funny and interesting end to the conference.

Fosdem 2016 notes and thoughts

This post is mostly intended to remind my future self what FOSDEM 2016 looked like for me, but maybe somebody else will find it useful as well.

systemd and Where We Want to Take the Basic Linux Userspace in 2016


In the first keynote, about systemd, Lennart spoke mostly about what will happen in systemd in 2016, particularly DNSSEC. He began by mentioning the recent move of the project to GitHub and the introduction of CI. Systemd is the default in almost all distros, with not many rants around any more. "Are we boring now?", he asked.
Then he went through networkd, mentioning that it is much more automatic and smaller than NetworkManager, and nspawn, which differs from Docker because nspawn is not only for microservices. sd-dhcp is now used also by NetworkManager, although it is still not a public or supported component yet.
The unified control group hierarchy fixes issues the old non-unified implementation had, but it is not used much yet, because the API has changed and switching will break stuff.
DNS resolving, previously done by glibc, is now going to be centralized in systemd-resolved. Caching will make it better, but the big thing is to have DNSSEC ready.
DNSSEC in systemd-resolved checks just integrity, not confidentiality. The chain of trust coming from the TLD zones down should eventually cover services like SSH, TLS, PGP and certificates. However, only 2% of the most popular websites use DNSSEC, even though the main TLDs were already signed in 2010. Interestingly, 8.8.8.8, Google's popular DNS server, does support it.
Validation on the client is usually not done, because it is slower and clients do not support it, even though it is important for IPv6.
Private networks are a problem because they cannot be validated from the top level, since those sites do not officially exist. Another problem is that ISPs' DNS servers and end-users' routers are crap.
Systemd's approach is to give up when DNSSEC cannot be verified, which does open a vulnerability, yes. But when verification succeeds, clients are told about it, and PGP, TLS, SSH and others can benefit, so a fully DNSSEC-enabled internet is not a prerequisite.
A question at the end asked about Lennart's previous plans to change the way applications are distributed and whether this is still the plan. The answer was that most of the needed technology is done, but distributions have not adopted it yet.

Distributions from the view of a package

Colin Charles from MariaDB talked about packaging. He gave us an overview of the whole MySQL ecosystem, mentioning that MySQL has always been more of an open-source product than a project, which I agree with.
He mentioned the FOSS exception, under which MySQL may be distributed under the GPL and a proprietary license at the same time.
Fedora was often mentioned as an example of a distribution that cares about package quality; he showed some particular mailing list threads and bugs, and Tom Lane's name appeared several times, since he maintained MySQL there for a long time.
Software Collections were also mentioned as a solution for the difference between the support periods of distributions and of MySQL.
Some statistics showed that the default version in long-term-support distributions matters (RHEL and MariaDB).
Docker and Juju show usage stats; Fedora and openSUSE used to do that as well but do not anymore, although such stats would help upstream prioritize their resources.
Some platform issues were mentioned, like bundling vs. not including some extensions.
He mentioned that MariaDB upstream cares about being informed about distribution bugs, even being on CC sometimes. He also showed how many downstream-only fixes live in Fedora and elsewhere, and that this needs to be fixed.

MySQL Group Replication or how good theory gets into better practice

Tiago Jorge talked about a project that is still in MySQL Labs (labs.mysql.com). Group Replication is supposed to remove the fuss for users when dealing with replication fail-over. The concept of group communication primitives from the early 90s was the inspiration for Group Replication. The process includes a certification procedure, which is about asking the other nodes whether, by any chance, they are not working with the same data we want to work with.
Group Replication needs some special options, like the binlog, GTIDs, a particular algorithm setting, and we need a replication user. The setup requires a UUID as the group name (because it will be used in GTIDs instead of the server ID) and the addresses of some of the nodes, not necessarily all of them. The performance schema can say how many and which members are there, and it also includes various stats, like transaction stats. A rejoining member can use a sync method that only applies what is not yet done, and the catch-up process does not block the donor. (A minimal configuration sketch follows after these notes.)
Main features: cloud friendly, integrated, automated, no big overhead, ...
More info at mysqlhighavailability.com, for example: http://mysqlhighavailability.com/mysql-group-replication-hello-world/
There is no support for online DDL though.
Question: Does it support thousands of nodes? Answer: There is linear degradation because of communication limits.
By the way, it is a type of eventual consistency.
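Out of curiosity, here is what the required settings could look like in practice. This is a minimal sketch based on my reading of the MySQL Labs documentation, not on the slides, so treat the exact variable names and values as assumptions:

  -- prerequisites like ROW binlog format and GTIDs being enabled live in my.cnf;
  -- a UUID serves as the group name and is used in GTIDs instead of the server ID
  SET GLOBAL group_replication_group_name = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';
  -- addresses of some of the members, not necessarily all of them
  SET GLOBAL group_replication_group_seeds = 'db1:33061,db2:33061,db3:33061';

  -- the replication user, used by the recovery (catch-up) channel
  CREATE USER 'gr_repl'@'%' IDENTIFIED BY 'secret';
  GRANT REPLICATION SLAVE ON *.* TO 'gr_repl'@'%';
  CHANGE MASTER TO MASTER_USER = 'gr_repl', MASTER_PASSWORD = 'secret'
    FOR CHANNEL 'group_replication_recovery';

  -- bootstrap the group on the first node only; the others just START GROUP_REPLICATION
  SET GLOBAL group_replication_bootstrap_group = ON;
  START GROUP_REPLICATION;
  SET GLOBAL group_replication_bootstrap_group = OFF;

  -- the performance schema shows how many and which members are there
  SELECT member_id, member_host, member_state
    FROM performance_schema.replication_group_members;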

ANALYZE for statements: MariaDB's new tool for diagnosing the optimizer

Sergei Petrunia talked about the EXPLAIN and ANALYZE statements. The query plan sometimes does not correspond with reality, and we need to see statistics, like row counts, especially per table. MariaDB's ANALYZE is inspired by EXPLAIN ANALYZE from PostgreSQL and Oracle; in comparison to EXPLAIN alone, it not only builds the query plan but also executes the statement and shows some statistics. It shows a comparison between the estimated and the real number of rows read. The general rule is that big discrepancies are worth investigating.
The filtered fields tell how many rows were read and then discarded, which suggests adding indexes -- always a trade-off between faster reads and faster writes.
The ANALYZE TABLE statement can also fix a table's histogram, which positively influences the execution plan.
EXPLAIN can return JSON, which may be easier to read; values prefixed with r_ come from the ANALYZE part. We can also see information like the buffer sizes used or the cache hit ratio (a small example follows at the end of this section).
An example shown by Sergei explained how to use all of that to see which sub-queries caused the biggest performance issues.
ANALYZE also works fine for showing range selects.
The current con is that the EXPLAIN statement sometimes lies; ANALYZE tends to be much closer to reality.
During the Q&A it was mentioned that histograms are better for selectivity than for the cost model, which is what MySQL uses them for.
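A tiny, made-up example of what that looks like; the table and column names are invented and the exact set of r_ counters depends on the MariaDB version:

  -- execute the statement and show real counters next to the optimizer's estimates;
  -- a big gap between rows (estimate) and r_rows (reality) is what to investigate
  ANALYZE SELECT o.id, c.name
    FROM orders o JOIN customers c ON c.id = o.customer_id
   WHERE o.created_at > '2016-01-01';

  -- the JSON form carries the same r_-prefixed members plus buffer and cache details
  ANALYZE FORMAT=JSON SELECT o.id, c.name
    FROM orders o JOIN customers c ON c.id = o.customer_id
   WHERE o.created_at > '2016-01-01';

  -- collecting persistent statistics and histograms that feed the optimizer
  ANALYZE TABLE orders PERSISTENT FOR ALL;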

Rolling out Global Transaction IDs at Dropbox

Rene Cannao spoke about experiences with GTIDs at Dropbox. Typically we have at least two slaves; Dropbox has two masters (one basically a slave of the other) and every master also has its own slave. Binary logs are important; the classic approach uses a file name and a position.
In a non-GTID deployment, the file names and positions on the slaves are not the same as on the master, which gets tricky especially with nested replication, because the second-level slave does not know where to start replicating after the intermediate slave crashes. A GTID consists of the UUID of the source and a transaction ID, and since it is the same across the whole cluster, it makes recovery easier -- the slaves simply know where to begin. A slave just uses master_auto_position = 1 (see the sketch below).
The use case Rene showed also used the enforce_gtid_consistency, log_slave..., gtid... and binlog... options.
He went through the procedure of enabling GTIDs either offline (which requires turning the cluster off for some time) or online (without restarting all servers at once).
Sometimes there might be a problem with zombie slaves timing out.
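For reference, this is roughly what auto-positioning looks like on a slave. The host and credentials are placeholders, and the full set of required options (the ones elided above) should be taken from the documentation rather than from this sketch:

  -- on the slave, with gtid_mode and enforce_gtid_consistency already enabled
  STOP SLAVE;
  CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',  -- placeholder
    MASTER_USER = 'repl',                -- placeholder
    MASTER_PASSWORD = 'secret',          -- placeholder
    MASTER_AUTO_POSITION = 1;            -- GTIDs decide where to resume, no file/position
  START SLAVE;

  -- a GTID has the form source_uuid:transaction_id; the retrieved and executed
  -- sets are visible in the slave status output
  SHOW SLAVE STATUS\G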

MariaDB CONNECT Storage Engine: Simplify heterogeneous data access

Serge Frezefond talked about the CONNECT plugin in MariaDB, which can access data from other sources. Features like the ability to specify options when creating a table of type CONNECT, or auto-discovery of the file structure, make this plugin quite easy to use. With the ODBC table type we can use syntax, like the PRIOR statement, that does not exist in MySQL. It also allows creating a trigger that runs some command when an entry is executed.
We can also query different databases at the same time, like Oracle DB and PostgreSQL. The XML table type supports XPath syntax for describing which tag corresponds to which column.
The JSON table type does not replace the integrated JSON support (already in MySQL, coming soon in MariaDB), but it can add an external JSON source into the database. The structure of the JSON needs to be addressed, which is done by setting a starting point for the table; a JPath specification is used for aggregation (a rough example follows below).
Most of the stuff is also writable, but the performance is not always perfect.
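To give an idea of how such tables are declared, here is a rough sketch. The file paths and column mappings are invented, and the option names follow the MariaDB CONNECT documentation as I recall it, so double-check them before relying on this:

  -- a CSV file exposed as a regular-looking table
  CREATE TABLE ext_prices (
    item_id INT,
    price   DECIMAL(10,2)
  ) ENGINE=CONNECT TABLE_TYPE=CSV
    FILE_NAME='/data/prices.csv'         -- hypothetical path
    HEADER=1 SEP_CHAR=';';

  -- an external JSON source, with columns mapped onto fields of each record
  CREATE TABLE ext_events (
    id   INT         FIELD_FORMAT='id',
    kind VARCHAR(32) FIELD_FORMAT='type'
  ) ENGINE=CONNECT TABLE_TYPE=JSON
    FILE_NAME='/data/events.json';       -- hypothetical path

  -- once declared, the external data can be joined with normal tables
  SELECT e.kind, p.price
    FROM ext_events e JOIN ext_prices p ON p.item_id = e.id;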

Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes

Raghavendra Prabhu from Yelp talked about orchestration. K8s has some requirements, like a client-server design, a cattle-not-pets approach, horizontal rather than vertical scaling, statelessness even in databases, and elastic scalability, meaning scaling in both directions. We should also switch to declarative rather than imperative thinking.
There are a lot of container technologies available. Galera was shortly introduced: it uses an optimistic approach to solving concurrency conflicts and supports automatic node sync. In CAP theorem terms, Galera is a CP system.
Stateless DBs are a problem for big data.
Kubernetes was shortly introduced.
PaaSTA supports a lot of existing tools and technologies.
A Galera node is equivalent to a pod.
Examples of k8s configuration files were shown.

Programming a Board Game

A physical board game created by Chris Ward should be freely available in the next months. A CMS was used from the start, the documentation is in Markdown, and the web is built with Jekyll. PDF generation should happen together with the web build; pandoc is used for it. LaTeX is also used to design the cards -- someone said it is like CSS for printing.
So far there are no big graphics; ImageMagick is used for some of it.
A question asked whether this should become a template for others. The answer was yes, even though some work is still needed to make it ready.

MySQL operations in Docker: A quick guide for the uninitiated

Giuseppe Maxia talked about Docker and MySQL. Running services on shared resources is not good, and we pay for VMs. A container is a virtualization system, but it is not a VM. There are several things to keep in mind when working with containers carrying databases: never modify a running container and deploy ready-made containers only. We can change options by passing them on the command line; another way is to configure the server by bind mounting a .cnf file. See bit.ly/my-rep-samples.
To a question about secret passwords, Giuseppe suggested reading the password from a file.

Introducing new SQL syntax and improving performance with preparse Query Rewrite Plugins

Sveta Smirnova talked about features MySQL users have asked for and how responsive Oracle is to such requests. For example, after last year's talk by Markus Winand, who mentioned that MySQL does not include the FILTER clause, she was able to add it using the query rewrite plugin API. Then she took a closer look at how this was implemented, actually showing the C code, which also demonstrated how to work with MySQL's memory management. The FILTER query is rewritten by a regular expression into CASE WHEN syntax; the rewritten query is stored and a flag must be set (the rewrite itself is sketched below).
Another example showed how comments in a query might be rewritten into three queries -- setting a variable from the comment, running the query, and resetting the variable after the query is executed. This included usage of the THD structure, which allows changing variables quite easily.
Then she showed how to add some new variables, and she mentioned that it is possible to do much more, like taking locks or even implementing an HTTP server inside MySQL.
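To make the FILTER example concrete, this is roughly the transformation such a preparse rewrite performs (my own illustration, not Sveta's actual code). The table and columns are invented; the two forms are equivalent because aggregate functions skip the NULLs produced by the CASE expression:

  -- what the user writes (SQL:2003 FILTER clause, not natively supported by MySQL)
  SELECT customer_id,
         SUM(amount) FILTER (WHERE status = 'paid') AS paid_total
    FROM orders
   GROUP BY customer_id;

  -- what the rewrite plugin hands to the server instead
  SELECT customer_id,
         SUM(CASE WHEN status = 'paid' THEN amount END) AS paid_total
    FROM orders
   GROUP BY customer_id;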

Do Software Collections still matter?: With containers, unikernels, and all the new hotness?

Langdon talked about Software Collections in the world of unikernels and containers. The answer was simply yes. The problem is breaking apps into microservices; you might want multiple versions of the same component inside a container as well. SCLs bring you simpler packaging than assembling the components by hand. They are good for migration or for logical separation of similar components.
Q: Was it developed for OpenShift? A: Not only for that.
Q: SCLs in Fedora? A: Yes, it makes sense; there are the same reasons as in RHEL, but from the opposite point of view -- we need older versions or the versions that are in RHEL.
Q: Why not use just Dockerfiles? A: With RPM as a middle step it is easier to verify that the packages work as expected.

Reproducible and Customizable Deployments with GNU Guix: Why "app bundles" get it wrong

Ludovic Courtès talked about GNU Guix, because keeping a software environment polished is not easy. A distro is a stateful system -- is it hopeless? Some people put a whole distro into an application container as well. Using layers in Docker is like cooking your own flavored pizza from a frozen one.
Functional package management might be the answer. We have a complete picture of how packages are composed; the resulting path where a package is placed is a hash of all its dependencies.
We can resolve all dependencies exactly.
We can install things without root.
Installation and removal of packages may be done in one transaction with Guix.
Every transaction creates a generation, like a named state we can go back to.
It is yet another package manager, but this one is cool.
GIMP's dependencies are a big mess, not easy; the search path is huge, like really huge.
There is also the Guix system distribution, similar to NixOS, which boots quite nicely from a generated file.
A recipe is similar to an RPM spec but is functional; a basic skeleton might be borrowed from Nix, but there are differences.
 

Sunday?

There are no notes from Sunday, because my plane was leaving too early, but this year's FOSDEM was great anyway -- better and better every year. I can't wait for the next one already.