_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (inofficial)
HTML Visit Hacker News on the Web
COMMENT PAGE FOR:
HTML What is a database transaction?
rmunn wrote 14 hours 23 min ago:
One way to think about transactions, as I wrote in an earlier comment,
would be to think of them as being like snapshots in a copy-on-write
filesystem like btrfs or zfs. But another way to think of them is being
like Git branches.
When you BEGIN a transaction, you're creating a branch in Git. Everyone
else continues to work on the master branch, perhaps making their own
branches (transactions) off of it while you're working. Every UPDATE
command you run inside the transaction is a commit pushed to your
branch. If you do a ROLLBACK, then you're deleting the branch unmerged,
and its changes will be discarded without ever ending up in the master
branch. But if you instead do a COMMIT, then that's a `git merge`
command, and your changes will be merged into the master branch. If
they merge cleanly, then all is well. If they do NOT merge cleanly,
because someone else merged their own branch (committed their own
transaction) that touched the same files that you touched (updated rows
in the same table), then the DB will go through the file line by line
(go through the table row by row) to try to get a clean merge. If it
can successfully merge both changes without conflict, great. If it
can't, then what happens depends on the transaction settings you chose.
You can, when you start the transaction, tell the DB "If this doesn't
merge cleanly, roll it back". Or you can say "If this doesn't merge
cleanly, I don't care, just make sure it gets merged and I don't care
if the conflict resolution ends up picking the "wrong" value, because
for my use case there is no wrong value." This is like using "READ
UNCOMMITTED" vs "SERIALIZABLE" transaction settings (isolation levels):
you would use "READ UNCOMMITTED" if you don't care about merge
conflicts in this particular table, and just want a quick merge. You
would use "SERIALIZABLE" for tables with data that must, MUST, be
correct, e.g. account balances. And there are two more levels in
between for subtle differences in your use case's requirements.
As with my previous comment, this is probably obvious to 98.5% of
people here. But maybe it'll help someone get that "ah-ha!" moment and
understand transactions better.
shalabhc wrote 21 hours 26 min ago:
For all interested in this topic, I highly recommend the book Designing
Data Intensive Applications [1] .
It goes into not only different isolation levels, but also some
ambiguity in the traditional ACID definition.
I believe a 2nd edition is imminent.
HTML [1]: https://www.goodreads.com/book/show/23463279-designing-data-in...
vismit2000 wrote 11 hours 13 min ago:
Second edition is available now:
HTML [1]: https://www.oreilly.com/library/view/designing-data-intensiv...
interlocutor wrote 1 day ago:
A lot of database tools these days prioritize instant sharing of
updates over transactions and ACID properties. Example: Airtable. As
soon as you update a field the update shows up on your coworkers screen
who also has the same table open. The downside of this is that Airtable
doesn't do transactions. And the downside of not doing transactions is
potentially dangerous data inconsistencies. More about that here:
HTML [1]: https://visualdb.com/blog/concurrencycontrol/
gtowey wrote 15 hours 39 min ago:
Not so subtle product promo while taking a swipe at your competition.
zzzeek wrote 1 day ago:
I think this is a great post to have but I'm going to make a critical
usability suggestion:
* the videos should have "pause" and a "step at a time" control *
Even at the "half speed", without a deep knowledge of the context, the
videos move way too fast for me to read the syntax that's invoking and
line it up with the data on the left side. I (and im definitely not
the only one) need to be able to sit on one step and stare at the whole
thing without the latent anxiety of the state changing before I've had
a chance to grok the whole thing.
this has nothing to do with familiarity with the concepts (read my
profile). I literally need time to read all the words and connect
them together mentally (ooh, just noticed this is pseudo-SQL syntax
also, e.g. "select id=4", that probably added some load for me) without
worrying they're going to change before watching things move.
please add a step-at-a-time button!
bddicken wrote 23 hours 47 min ago:
I appreciate this feedback, and then you read through it with enough
rigor to notice.
johanyc wrote 1 hour 45 min ago:
I second it. At the very least a pause button is needed.
MHordecki wrote 1 day ago:
Iâve found this article lacking. Like some other articles in this
space, it introduces isolation levels through the lens of the phenomena
described in the SQL standard, but I find that thereâs a different,
more intuitive approach.
I think itâs more tractable to define this problem space starting
from the concept of (strict) serializability, which is really a
generalization of the concept of thread safety. Every software engineer
has an intuitive understanding of it. Lack of serializability can lead
to execution-dependent behavior, which usually results in
hard-to-diagnose bugs. Thus, all systems should strive towards
serializability, and the database can be a tool in achieving it.
Various non-serializable levels of database transaction isolation are
relaxations of the serializability guarantee, where the database no
longer enforces the guarantee and itâs up to the database user to
ensure it through other means.
The isolation phenomena are a useful tool for visualizing various
corner cases of non-serializability, but they are not inherently tied
to it. It's possible to achieve serializability while observing all of
the SQL phenomena. For example, a Kubernetes cluster with
carefully-written controllers can be serializable.
lateforwork wrote 1 day ago:
Most RDBMSs offer serializable isolation if you need it. Often you
don't need it. The downside of using serializable isolation
unnecessarily is reduced concurrency and throughput due to increased
coordination between transactions.
bddicken wrote 1 day ago:
Author here. This is good feedback.
The combination of transactions, isolation levels, and MVCC is such a
huge undertaking to cover all at once, specially when comparing how
it's done across multiple DBs which I attempted here. Always a
balance between technical depth, accessibility to people with less
experience, and not letting it turn into an hour-long read.
jaxr wrote 1 day ago:
I love the work planetscale does on keeping this type of content
accurate yet accessible. Keep it up!
Rapzid wrote 1 day ago:
[1] More notation, more citations, more better.
HTML [1]: https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster
peterclary wrote 1 day ago:
Looks like the author is geoblocking in protest of the UK Online
Safety Act (and fair enough).
bddicken wrote 1 day ago:
Notation is useful. Citations are nice for further reading. But I
don't agree more of this makes for a better article!
mika6996 wrote 1 day ago:
Then recommend a better explanation?
rishabhaiover wrote 1 day ago:
It's an absolute pleasure reading planetscale blogs. I'm curious about
what tool is used to make these visualizations?
bddicken wrote 1 day ago:
Author here. Thank you! These visuals are built with js + gsap ( [1]
)
HTML [1]: https://gsap.com
DIR <- back to front page