_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
HTML Visit Hacker News on the Web
COMMENT PAGE FOR:
HTML Read Locks Are Not Your Friends
alecco wrote 2 hours 30 min ago:
It always amazes me how much work these folks are willing to put in
just to avoid reading basic database literature.
Also Fedor Pikus has some nice cppcon talks from years ago on all this.
Very low level.
sevensor wrote 3 hours 55 min ago:
Does this apply also to std::shared_mutex in C++? This is a timely
article if so; I'm in the middle of doing some C++ multithreading
that relies on a shared_mutex. I have some measuring to do.
gpderetta wrote 3 hours 40 min ago:
mostly yes.
sevensor wrote 2 hours 59 min ago:
Thanks, that's what I was afraid of. The ping-ponging described
in the article seems hard to avoid regardless of what language
you're using.
api wrote 4 hours 9 min ago:
Take a look at crates like arc_swap if you have a read often write
rarely lock case. You can easily implement the RCU pattern. Just be
sure to read about how to use RCU properly.
Done well, this pattern gives you nearly free reads and cheap writes,
sometimes cheaper than a lock.
For frequent writes a good RWLock is often better since RCU can degrade
rapidly and badly under write contention.
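A minimal sketch of that publish-and-swap (RCU-style) pattern using only the standard library; arc_swap replaces the brief lock below with an atomic pointer load, and all names here are illustrative:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// A read-mostly map published via Arc swapping. Readers briefly lock
// only to clone the Arc, then read with no lock held; writers build a
// fresh copy and swap it in (copy-on-write, the core of the pattern).
struct Snapshot {
    current: Mutex<Arc<HashMap<String, u64>>>,
}

impl Snapshot {
    fn new() -> Self {
        Snapshot { current: Mutex::new(Arc::new(HashMap::new())) }
    }

    // Cheap read: clone the Arc under a very short critical section.
    // (arc_swap removes even this brief lock.)
    fn load(&self) -> Arc<HashMap<String, u64>> {
        self.current.lock().unwrap().clone()
    }

    // Write: copy the current map, modify the copy, publish it.
    // Existing readers keep their old snapshot until they drop it.
    fn update(&self, key: &str, value: u64) {
        let mut guard = self.current.lock().unwrap();
        let mut next = (**guard).clone();
        next.insert(key.to_string(), value);
        *guard = Arc::new(next);
    }
}

fn main() {
    let snap = Snapshot::new();
    snap.update("answer", 42);
    let view = snap.load();    // lock-free reads from this snapshot
    snap.update("answer", 43); // does not disturb `view`
    assert_eq!(view["answer"], 42);
    assert_eq!(snap.load()["answer"], 43);
}
```

Note the write cost here is a full copy of the map, which is why this degrades under frequent writes, as mentioned above.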
ot wrote 4 hours 20 min ago:
This is drawing broad conclusions from a specific RW mutex
implementation. Other implementations adopt techniques to make the
readers scale linearly in the read-mostly case by using per-core state
(the drawback is that write locks need to scan it).
One example is folly::SharedMutex, which is very battle-tested: [1]
There are more sophisticated techniques such as RCU or hazard pointers
that make synchronization overhead almost negligible for readers, but
they generally require designing the algorithms around them and are not
drop-in replacements for a simple mutex, so a good RW mutex
implementation is a reasonable default.
HTML [1]: https://uvdn7.github.io/shared-mutex/
amluto wrote 54 min ago:
Wow, folly::SharedMutex is quite an example of design tradeoffs. I
wonder what application the authors wanted it for where using a
global array was better than a per-mutex array.
Jyaif wrote 3 hours 54 min ago:
And a Rust equivalent of folly::SharedMutex:
HTML [1]: https://docs.rs/crossbeam-utils/latest/crossbeam_utils/sync/...
mike_hearn wrote 4 hours 3 min ago:
Right, and if you're on the JVM you have access to things like
ConcurrentHashMap which is lock free.
PaulHoule wrote 4 hours 7 min ago:
I think it's not unusual for reader-writer locks, even if well
implemented, to get into situations where so many readers are stacked
up that writers never get a turn, or one writer winds up holding up
N readers, which is not so scalable as you increase N.
Retr0id wrote 4 hours 28 min ago:
claudes love to talk about The Hardware Reality
stuaxo wrote 4 hours 22 min ago:
"The performance Death Spiral" was the point I realised I was being
LLMd and bailed out.
amluto wrote 4 hours 30 min ago:
The code examples are confusing. They show the code that takes the
locks, but they don't show any of the data structures involved. The
rwlock variant clones the Arc (makes sense), but the mutex variant does
not (is it hidden inside inner.get?).
In any case, optimizing this well would require a lot more knowledge of
what's going on under the hood. What are the keys? Can the entire
map be split into several maps? Can a reader hold the rwlock across
multiple lookups? Is a data structure using something like RCU an
option?
_dky wrote 4 hours 41 min ago:
If implementation is task based and task always runs on same virtual
CPU (slots equaling CPUs or parallelism), wonder if something like
below might help.
RW lock could be implemented using an array of length equal to slots
and proper padding to ensure each slot is in its own face line (avoid
invalidating CPU cache when different slot is read/written).
For read lock: Each task acquires the lock for their slot.
For write lock: Acquire lock from left most slot to right. Writes can
starve readers when they block on in-flight reader at a different slot
when moving from left to right.
I do not know how Rust RW locks are implemented.
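A rough sketch of that per-slot scheme in Rust (this is not how Rust's std RwLock works; the names and the 64-byte cache-line assumption are illustrative):

```rust
use std::sync::{Mutex, MutexGuard};

// One lock per slot, padded to a cache line so readers on different
// CPUs never bounce the same line between cores.
#[repr(align(64))] // assuming 64-byte cache lines
struct Slot(Mutex<()>);

struct SlottedRwLock {
    slots: Vec<Slot>,
}

impl SlottedRwLock {
    fn new(num_slots: usize) -> Self {
        SlottedRwLock {
            slots: (0..num_slots).map(|_| Slot(Mutex::new(()))).collect(),
        }
    }

    // Read lock: a task locks only its own slot, so readers contend
    // only with writers, never with other readers.
    fn read(&self, slot: usize) -> MutexGuard<'_, ()> {
        self.slots[slot].0.lock().unwrap()
    }

    // Write lock: acquire every slot, leftmost first (a fixed order
    // prevents deadlock between concurrent writers).
    fn write(&self) -> Vec<MutexGuard<'_, ()>> {
        self.slots.iter().map(|s| s.0.lock().unwrap()).collect()
    }
}

fn main() {
    let lock = SlottedRwLock::new(4);
    {
        let _r = lock.read(2); // a reader touches only slot 2
    }
    let guards = lock.write(); // a writer holds all 4 slots
    assert_eq!(guards.len(), 4);
}
```

This is essentially the "big reader lock" idea: reads stay core-local, at the price of writes that scale with the number of slots.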
whizzter wrote 4 hours 50 min ago:
I'd be super interested in how this compares between cpu architectures,
is there an optimization in Apple silicon that makes this bad while
it'd fly on Intel/AMD cpus?
PunchyHamster wrote 4 hours 2 min ago:
A read lock requires communication between cores; it just can't scale
with CPU count.
gpderetta wrote 4 hours 11 min ago:
the behaviour is quite typical for any MESI style cache coherence
system (i.e. most if not all of them).
A specific microarchitecture might alleviate this a bit with lower
latency cross-core communication, but the solution (using a single
naive RW lock to protect the cache) is inherently non-scalable.
hansvm wrote 4 hours 40 min ago:
I've observed the same behavior on AMD and Intel at $WORK. Our
solution (ideal for us, reads happening roughly 1B times more often
than writes) was to pessimize writes in favour of reads and add some
per-thread state to prevent cache line sharing.
We also tossed in an A/B system, so reads aren't delayed even while
writes are happening; they just get stale data (also fine for our
purposes).
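A minimal sketch of such an A/B (double-buffer) scheme, assuming a single writer; all names are illustrative:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

// Readers always read the active copy; the single writer updates the
// inactive copy and then flips the index. Readers may see slightly
// stale data, but never wait on a write in progress.
struct DoubleBuffer<T> {
    bufs: [RwLock<T>; 2],
    active: AtomicUsize,
}

impl<T: Clone> DoubleBuffer<T> {
    fn new(value: T) -> Self {
        DoubleBuffer {
            bufs: [RwLock::new(value.clone()), RwLock::new(value)],
            active: AtomicUsize::new(0),
        }
    }

    // Readers take the read lock on the active side only. The writer
    // never writes that side, so this lock is effectively uncontended.
    fn read(&self) -> T {
        let i = self.active.load(Ordering::Acquire);
        self.bufs[i].read().unwrap().clone()
    }

    // Single writer: update the inactive copy, then flip. The write
    // lock waits out stragglers that loaded the index just before a
    // previous flip and still briefly hold the old side.
    fn write(&self, value: T) {
        let i = self.active.load(Ordering::Acquire);
        let inactive = 1 - i;
        *self.bufs[inactive].write().unwrap() = value;
        self.active.store(inactive, Ordering::Release);
    }
}

fn main() {
    let db = DoubleBuffer::new(1u64);
    db.write(2);
    assert_eq!(db.read(), 2);
    db.write(3); // flips back to the first buffer
    assert_eq!(db.read(), 3);
}
```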
the_duke wrote 3 hours 54 min ago:
Rust has an interesting crate for this, arc-swap [1].
It's essentially just an atomic pointer that can be swapped out.
HTML [1]: https://docs.rs/arc-swap/latest/arc_swap/
DIR <- back to front page