commented:
I like this information, thank you. I'd suggest that each section could
do with a real-world example.

  commented:
Does anyone know if there is a standard like Punycode, but for encoding
arbitrary bytes in Unicode, where some of the bytes may be valid Unicode
and some not? I'm thinking of pickled values in Python.

  commented:
Well, you have URL encoding (percent-encoding), where "weird" bytes are
encoded as a percent sign followed by two hex digits. That's probably
the most widespread.
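A rough sketch of that in Python, in case it helps. The helper names encode_bytes and decode_bytes are made up for illustration; quote_from_bytes and unquote_to_bytes come from the standard library's urllib.parse.

```python
import pickle
from urllib.parse import quote_from_bytes, unquote_to_bytes


def encode_bytes(data: bytes) -> str:
    """Percent-encode arbitrary bytes into a plain ASCII string."""
    # safe="" means "/" is escaped too, so only letters, digits and "_.-~" pass through.
    return quote_from_bytes(data, safe="")


def decode_bytes(text: str) -> bytes:
    """Reverse encode_bytes."""
    return unquote_to_bytes(text)


# Round-trip a pickled value (the dict is just sample data).
payload = pickle.dumps({"answer": 42})
assert decode_bytes(encode_bytes(payload)) == payload
```

Bytes that are already unreserved ASCII stay readable, everything else costs three characters per byte.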
For things which need to be valid symbol names (basically
[a-zA-Z0-9_]) I like to use a variation on underscore escaping: any
byte outside of that range, plus the underscore itself, gets converted
to an underscore followed by two hex digits. Always add some prefix
(maybe just an underscore, maybe something to "namespace" symbols using
your scheme) to ensure the output never starts with a digit.
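Something like this, as a minimal sketch of the underscore scheme described above. The function names, the default prefix and the choice of lowercase hex are my own assumptions, not part of any standard.

```python
import string

# Bytes that pass through unchanged: ASCII letters and digits.
# The underscore is deliberately excluded because it is the escape character.
SAFE = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def escape_symbol(data: bytes, prefix: str = "_") -> str:
    """Turn arbitrary bytes into a string matching [a-zA-Z0-9_]+."""
    parts = [prefix]  # the prefix keeps the result from starting with a digit
    for byte in data:
        if byte in SAFE:
            parts.append(chr(byte))
        else:
            parts.append(f"_{byte:02x}")  # underscore plus two hex digits
    return "".join(parts)


def unescape_symbol(symbol: str, prefix: str = "_") -> bytes:
    """Reverse escape_symbol; raises ValueError on malformed input."""
    if not symbol.startswith(prefix):
        raise ValueError("missing prefix")
    body = symbol[len(prefix):]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "_":
            out.append(int(body[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)
```

For example, escape_symbol(b"a_b-1") gives "_a_5fb_2d1", and unescape_symbol turns that back into the original bytes.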
For higher efficiency on non-ASCII bytes, where human readability
doesn't matter, you have base64.
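For the pickled-values case that could look roughly like the sketch below. The helper names to_text and from_text are hypothetical; base64.b64encode and base64.b64decode are the standard library calls.

```python
import base64
import pickle


def to_text(data: bytes) -> str:
    """Encode arbitrary bytes as ASCII text, 4 characters per 3 input bytes."""
    return base64.b64encode(data).decode("ascii")


def from_text(text: str) -> bytes:
    """Reverse to_text."""
    return base64.b64decode(text)


# Round-trip a pickled value (the list is just sample data).
payload = pickle.dumps([1, 2, 3])
assert from_text(to_text(payload)) == payload
```

Unlike the two escaping schemes, nothing stays human-readable, but the size overhead is a fixed one third regardless of what the bytes are.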

  commented:
There isn’t a standard for encoding arbitrary bytes in Unicode, but
there are hacks that implement the idea in various ways.
One I know of is base2048, used to pack programs into toots for the BBC
Micro bot and its companion Owlet editor. base2048 has an informative
rationale in its readme.
base2048 is designed according to the weird Twitter/Mastodon
per-character cost metrics. In other situations a different metric
might make sense, e.g. the number of bytes in a character’s UTF-8 or
UTF-16 encoding.
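To make the byte-count metrics concrete, here is a small sketch that measures what a single character costs in each encoding. The function names are made up, and this says nothing about how base2048 itself picks its characters.

```python
def utf8_cost(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_cost(ch: str) -> int:
    """Number of bytes the character occupies in UTF-16."""
    # "utf-16-le" is used so that no byte order mark is counted.
    return len(ch.encode("utf-16-le"))


# U+0041, U+00E9, U+20AC and U+1F600 cover the 1, 2, 3 and 4 byte UTF-8 cases.
for ch in "A\u00e9\u20ac\U0001F600":
    print(f"U+{ord(ch):04X}", utf8_cost(ch), utf16_cost(ch))
```

An alphabet that is cheap under one metric can be expensive under another: U+20AC is three bytes in UTF-8 but only two in UTF-16.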