xmlparser, branch HEAD
XML parser
commit 22dd4a4f48ad39da91869d56348538e063f3c790
parent 0acf4adc915dae0ca25cee374189a602f905cdce
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 11 Dec 2025 20:48:19 +0100
stricter numeric entity validation before parsing
commit 0acf4adc915dae0ca25cee374189a602f905cdce
parent 317a612f61bb72c4b67a0cb8f923c2d37e7bdcec
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 21 Sep 2025 14:25:52 +0200
update README text and bump LICENSE
commit 317a612f61bb72c4b67a0cb8f923c2d37e7bdcec
parent fc93b122349d4281c31fadf30360c2a5b6c235c9
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 21 Sep 2025 14:24:55 +0200
slightly reduce stack size for entities
commit fc93b122349d4281c31fadf30360c2a5b6c235c9
parent 5f875a2e8c6ea50252807924400c7d125505f206
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:20:40 +0200
bump LICENSE year
commit 5f875a2e8c6ea50252807924400c7d125505f206
parent 879fe9b0203550755f7b70d8c0061b443eebb948
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:01:07 +0200
improve parsing whitespace after end tag names
Simplified test-case:
https://git.codemadness.org/sfeed_tests/commit/e091160c3125322193bd8f27691c87eaa48cfc93.html
commit 879fe9b0203550755f7b70d8c0061b443eebb948
parent 7789da8556c97b8a8a4f9f8577b7a2e3f7693b31
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 15 Aug 2023 19:12:42 +0200
improve wording and fix small typos
commit 7789da8556c97b8a8a4f9f8577b7a2e3f7693b31
parent 75b731325005b143fcc5f5945482b72eeadb4a19
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 14 May 2023 23:59:37 +0200
xml.h: _XML_H_: macro name with an underscore is a reserved identifier
Found with clang -Wreserved-macro-identifier
See also:
https://wiki.sei.cmu.edu/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier
commit 75b731325005b143fcc5f5945482b72eeadb4a19
parent dc227fd2751d09a76ac0931ce0543d90aa19fbff
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 28 Apr 2023 12:26:10 +0200
bump LICENSE year
commit dc227fd2751d09a76ac0931ce0543d90aa19fbff
parent e14d8e93235e5ca0d3afa1c1c6f025e27f11459b
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 26 Aug 2022 21:57:33 +0200
improve comment: uppercase cdata -> CDATA
commit e14d8e93235e5ca0d3afa1c1c6f025e27f11459b
parent acb34b902157df0f369133c982dee819a3b232a1
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Wed, 20 Jul 2022 20:37:07 +0200
pedantic typo
commit acb34b902157df0f369133c982dee819a3b232a1
parent 65afce3d7bd49760896232df25ad637494b9873b
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 21 Jun 2022 11:05:12 +0200
README: fix some typos
synced from sfeed
commit 65afce3d7bd49760896232df25ad637494b9873b
parent 2a32dccb8b6784d6d5821daaa42e5422208274c7
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 29 Mar 2022 10:57:38 +0200
do not depend on the C locale and ctype functions
These are not strictly defined to be ASCII-compatible, nor do they share the
assumptions of the XML specification.
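A minimal sketch of what locale-independent character tests could look like. The function names are hypothetical and not taken from xml.c; the white-space set is the one XML 1.0 defines (space, tab, LF, CR), which is narrower than what isspace() accepts in the C locale.

```c
/* ASCII-only character tests that never consult the current locale.
 * Hypothetical helpers for illustration, not the code from xml.c. */

/* 'A'-'Z' or 'a'-'z': OR-ing with 0x20 folds ASCII upper-case to lower-case,
 * the unsigned subtraction makes everything below 'a' wrap to a large value. */
static int
ascii_isalpha(int c)
{
	return ((unsigned)c | 0x20) - 'a' < 26;
}

/* '0'-'9' */
static int
ascii_isdigit(int c)
{
	return (unsigned)c - '0' < 10;
}

/* XML 1.0 white-space: space, tab, line feed and carriage return only. */
static int
xml_isspace(int c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}
```

Unlike isalpha(), these give the same answer for a byte such as 0xe9 regardless of the locale, and they are safe to call with negative char values.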
commit 2a32dccb8b6784d6d5821daaa42e5422208274c7
parent 29205de3946a73fd85b834ccf5d311e27603cb95
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 29 Mar 2022 10:57:12 +0200
bump LICENSE year
commit 29205de3946a73fd85b834ccf5d311e27603cb95
parent 9f5430accf9f8d29e4fa3ee627dca46983c9bb03
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 1 Jul 2021 23:21:05 +0200
bump LICENSE year, remove newline before EOF in README
commit 9f5430accf9f8d29e4fa3ee627dca46983c9bb03
parent f84ddc4941d473eccd2509921dfb5055e108e96e
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 28 Jan 2021 19:41:34 +0100
fix small typo
commit f84ddc4941d473eccd2509921dfb5055e108e96e
parent 2e33c882b88eebdaefb0477658a9cbb79d57e2b1
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 22 Jan 2021 22:51:21 +0100
xml.c: fix typo in checking surrogate range
commit 2e33c882b88eebdaefb0477658a9cbb79d57e2b1
parent 6d001c968814d93492e5925f63ede6aa94c12552
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 22 Jan 2021 13:37:47 +0100
xml.c: do not convert UTF-16 surrogate pairs to an invalid sequence
In sfeed a simple way to reproduce:
printf '<item><title>&#xdc00;</title></item>' | sfeed | iconv -t utf-8
Result:
iconv: (stdin):1:8: cannot convert
Output result:
printf '<item><title>&#xdc00;</title></item>' | sfeed
Before:
00000000 09 ed b0 80 09 09 09 09 09 09 09 0a |............|
0000000c
After:
00000000 09 26 23 78 64 63 30 30 3b 09 09 09 09 09 09 09 |.&#xdc00;.......|
00000010 0a |.|
00000011
The entity is output as a literal string. This makes it easier to see what's
wrong and to debug the feed, and it is consistent with the current behaviour for
invalid named entities (&bla;). An alternative could be to output the Unicode
replacement character (codepoint 0xfffd).
Reference: https://unicode.org/faq/utf_bom.html , specifically:
"Q: How do I convert an unpaired UTF-16 surrogate to UTF-8? "
"A: A different issue arises if an unpaired surrogate is encountered when
converting ill-formed UTF-16 data. By representing such an unpaired surrogate
on its own as a 3-byte sequence, the resulting UTF-8 data stream would become
ill-formed. While it faithfully reflects the nature of the input, Unicode
conformance requires that encoding form conversion always results in a valid
data stream. Therefore a converter must treat this as an error. [AF]"
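The rule quoted above can be sketched as a codepoint-to-UTF-8 encoder that refuses surrogates. This is an illustration under assumptions, not the function from xml.c: the name utf8_encode and its return convention (0 on error) are hypothetical.

```c
#include <stddef.h>

/* Encode the codepoint cp as UTF-8 into buf, which must hold at least 4 bytes.
 * Returns the number of bytes written, or 0 if cp is a UTF-16 surrogate
 * (0xd800-0xdfff) or lies above 0x10ffff.
 * Hypothetical helper for illustration, not the code from xml.c. */
static size_t
utf8_encode(unsigned long cp, unsigned char *buf)
{
	if (cp >= 0xd800 && cp <= 0xdfff)
		return 0; /* a lone surrogate would yield ill-formed UTF-8 */

	if (cp < 0x80) {
		buf[0] = (unsigned char)cp;
		return 1;
	} else if (cp < 0x800) {
		buf[0] = (unsigned char)(0xc0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3f));
		return 2;
	} else if (cp < 0x10000) {
		buf[0] = (unsigned char)(0xe0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3f));
		return 3;
	} else if (cp <= 0x10ffff) {
		buf[0] = (unsigned char)(0xf0 | (cp >> 18));
		buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
		buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		buf[3] = (unsigned char)(0x80 | (cp & 0x3f));
		return 4;
	}
	return 0; /* beyond the Unicode range */
}
```

Without the surrogate check, 0xdc00 would fall into the 3-byte branch and produce ed b0 80, the bytes shown in the "Before" dump. With it, a caller gets 0 back and can fall back to writing the entity text unchanged.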
commit 6d001c968814d93492e5925f63ede6aa94c12552
parent 69fd59a2bde2c2ddd9a478e83a90cfb969ad2369
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 16 Oct 2020 11:23:59 +0200
README: some small changes
commit 69fd59a2bde2c2ddd9a478e83a90cfb969ad2369
parent 26ae48529f916f9cd178114f543aa565b8dbb088
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 6 Oct 2020 19:00:50 +0200
xml.h: add underscore for #ifdef guard
This is the common style.
commit 26ae48529f916f9cd178114f543aa565b8dbb088
parent f32a38c45da3bd764f1708600a33bd878cbe8afc
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 1 Jun 2020 12:16:33 +0200
fix typo
commit f32a38c45da3bd764f1708600a33bd878cbe8afc
parent b276cff61206ddd969e3963edd886e8ee0a5ea8f
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 4 Feb 2020 22:50:16 +0100
skeleton: add stub for translating entities (#if 0'd)
commit b276cff61206ddd969e3963edd886e8ee0a5ea8f
parent c37fc4290e718628f2aeeffcae135861948ff831
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 28 Jan 2020 21:56:07 +0100
cleanup some includes
commit c37fc4290e718628f2aeeffcae135861948ff831
parent d397ee643b3cfbb99aa18fc345407a1182a62cad
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 19 Jan 2020 16:18:42 +0100
improve XML entity conversion
- return -1 for invalid XML entities.
- separate between NUL (&#0;) and invalid entities: although both are
usually unwanted.
- validate the number range more strictly and don't wrap to unsigned.
Entities like "&#-1;" are now handled as invalid. "&#;" is also invalid
instead of being treated the same as "&#0;".
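The points above can be sketched as a strict numeric entity parser. This is a hand-rolled illustration, not the code from xml.c: the function name and the choice to take the text between "&#" and ";" as input are assumptions. Digits are parsed by hand because strtol() would accept a sign, leading white-space and an empty string.

```c
/* Parse the text between "&#" and ";" of a numeric entity, such as "169"
 * or "xa9". Returns the codepoint (0 for an explicit NUL such as "0"),
 * or -1 if the text is not a valid number in the range 0-0x10ffff.
 * Hypothetical helper for illustration, not the code from xml.c. */
static long
numeric_entity_to_codepoint(const char *s)
{
	long cp = 0;
	int base = 10, digit;

	if (*s == 'x') { /* XML only allows a lower-case 'x' for hexadecimal */
		base = 16;
		s++;
	}
	if (*s == '\0')
		return -1; /* "&#;" and "&#x;" contain no digits */

	for (; *s; s++) {
		if (*s >= '0' && *s <= '9')
			digit = *s - '0';
		else if (base == 16 && *s >= 'a' && *s <= 'f')
			digit = *s - 'a' + 10;
		else if (base == 16 && *s >= 'A' && *s <= 'F')
			digit = *s - 'A' + 10;
		else
			return -1; /* sign, white-space or any other character */

		cp = cp * base + digit;
		if (cp > 0x10ffff)
			return -1; /* out of range; checked per digit so cp cannot overflow */
	}
	return cp;
}
```

Returning -1 for invalid input and 0 for NUL lets the caller tell the two cases apart, which an unsigned return value that wraps around could not do.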
commit d397ee643b3cfbb99aa18fc345407a1182a62cad
parent 6204debcd10e48fd48f127f6eb514f7a9d7c50b8
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 4 Jan 2020 00:36:19 +0100
bump LICENSE year
commit 6204debcd10e48fd48f127f6eb514f7a9d7c50b8
parent 16af4b632b88af06bc89e97941ae7eb9e4b8ae00
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 3 Jan 2020 17:06:34 +0100
README: document white-space handling
commit 16af4b632b88af06bc89e97941ae7eb9e4b8ae00
parent 9b7534cd3d2be4964dfd6dbeca2111c07fb9762a
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 23 Nov 2019 13:24:49 +0100
xml.c: upper-case named-entities are invalid in XML
Named entities are case-sensitive, and the predefined XML entities are lower-case.
(In HTML some upper-case forms are valid, although &APOS; is invalid there too).
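A minimal sketch of a case-sensitive lookup for the five predefined XML entities. The table layout and the function name are hypothetical, not taken from xml.c; the point is only that an exact byte comparison rejects upper-case spellings.

```c
#include <stddef.h>
#include <string.h>

/* The five entities predefined by XML 1.0, without the '&' and ';'.
 * Hypothetical table for illustration, not the code from xml.c. */
static const struct {
	const char *name;
	int c;
} entities[] = {
	{ "amp",  '&'  },
	{ "lt",   '<'  },
	{ "gt",   '>'  },
	{ "apos", '\'' },
	{ "quot", '"'  },
};

/* Returns the character for a predefined entity name, or -1 if unknown.
 * strcmp() compares bytes exactly, so "AMP" or "Lt" do not match. */
static int
named_entity_to_char(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(entities) / sizeof(entities[0]); i++) {
		if (!strcmp(name, entities[i].name))
			return entities[i].c;
	}
	return -1;
}
```

Using strcasecmp() here instead would accept &AMP; and &APOS;, which is the behaviour this commit describes as invalid for XML.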
References:
4.6 Predefined entities: https://www.w3.org/TR/xml/#sec-predefined-ent
In the definition of "match": https://www.w3.org/TR/xml/#dt-match
"No case folding is performed."
commit 9b7534cd3d2be4964dfd6dbeca2111c07fb9762a
parent a89216192dbac51fee5945195c1e754b9232d5ae
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 12 Sep 2019 19:51:28 +0200
skeleton.c: remove #ifndef GETNEXT here
commit a89216192dbac51fee5945195c1e754b9232d5ae
parent 41dd854ab87458efcc330f9f77470e4ffb727880
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 23 Aug 2019 12:13:39 +0200
update README, bump LICENSE year
commit 41dd854ab87458efcc330f9f77470e4ffb727880
parent 908a3c3d0c612673b32c2714d9f46bc723c7a38b
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 21 Jun 2019 13:57:44 +0200
README: add comment about parser limitation/restriction
commit 908a3c3d0c612673b32c2714d9f46bc723c7a38b
parent b2078dbb866bea46507ebb9d3d4c12c93c4f39f8
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 16 Jun 2019 22:19:31 +0200
sync XML improvements (from sfeed)
commit b2078dbb866bea46507ebb9d3d4c12c93c4f39f8
parent e9114f99e2b610c1d8899dcffca7edc28e09b614
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Wed, 12 Dec 2018 19:14:12 +0100
style fix
commit e9114f99e2b610c1d8899dcffca7edc28e09b614
parent 287a59a2d0fc7c1f98d33e6142409c755fd39216
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 7 Dec 2018 19:50:18 +0100
whitespace fixes
commit 287a59a2d0fc7c1f98d33e6142409c755fd39216
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 2 Nov 2018 17:48:53 +0100
make separate repo for shared XML code "library"