xmlparser, branch HEAD
XML parser

22dd4a4f48ad39da91869d56348538e063f3c790 2025-12-11T19:48:19Z 2025-12-11T19:48:19Z stricter numeric entity validation before parsing Hiltjo Posthuma hiltjo@codemadness.org
commit 22dd4a4f48ad39da91869d56348538e063f3c790
parent 0acf4adc915dae0ca25cee374189a602f905cdce
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 11 Dec 2025 20:48:19 +0100

stricter numeric entity validation before parsing

0acf4adc915dae0ca25cee374189a602f905cdce 2025-09-21T12:25:52Z 2025-09-21T12:25:52Z update README text and bump LICENSE Hiltjo Posthuma hiltjo@codemadness.org
commit 0acf4adc915dae0ca25cee374189a602f905cdce
parent 317a612f61bb72c4b67a0cb8f923c2d37e7bdcec
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 21 Sep 2025 14:25:52 +0200

update README text and bump LICENSE

317a612f61bb72c4b67a0cb8f923c2d37e7bdcec 2025-09-21T12:24:55Z 2025-09-21T12:24:55Z slightly reduce stack size for entities Hiltjo Posthuma hiltjo@codemadness.org
commit 317a612f61bb72c4b67a0cb8f923c2d37e7bdcec
parent fc93b122349d4281c31fadf30360c2a5b6c235c9
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 21 Sep 2025 14:24:55 +0200

slightly reduce stack size for entities

fc93b122349d4281c31fadf30360c2a5b6c235c9 2024-06-30T08:20:40Z 2024-06-30T08:20:40Z bump LICENSE year Hiltjo Posthuma hiltjo@codemadness.org
commit fc93b122349d4281c31fadf30360c2a5b6c235c9
parent 5f875a2e8c6ea50252807924400c7d125505f206
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:20:40 +0200

bump LICENSE year

5f875a2e8c6ea50252807924400c7d125505f206 2024-06-30T08:01:07Z 2024-06-30T08:01:07Z improve parsing whitespace after end tag names Hiltjo Posthuma hiltjo@codemadness.org
commit 5f875a2e8c6ea50252807924400c7d125505f206
parent 879fe9b0203550755f7b70d8c0061b443eebb948
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:01:07 +0200

improve parsing whitespace after end tag names

Simplified test-case:
https://git.codemadness.org/sfeed_tests/commit/e091160c3125322193bd8f27691c87eaa48cfc93.html

879fe9b0203550755f7b70d8c0061b443eebb948 2023-08-15T17:12:42Z 2023-08-15T17:15:12Z improve wording and fix small typos Hiltjo Posthuma hiltjo@codemadness.org
commit 879fe9b0203550755f7b70d8c0061b443eebb948
parent 7789da8556c97b8a8a4f9f8577b7a2e3f7693b31
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 15 Aug 2023 19:12:42 +0200

improve wording and fix small typos

7789da8556c97b8a8a4f9f8577b7a2e3f7693b31 2023-05-14T21:59:37Z 2023-05-14T21:59:37Z xml.h: _XML_H_: macro name with an underscore is a reserved identifier Hiltjo Posthuma hiltjo@codemadness.org
commit 7789da8556c97b8a8a4f9f8577b7a2e3f7693b31
parent 75b731325005b143fcc5f5945482b72eeadb4a19
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 14 May 2023 23:59:37 +0200

xml.h: _XML_H_: macro name with an underscore is a reserved identifier

Found with clang -Wreserved-macro-identifier

See also:
https://wiki.sei.cmu.edu/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier

75b731325005b143fcc5f5945482b72eeadb4a19 2023-04-28T10:26:10Z 2023-04-28T10:26:10Z bump LICENSE year Hiltjo Posthuma hiltjo@codemadness.org
commit 75b731325005b143fcc5f5945482b72eeadb4a19
parent dc227fd2751d09a76ac0931ce0543d90aa19fbff
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 28 Apr 2023 12:26:10 +0200

bump LICENSE year

dc227fd2751d09a76ac0931ce0543d90aa19fbff 2022-08-26T19:57:33Z 2022-08-26T19:57:33Z improve comment: uppercase cdata -> CDATA Hiltjo Posthuma hiltjo@codemadness.org
commit dc227fd2751d09a76ac0931ce0543d90aa19fbff
parent e14d8e93235e5ca0d3afa1c1c6f025e27f11459b
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 26 Aug 2022 21:57:33 +0200

improve comment: uppercase cdata -> CDATA

e14d8e93235e5ca0d3afa1c1c6f025e27f11459b 2022-07-20T18:37:07Z 2022-07-20T18:37:07Z pedantic typo Hiltjo Posthuma hiltjo@codemadness.org
commit e14d8e93235e5ca0d3afa1c1c6f025e27f11459b
parent acb34b902157df0f369133c982dee819a3b232a1
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Wed, 20 Jul 2022 20:37:07 +0200

pedantic typo

acb34b902157df0f369133c982dee819a3b232a1 2022-06-21T09:05:12Z 2022-06-21T09:05:12Z README: fix some typos Hiltjo Posthuma hiltjo@codemadness.org
commit acb34b902157df0f369133c982dee819a3b232a1
parent 65afce3d7bd49760896232df25ad637494b9873b
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 21 Jun 2022 11:05:12 +0200

README: fix some typos

synced from sfeed

65afce3d7bd49760896232df25ad637494b9873b 2022-03-29T08:57:38Z 2022-03-29T08:57:38Z do not depend on the C locale and ctype functions Hiltjo Posthuma hiltjo@codemadness.org
commit 65afce3d7bd49760896232df25ad637494b9873b
parent 2a32dccb8b6784d6d5821daaa42e5422208274c7
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 29 Mar 2022 10:57:38 +0200

do not depend on the C locale and ctype functions

These are not strictly defined to be ASCII-compatible, nor do they share the
assumptions of the XML specification.
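
Note: the idea in the commit above (classifying characters without the locale-dependent <ctype.h> functions) can be sketched as follows. This is only an illustration, not the code from xml.c: the function names are hypothetical, and the range comparisons assume an ASCII-compatible execution character set. The white-space set is the one the XML specification defines (space, tab, carriage return, line feed).

```c
/* Hypothetical helpers (names are assumptions, not taken from xml.c).
 * They classify bytes by explicit ASCII ranges, so the result does not
 * change with setlocale(). Assumes an ASCII-compatible character set. */

int ascii_isalpha(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

int ascii_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

/* XML white-space: #x20, #x9, #xD and #xA only.
 * Unlike isspace(), '\v' and '\f' are not included. */
int xml_isspace(int c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}
```

With helpers like these a byte such as 0xe9 is never treated as a letter, whatever locale the calling program has selected.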
2a32dccb8b6784d6d5821daaa42e5422208274c7 2022-03-29T08:57:12Z 2022-03-29T08:57:12Z bump LICENSE year Hiltjo Posthuma hiltjo@codemadness.org
commit 2a32dccb8b6784d6d5821daaa42e5422208274c7
parent 29205de3946a73fd85b834ccf5d311e27603cb95
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 29 Mar 2022 10:57:12 +0200

bump LICENSE year

29205de3946a73fd85b834ccf5d311e27603cb95 2021-07-01T21:21:05Z 2021-07-01T21:21:05Z bump LICENSE year, remove newline before EOF in README Hiltjo Posthuma hiltjo@codemadness.org
commit 29205de3946a73fd85b834ccf5d311e27603cb95
parent 9f5430accf9f8d29e4fa3ee627dca46983c9bb03
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 1 Jul 2021 23:21:05 +0200

bump LICENSE year, remove newline before EOF in README

9f5430accf9f8d29e4fa3ee627dca46983c9bb03 2021-01-28T18:41:34Z 2021-01-28T18:41:34Z fix small typo Hiltjo Posthuma hiltjo@codemadness.org
commit 9f5430accf9f8d29e4fa3ee627dca46983c9bb03
parent f84ddc4941d473eccd2509921dfb5055e108e96e
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 28 Jan 2021 19:41:34 +0100

fix small typo

f84ddc4941d473eccd2509921dfb5055e108e96e 2021-01-22T21:51:21Z 2021-01-22T21:51:21Z xml.c: fix typo in checking surrogate range Hiltjo Posthuma hiltjo@codemadness.org
commit f84ddc4941d473eccd2509921dfb5055e108e96e
parent 2e33c882b88eebdaefb0477658a9cbb79d57e2b1
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 22 Jan 2021 22:51:21 +0100

xml.c: fix typo in checking surrogate range

2e33c882b88eebdaefb0477658a9cbb79d57e2b1 2021-01-22T12:37:47Z 2021-01-22T12:37:47Z xml.c: do not convert UTF-16 surrogate pairs to an invalid sequence Hiltjo Posthuma hiltjo@codemadness.org
commit 2e33c882b88eebdaefb0477658a9cbb79d57e2b1
parent 6d001c968814d93492e5925f63ede6aa94c12552
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 22 Jan 2021 13:37:47 +0100

xml.c: do not convert UTF-16 surrogate pairs to an invalid sequence

In sfeed, a simple way to reproduce:

    printf '<item><title>&#xdc00;</title></item>' | sfeed | iconv -t utf-8

Result:

    iconv: (stdin):1:8: cannot convert

Output result:

    printf '<item><title>&#xdc00;</title></item>' | sfeed

Before:

    00000000  09 ed b0 80 09 09 09 09  09 09 09 0a  |............|
    0000000c

After:

    00000000  09 26 23 78 64 63 30 30  3b 09 09 09 09 09 09 09  |.&#xdc00;.......|
    00000010  0a  |.|
    00000011

The entity is output as a literal string. This makes it easier to see what is
wrong and to debug the feed, and it is consistent with the current behaviour
for invalid named entities (&bla;).

An alternative could be a UTF-8 replacement symbol (codepoint 0xfffd).

Reference: https://unicode.org/faq/utf_bom.html, specifically:

"Q: How do I convert an unpaired UTF-16 surrogate to UTF-8? "

"A: A different issue arises if an unpaired surrogate is encountered when
converting ill-formed UTF-16 data. By representing such an unpaired surrogate
on its own as a 3-byte sequence, the resulting UTF-8 data stream would become
ill-formed. While it faithfully reflects the nature of the input, Unicode
conformance requires that encoding form conversion always results in a valid
data stream. Therefore a converter must treat this as an error. [AF]"

6d001c968814d93492e5925f63ede6aa94c12552 2020-10-16T09:23:59Z 2020-10-16T09:23:59Z README: some small changes Hiltjo Posthuma hiltjo@codemadness.org
commit 6d001c968814d93492e5925f63ede6aa94c12552
parent 69fd59a2bde2c2ddd9a478e83a90cfb969ad2369
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 16 Oct 2020 11:23:59 +0200

README: some small changes

69fd59a2bde2c2ddd9a478e83a90cfb969ad2369 2020-10-06T17:00:50Z 2020-10-06T17:00:50Z xml.h: add underscore for #ifdef guard Hiltjo Posthuma hiltjo@codemadness.org
commit 69fd59a2bde2c2ddd9a478e83a90cfb969ad2369
parent 26ae48529f916f9cd178114f543aa565b8dbb088
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 6 Oct 2020 19:00:50 +0200

xml.h: add underscore for #ifdef guard

This is the common style.
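
Note: the surrogate handling described in commit 2e33c882 above can be sketched as follows. This is an illustration under stated assumptions, not the code from xml.c: the function name cp_to_utf8 and its "return 0 on failure" convention are hypothetical. A naive 3-byte encoding of U+DC00 yields ED B0 80, the ill-formed bytes shown in the "Before" dump, so the sketch refuses the surrogate range and leaves it to the caller to emit the entity as literal text.

```c
#include <stddef.h>

/* Hypothetical sketch: encode a codepoint as UTF-8 into out (at least 4
 * bytes). Returns the number of bytes written (1-4), or 0 when the
 * codepoint must not be encoded: UTF-16 surrogates (0xd800-0xdfff) and
 * values above 0x10ffff. */
size_t cp_to_utf8(unsigned long cp, unsigned char *out)
{
    if (cp >= 0xd800 && cp <= 0xdfff)
        return 0; /* unpaired surrogate: would give ill-formed UTF-8 */
    if (cp > 0x10ffff)
        return 0; /* outside the Unicode range */

    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xe0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        return 3;
    }
    out[0] = (unsigned char)(0xf0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (cp & 0x3f));
    return 4;
}
```

A caller that gets 0 back can copy the original "&#xdc00;" text to the output unchanged, which matches the "After" dump. Substituting U+FFFD (encoded as EF BF BD) is the alternative the commit message mentions.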
26ae48529f916f9cd178114f543aa565b8dbb088 2020-06-01T10:16:33Z 2020-06-01T10:16:33Z fix typo Hiltjo Posthuma hiltjo@codemadness.org
commit 26ae48529f916f9cd178114f543aa565b8dbb088
parent f32a38c45da3bd764f1708600a33bd878cbe8afc
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 1 Jun 2020 12:16:33 +0200

fix typo

f32a38c45da3bd764f1708600a33bd878cbe8afc 2020-02-04T21:50:16Z 2020-02-04T21:50:16Z skeleton: add stub for translating entities (#if 0'd) Hiltjo Posthuma hiltjo@codemadness.org
commit f32a38c45da3bd764f1708600a33bd878cbe8afc
parent b276cff61206ddd969e3963edd886e8ee0a5ea8f
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 4 Feb 2020 22:50:16 +0100

skeleton: add stub for translating entities (#if 0'd)

b276cff61206ddd969e3963edd886e8ee0a5ea8f 2020-01-28T20:56:07Z 2020-01-28T20:56:07Z cleanup some includes Hiltjo Posthuma hiltjo@codemadness.org
commit b276cff61206ddd969e3963edd886e8ee0a5ea8f
parent c37fc4290e718628f2aeeffcae135861948ff831
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 28 Jan 2020 21:56:07 +0100

cleanup some includes

c37fc4290e718628f2aeeffcae135861948ff831 2020-01-19T15:18:42Z 2020-01-19T15:18:42Z improve XML entity conversion Hiltjo Posthuma hiltjo@codemadness.org
commit c37fc4290e718628f2aeeffcae135861948ff831
parent d397ee643b3cfbb99aa18fc345407a1182a62cad
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 19 Jan 2020 16:18:42 +0100

improve XML entity conversion

- return -1 for invalid XML entities.
- distinguish between NUL (&#0;) and invalid entities, although both are
  usually unwanted.
- validate the number range more strictly and don't wrap to unsigned.
  Entities like "&#-1;" are handled as invalid now.
  "&#;" is also invalid instead of being the same as "&#0;".
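
Note: the rules listed in commit c37fc429 above can be sketched as follows. This is a minimal illustration, not the code from xml.c: the function name parse_numeric_entity and its string-in, long-out interface are assumptions. It parses digits by hand so that signs, empty digit runs and out-of-range values are rejected with -1, while "&#0;" still yields 0.

```c
#include <stddef.h>

/* Hypothetical sketch: parse a numeric character reference such as
 * "&#65;" or "&#x41;". Returns the codepoint (0 for "&#0;"), or -1 when
 * the reference is invalid. Digits are checked one by one, so "&#-1;"
 * and "&#;" are invalid and nothing can wrap around to unsigned. */
long parse_numeric_entity(const char *e)
{
    long cp = 0;
    int base = 10, digit;
    size_t ndigits = 0;

    if (e[0] != '&' || e[1] != '#')
        return -1;
    e += 2;
    if (*e == 'x') { /* XML only allows a lower-case 'x' here */
        base = 16;
        e++;
    }

    for (; *e != '\0' && *e != ';'; e++, ndigits++) {
        if (*e >= '0' && *e <= '9')
            digit = *e - '0';
        else if (base == 16 && *e >= 'a' && *e <= 'f')
            digit = *e - 'a' + 10;
        else if (base == 16 && *e >= 'A' && *e <= 'F')
            digit = *e - 'A' + 10;
        else
            return -1; /* sign, space or any other character */

        cp = cp * base + digit;
        if (cp > 0x10ffff)
            return -1; /* stop early: cp stays far below LONG_MAX */
    }

    if (*e != ';' || ndigits == 0)
        return -1; /* unterminated, or no digits at all */
    return cp;
}
```

Returning 0 and -1 as separate values keeps NUL distinguishable from an invalid entity, so the caller can decide how to treat each case.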
d397ee643b3cfbb99aa18fc345407a1182a62cad 2020-01-03T23:36:19Z 2020-01-03T23:36:19Z bump LICENSE year Hiltjo Posthuma hiltjo@codemadness.org
commit d397ee643b3cfbb99aa18fc345407a1182a62cad
parent 6204debcd10e48fd48f127f6eb514f7a9d7c50b8
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 4 Jan 2020 00:36:19 +0100

bump LICENSE year

6204debcd10e48fd48f127f6eb514f7a9d7c50b8 2020-01-03T16:06:34Z 2020-01-03T16:06:34Z README: document white-space handling Hiltjo Posthuma hiltjo@codemadness.org
commit 6204debcd10e48fd48f127f6eb514f7a9d7c50b8
parent 16af4b632b88af06bc89e97941ae7eb9e4b8ae00
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 3 Jan 2020 17:06:34 +0100

README: document white-space handling

16af4b632b88af06bc89e97941ae7eb9e4b8ae00 2019-11-23T12:24:49Z 2019-11-23T12:24:49Z xml.c: upper-case named-entities are invalid in XML Hiltjo Posthuma hiltjo@codemadness.org
commit 16af4b632b88af06bc89e97941ae7eb9e4b8ae00
parent 9b7534cd3d2be4964dfd6dbeca2111c07fb9762a
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 23 Nov 2019 13:24:49 +0100

xml.c: upper-case named-entities are invalid in XML

Named entities are case-sensitive and, in XML, lower-case.
(In HTML some of the upper-case forms are valid, although &APOS; is invalid
there too.)

References:
4.6 Predefined entities: https://www.w3.org/TR/xml/#sec-predefined-ent
In the definition of "match": https://www.w3.org/TR/xml/#dt-match
"No case folding is performed."
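
Note: the case-sensitive matching described in commit 16af4b63 above can be sketched as follows. This is an illustration only, not the code from xml.c: the function name named_entity_to_char and the table layout are assumptions. The five names are the predefined entities from section 4.6 of the XML specification, compared with strcmp() so that no case folding happens.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: map one of the five predefined XML entities to
 * its character. Returns the character, or -1 when the name does not
 * match exactly. strcmp() is case-sensitive, so "&AMP;" is rejected. */
int named_entity_to_char(const char *e)
{
    static const struct {
        const char *name;
        int c;
    } entities[] = {
        { "&amp;",  '&'  },
        { "&lt;",   '<'  },
        { "&gt;",   '>'  },
        { "&apos;", '\'' },
        { "&quot;", '"'  },
    };
    size_t i;

    for (i = 0; i < sizeof(entities) / sizeof(entities[0]); i++) {
        if (strcmp(e, entities[i].name) == 0)
            return entities[i].c;
    }
    return -1; /* unknown or differently-cased name */
}
```

An unknown name such as "&bla;" also returns -1, so the caller can pass it through as literal text.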
9b7534cd3d2be4964dfd6dbeca2111c07fb9762a 2019-09-12T17:51:28Z 2019-09-12T17:51:28Z skeleton.c: remove #ifndef GETNEXT here Hiltjo Posthuma hiltjo@codemadness.org
commit 9b7534cd3d2be4964dfd6dbeca2111c07fb9762a
parent a89216192dbac51fee5945195c1e754b9232d5ae
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 12 Sep 2019 19:51:28 +0200

skeleton.c: remove #ifndef GETNEXT here

a89216192dbac51fee5945195c1e754b9232d5ae 2019-08-23T10:13:39Z 2019-08-23T10:13:39Z update README, bump LICENSE year Hiltjo Posthuma hiltjo@codemadness.org
commit a89216192dbac51fee5945195c1e754b9232d5ae
parent 41dd854ab87458efcc330f9f77470e4ffb727880
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 23 Aug 2019 12:13:39 +0200

update README, bump LICENSE year

41dd854ab87458efcc330f9f77470e4ffb727880 2019-06-21T11:57:44Z 2019-06-21T11:57:44Z README: add comment about parser limitation/restriction Hiltjo Posthuma hiltjo@codemadness.org
commit 41dd854ab87458efcc330f9f77470e4ffb727880
parent 908a3c3d0c612673b32c2714d9f46bc723c7a38b
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 21 Jun 2019 13:57:44 +0200

README: add comment about parser limitation/restriction

908a3c3d0c612673b32c2714d9f46bc723c7a38b 2019-06-16T20:19:31Z 2019-06-16T20:19:31Z sync XML improvements (from sfeed) Hiltjo Posthuma hiltjo@codemadness.org
commit 908a3c3d0c612673b32c2714d9f46bc723c7a38b
parent b2078dbb866bea46507ebb9d3d4c12c93c4f39f8
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 16 Jun 2019 22:19:31 +0200

sync XML improvements (from sfeed)

b2078dbb866bea46507ebb9d3d4c12c93c4f39f8 2018-12-12T18:14:12Z 2018-12-12T18:14:12Z style fix Hiltjo Posthuma hiltjo@codemadness.org
commit b2078dbb866bea46507ebb9d3d4c12c93c4f39f8
parent e9114f99e2b610c1d8899dcffca7edc28e09b614
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Wed, 12 Dec 2018 19:14:12 +0100

style fix

e9114f99e2b610c1d8899dcffca7edc28e09b614 2018-12-07T18:50:18Z 2018-12-07T18:50:18Z whitespace fixes Hiltjo Posthuma hiltjo@codemadness.org
commit e9114f99e2b610c1d8899dcffca7edc28e09b614
parent 287a59a2d0fc7c1f98d33e6142409c755fd39216
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 7 Dec 2018 19:50:18 +0100

whitespace fixes

287a59a2d0fc7c1f98d33e6142409c755fd39216 2018-11-02T16:48:53Z 2018-11-02T16:48:53Z make separate repo for shared XML code "library" Hiltjo Posthuma hiltjo@codemadness.org
commit 287a59a2d0fc7c1f98d33e6142409c755fd39216
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 2 Nov 2018 17:48:53 +0100

make separate repo for shared XML code "library"