grabtitle, branch HEAD
stupid HTML title grabber

7d8e5a616ebc4ab9b97bb5372ae5f68ae261effd 2024-06-30T08:20:54Z 2024-06-30T08:20:54Z
bump LICENSE year
Hiltjo Posthuma hiltjo@codemadness.org
commit 7d8e5a616ebc4ab9b97bb5372ae5f68ae261effd
parent e0263471557e79c3a178e6da41b5f7e2f4234625
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:20:54 +0200

bump LICENSE year

e0263471557e79c3a178e6da41b5f7e2f4234625 2024-06-30T08:19:01Z 2024-06-30T08:19:01Z
xml.c: sync some of the improvements to this modified version
Hiltjo Posthuma hiltjo@codemadness.org
commit e0263471557e79c3a178e6da41b5f7e2f4234625
parent 29e4807c53d136de19f775b7d08b3c1c3a14d76d
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:19:01 +0200

xml.c: sync some of the improvements to this modified version

29e4807c53d136de19f775b7d08b3c1c3a14d76d 2024-06-30T08:10:48Z 2024-06-30T08:10:48Z
sync from xml.c: improve parsing whitespace after end tag names
Hiltjo Posthuma hiltjo@codemadness.org
commit 29e4807c53d136de19f775b7d08b3c1c3a14d76d
parent 504468dfde3fd13d0b695f54ba87a8a913d0e9fb
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:10:48 +0200

sync from xml.c: improve parsing whitespace after end tag names

504468dfde3fd13d0b695f54ba87a8a913d0e9fb 2021-04-22T18:20:07Z 2021-04-22T18:20:07Z
xml.h: add underscore for #ifdef guard
Hiltjo Posthuma hiltjo@codemadness.org
commit 504468dfde3fd13d0b695f54ba87a8a913d0e9fb
parent efe5e8763fcc364f504198009d79f841c48bf7dc
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 22 Apr 2021 20:20:07 +0200

xml.h: add underscore for #ifdef guard

This is the common style.

efe5e8763fcc364f504198009d79f841c48bf7dc 2021-04-22T18:19:06Z 2021-04-22T18:19:06Z
do not convert UTF-16 surrogate pairs to an invalid sequence
Hiltjo Posthuma hiltjo@codemadness.org
commit efe5e8763fcc364f504198009d79f841c48bf7dc
parent 375166031e3942890db414e46937ae485986a2fa
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 22 Apr 2021 20:19:06 +0200

do not convert UTF-16 surrogate pairs to an invalid sequence

375166031e3942890db414e46937ae485986a2fa 2021-04-22T18:18:59Z 2021-04-22T18:18:59Z
bump LICENSE year
Hiltjo Posthuma hiltjo@codemadness.org
commit 375166031e3942890db414e46937ae485986a2fa
parent 4ec9f8ab3e5138d2bb57c973e843a14f511f2819
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 22 Apr 2021 20:18:59 +0200

bump LICENSE year

4ec9f8ab3e5138d2bb57c973e843a14f511f2819 2020-06-05T19:53:08Z 2020-06-05T19:53:08Z
revert commit fix bug in ignoring character in <script> / <style>
Hiltjo Posthuma hiltjo@codemadness.org
commit 4ec9f8ab3e5138d2bb57c973e843a14f511f2819
parent c17b7c1af8b4c7ef64267cd2ab1c5455ba80fb52
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 5 Jun 2020 21:53:08 +0200

revert commit fix bug in ignoring character in <script> / <style>

c17b7c1af8b4c7ef64267cd2ab1c5455ba80fb52 2020-05-30T11:44:09Z 2020-05-30T11:44:09Z
cleanup header includes
Hiltjo Posthuma hiltjo@codemadness.org
commit c17b7c1af8b4c7ef64267cd2ab1c5455ba80fb52
parent 38f1c2aa05094b98f4c5ea8bec8161f2b663684d
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 30 May 2020 13:44:09 +0200

cleanup header includes

38f1c2aa05094b98f4c5ea8bec8161f2b663684d 2020-05-30T11:39:00Z 2020-05-30T11:40:14Z
add subset of named entities (sync from webdump)
Hiltjo Posthuma hiltjo@codemadness.org
commit 38f1c2aa05094b98f4c5ea8bec8161f2b663684d
parent 8e2bee7e85c6a6fbdb2b9ef84c69f8f74ab5b77c
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 30 May 2020 13:39:00 +0200

add subset of named entities (sync from webdump)

8e2bee7e85c6a6fbdb2b9ef84c69f8f74ab5b77c 2020-05-30T11:36:43Z 2020-05-30T11:40:10Z
sync xml.{c,h}
Hiltjo Posthuma hiltjo@codemadness.org
commit 8e2bee7e85c6a6fbdb2b9ef84c69f8f74ab5b77c
parent 0ffe161701f6f9ecde66204f5784e6709d647a1e
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 30 May 2020 13:36:43 +0200

sync xml.{c,h}

0ffe161701f6f9ecde66204f5784e6709d647a1e 2020-05-30T11:33:08Z 2020-05-30T11:33:08Z
Makefile: respect ${CC}
Hiltjo Posthuma hiltjo@codemadness.org
commit 0ffe161701f6f9ecde66204f5784e6709d647a1e
parent 16cc59c155068e6de1fd5cfa8720d6d765db6548
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 30 May 2020 13:33:08 +0200

Makefile: respect ${CC}

16cc59c155068e6de1fd5cfa8720d6d765db6548 2019-09-22T17:50:21Z 2019-09-22T17:50:21Z
remove unneeded code, handle &nbsp;
Hiltjo Posthuma hiltjo@codemadness.org
commit 16cc59c155068e6de1fd5cfa8720d6d765db6548
parent 16dfed456fd96d1c483eb515594019d7a5febc86
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 22 Sep 2019 19:50:21 +0200

remove unneeded code, handle &nbsp;

16dfed456fd96d1c483eb515594019d7a5febc86 2019-09-22T17:49:58Z 2019-09-22T17:49:58Z
fix bug in ignoring character in <script> / <style>
Hiltjo Posthuma hiltjo@codemadness.org
commit 16dfed456fd96d1c483eb515594019d7a5febc86
parent db328d0b3f3bb6988660f62428aed112618ca340
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 22 Sep 2019 19:49:58 +0200

fix bug in ignoring character in <script> / <style>

db328d0b3f3bb6988660f62428aed112618ca340 2018-12-18T17:11:29Z 2018-12-18T17:11:29Z
rename getchar_ignore to getnext_ignore
Hiltjo Posthuma hiltjo@codemadness.org
commit db328d0b3f3bb6988660f62428aed112618ca340
parent dcb355c8766619ad66061167c55b9f44c6ab7569
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 18 Dec 2018 18:11:29 +0100

rename getchar_ignore to getnext_ignore

dcb355c8766619ad66061167c55b9f44c6ab7569 2018-12-12T18:04:56Z 2018-12-12T18:04:56Z
check the returned length of xml_entitytostr() properly
Hiltjo Posthuma hiltjo@codemadness.org
commit dcb355c8766619ad66061167c55b9f44c6ab7569
parent 1c506adc3502530014355ae774f8b306e9a8f4bb
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Wed, 12 Dec 2018 19:04:56 +0100

check the returned length of xml_entitytostr() properly

... this shouldn't matter though since the buffer is always big enough here.

1c506adc3502530014355ae774f8b306e9a8f4bb 2018-12-11T20:28:10Z 2018-12-11T20:28:10Z
main: no command-line arguments, add comment about the web
Hiltjo Posthuma hiltjo@codemadness.org
commit 1c506adc3502530014355ae774f8b306e9a8f4bb
parent df35821b012c868c75ca1bed237624c39a0d7e12
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 11 Dec 2018 21:28:10 +0100

main: no command-line arguments, add comment about the web

df35821b012c868c75ca1bed237624c39a0d7e12 2018-12-10T18:06:23Z 2018-12-10T18:06:23Z
add README and LICENSE
Hiltjo Posthuma hiltjo@codemadness.org
commit df35821b012c868c75ca1bed237624c39a0d7e12
parent b148be9e590cdf8908414994d07742b1d7f72d8a
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:06:23 +0100

add README and LICENSE

b148be9e590cdf8908414994d07742b1d7f72d8a 2018-12-10T18:05:08Z 2018-12-10T18:05:08Z
enable pledge on OpenBSD now
Hiltjo Posthuma hiltjo@codemadness.org
commit b148be9e590cdf8908414994d07742b1d7f72d8a
parent 732acec692f6038dc15bbced3cba9a65417ba13f
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:05:08 +0100

enable pledge on OpenBSD now

732acec692f6038dc15bbced3cba9a65417ba13f 2018-12-10T18:03:41Z 2018-12-10T18:03:41Z
Makefile: respect system CFLAGS, LDFLAGS
Hiltjo Posthuma hiltjo@codemadness.org
commit 732acec692f6038dc15bbced3cba9a65417ba13f
parent 3054084945aae1ceb22b87c9132dea71cf9e5108
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:03:41 +0100

Makefile: respect system CFLAGS, LDFLAGS

3054084945aae1ceb22b87c9132dea71cf9e5108 2018-12-10T18:03:02Z 2018-12-10T18:03:02Z
XML tag parse improvements for PI and end tags
Hiltjo Posthuma hiltjo@codemadness.org
commit 3054084945aae1ceb22b87c9132dea71cf9e5108
parent d908478d0f84bc275428fd71e934c993bb29211c
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:03:02 +0100

XML tag parse improvements for PI and end tags

- Stricter parsing of tags, no whitespace stripping after <.
- For end tags the "internal" context x->tag would be "/sometag". Make sure this
  matches exactly with the parameter tag.
- Reset tagname after parsing an end tag.
- Make end tag handling more consistent.
- Remove temporary variable taglen.

d908478d0f84bc275428fd71e934c993bb29211c 2018-12-10T18:01:58Z 2018-12-10T18:01:58Z
ignore incorrect unescaped HTML in <style> or <script> in a better way
Hiltjo Posthuma hiltjo@codemadness.org
commit d908478d0f84bc275428fd71e934c993bb29211c
parent 0cca681092b680c5b80da62771d47fa383be6cd1
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:01:58 +0100

ignore incorrect unescaped HTML in <style> or <script> in a better way

this way we can still use a (mostly) XML parser for HTML data.

0cca681092b680c5b80da62771d47fa383be6cd1 2018-12-09T10:33:20Z 2018-12-09T10:33:20Z
replace control characters (including newline and tab), simplify dataentity handler
Hiltjo Posthuma hiltjo@codemadness.org
commit 0cca681092b680c5b80da62771d47fa383be6cd1
parent 074560f704de31a111ed6c73edbb1f9c84413aa7
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 9 Dec 2018 11:33:20 +0100

replace control characters (including newline and tab), simplify dataentity handler

074560f704de31a111ed6c73edbb1f9c84413aa7 2018-11-13T17:06:51Z 2018-11-13T17:06:51Z
ignore <title> tags in <style> or <script>
Hiltjo Posthuma hiltjo@codemadness.org
commit 074560f704de31a111ed6c73edbb1f9c84413aa7
parent 3fae05df6ed35c258c73314f9daa07b92314e03e
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 13 Nov 2018 18:06:51 +0100

ignore <title> tags in <style> or <script>

seen in the wild web, this prevents an base64-encoded SVG XML inside a <style>
tag with a <title> element from being interpreted as a page title.

3fae05df6ed35c258c73314f9daa07b92314e03e 2018-08-26T13:27:55Z 2018-08-26T13:27:55Z
xml: sync many XML parser improvements
Hiltjo Posthuma hiltjo@codemadness.org
commit 3fae05df6ed35c258c73314f9daa07b92314e03e
parent 0af2d13062af1f2bb254de507233ed28e8f8c459
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 26 Aug 2018 15:27:55 +0200

xml: sync many XML parser improvements

0af2d13062af1f2bb254de507233ed28e8f8c459 2018-03-31T14:35:56Z 2018-03-31T14:36:27Z
support infinite length titles, no buffering
Hiltjo Posthuma hiltjo@codemadness.org
commit 0af2d13062af1f2bb254de507233ed28e8f8c459
parent 239ce1bd6d3855175866412b2d9b8c64ddf80930
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 16:35:56 +0200

support infinite length titles, no buffering

239ce1bd6d3855175866412b2d9b8c64ddf80930 2018-03-31T14:32:39Z 2018-03-31T14:36:23Z
rename title to grabtitle
Hiltjo Posthuma hiltjo@codemadness.org
commit 239ce1bd6d3855175866412b2d9b8c64ddf80930
parent 202a253cfb5b88919c479d8abb177de9b4ef9925
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 16:32:39 +0200

rename title to grabtitle

202a253cfb5b88919c479d8abb177de9b4ef9925 2018-03-31T14:31:37Z 2018-03-31T14:31:37Z
rename run.sh to example, use required argument
Hiltjo Posthuma hiltjo@codemadness.org
commit 202a253cfb5b88919c479d8abb177de9b4ef9925
parent 20cf1608ad4cae4c89101350da8d11c9f23512b1
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 16:31:37 +0200

rename run.sh to example, use required argument

20cf1608ad4cae4c89101350da8d11c9f23512b1 2018-03-31T11:43:34Z 2018-03-31T11:43:34Z
use string length from xml_entitytostr, save a few lines
Hiltjo Posthuma hiltjo@codemadness.org
commit 20cf1608ad4cae4c89101350da8d11c9f23512b1
parent fee48ebf0343f68a35c2b65a0f2d82e8ac803725
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 13:43:34 +0200

use string length from xml_entitytostr, save a few lines

fee48ebf0343f68a35c2b65a0f2d82e8ac803725 2018-03-31T11:34:12Z 2018-03-31T11:34:12Z
hide error output
Hiltjo Posthuma hiltjo@codemadness.org
commit fee48ebf0343f68a35c2b65a0f2d82e8ac803725
parent 7d6279c8dec086f01bd2355d15292afa630238a4
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 13:34:12 +0200

hide error output

possibly "failed writing body", this is because title.c does not read the whole
output after reading the title tag.

7d6279c8dec086f01bd2355d15292afa630238a4 2018-03-31T11:01:57Z 2018-03-31T11:01:57Z
fix: don't use entity data for all tags
Hiltjo Posthuma hiltjo@codemadness.org
commit 7d6279c8dec086f01bd2355d15292afa630238a4
parent 5c21827b86be877d3d5df7f7a9b810822e4f8e22
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 13:01:57 +0200

fix: don't use entity data for all tags

5c21827b86be877d3d5df7f7a9b810822e4f8e22 2018-03-31T10:59:22Z 2018-03-31T10:59:22Z
initial insertion
Hiltjo Posthuma hiltjo@codemadness.org
commit 5c21827b86be877d3d5df7f7a9b810822e4f8e22
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 12:59:22 +0200

initial insertion