grabtitle, branch HEAD
stupid HTML title grabber
commit 7d8e5a616ebc4ab9b97bb5372ae5f68ae261effd
parent e0263471557e79c3a178e6da41b5f7e2f4234625
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:20:54 +0200
bump LICENSE year
commit e0263471557e79c3a178e6da41b5f7e2f4234625
parent 29e4807c53d136de19f775b7d08b3c1c3a14d76d
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:19:01 +0200
xml.c: sync some of the improvements to this modified version
commit 29e4807c53d136de19f775b7d08b3c1c3a14d76d
parent 504468dfde3fd13d0b695f54ba87a8a913d0e9fb
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 30 Jun 2024 10:10:48 +0200
sync from xml.c: improve parsing whitespace after end tag names
commit 504468dfde3fd13d0b695f54ba87a8a913d0e9fb
parent efe5e8763fcc364f504198009d79f841c48bf7dc
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 22 Apr 2021 20:20:07 +0200
xml.h: add underscore for #ifdef guard
This is the common style.
commit efe5e8763fcc364f504198009d79f841c48bf7dc
parent 375166031e3942890db414e46937ae485986a2fa
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 22 Apr 2021 20:19:06 +0200
do not convert UTF-16 surrogate pairs to an invalid sequence
commit 375166031e3942890db414e46937ae485986a2fa
parent 4ec9f8ab3e5138d2bb57c973e843a14f511f2819
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Thu, 22 Apr 2021 20:18:59 +0200
bump LICENSE year
commit 4ec9f8ab3e5138d2bb57c973e843a14f511f2819
parent c17b7c1af8b4c7ef64267cd2ab1c5455ba80fb52
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 5 Jun 2020 21:53:08 +0200
revert commit fix bug in ignoring character in <script> / <style>
commit c17b7c1af8b4c7ef64267cd2ab1c5455ba80fb52
parent 38f1c2aa05094b98f4c5ea8bec8161f2b663684d
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 30 May 2020 13:44:09 +0200
cleanup header includes
commit 38f1c2aa05094b98f4c5ea8bec8161f2b663684d (updated 2020-05-30T11:40:14Z)
parent 8e2bee7e85c6a6fbdb2b9ef84c69f8f74ab5b77c
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 30 May 2020 13:39:00 +0200
add subset of named entities (sync from webdump)
commit 8e2bee7e85c6a6fbdb2b9ef84c69f8f74ab5b77c (updated 2020-05-30T11:40:10Z)
parent 0ffe161701f6f9ecde66204f5784e6709d647a1e
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 30 May 2020 13:36:43 +0200
sync xml.{c,h}
commit 0ffe161701f6f9ecde66204f5784e6709d647a1e
parent 16cc59c155068e6de1fd5cfa8720d6d765db6548
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 30 May 2020 13:33:08 +0200
Makefile: respect ${CC}
commit 16cc59c155068e6de1fd5cfa8720d6d765db6548
parent 16dfed456fd96d1c483eb515594019d7a5febc86
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 22 Sep 2019 19:50:21 +0200
remove unneeded code, handle
commit 16dfed456fd96d1c483eb515594019d7a5febc86
parent db328d0b3f3bb6988660f62428aed112618ca340
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 22 Sep 2019 19:49:58 +0200
fix bug in ignoring character in <script> / <style>
commit db328d0b3f3bb6988660f62428aed112618ca340
parent dcb355c8766619ad66061167c55b9f44c6ab7569
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 18 Dec 2018 18:11:29 +0100
rename getchar_ignore to getnext_ignore
commit dcb355c8766619ad66061167c55b9f44c6ab7569
parent 1c506adc3502530014355ae774f8b306e9a8f4bb
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Wed, 12 Dec 2018 19:04:56 +0100
check the returned length of xml_entitytostr() properly
... this shouldn't matter though since the buffer is always big enough here.
commit 1c506adc3502530014355ae774f8b306e9a8f4bb
parent df35821b012c868c75ca1bed237624c39a0d7e12
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 11 Dec 2018 21:28:10 +0100
main: no command-line arguments, add comment about the web
commit df35821b012c868c75ca1bed237624c39a0d7e12
parent b148be9e590cdf8908414994d07742b1d7f72d8a
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:06:23 +0100
add README and LICENSE
commit b148be9e590cdf8908414994d07742b1d7f72d8a
parent 732acec692f6038dc15bbced3cba9a65417ba13f
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:05:08 +0100
enable pledge on OpenBSD now
commit 732acec692f6038dc15bbced3cba9a65417ba13f
parent 3054084945aae1ceb22b87c9132dea71cf9e5108
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:03:41 +0100
Makefile: respect system CFLAGS, LDFLAGS
commit 3054084945aae1ceb22b87c9132dea71cf9e5108
parent d908478d0f84bc275428fd71e934c993bb29211c
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:03:02 +0100
XML tag parse improvements for PI and end tags
- Stricter parsing of tags, no whitespace stripping after <.
- For end tags the "internal" context x->tag would be "/sometag". Make sure
this matches exactly with the parameter tag.
- Reset tagname after parsing an end tag.
- Make end tag handling more consistent.
- Remove temporary variable taglen.
commit d908478d0f84bc275428fd71e934c993bb29211c
parent 0cca681092b680c5b80da62771d47fa383be6cd1
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Mon, 10 Dec 2018 19:01:58 +0100
ignore incorrect unescaped HTML in <style> or <script> in a better way
this way we can still use a (mostly) XML parser for HTML data.
commit 0cca681092b680c5b80da62771d47fa383be6cd1
parent 074560f704de31a111ed6c73edbb1f9c84413aa7
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 9 Dec 2018 11:33:20 +0100
replace control characters (including newline and tab), simplify dataentity handler
commit 074560f704de31a111ed6c73edbb1f9c84413aa7
parent 3fae05df6ed35c258c73314f9daa07b92314e03e
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Tue, 13 Nov 2018 18:06:51 +0100
ignore <title> tags in <style> or <script>
seen in the wild web, this prevents a base64-encoded SVG XML inside a <style>
tag with a <title> element from being interpreted as a page title.
commit 3fae05df6ed35c258c73314f9daa07b92314e03e
parent 0af2d13062af1f2bb254de507233ed28e8f8c459
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sun, 26 Aug 2018 15:27:55 +0200
xml: sync many XML parser improvements
commit 0af2d13062af1f2bb254de507233ed28e8f8c459 (updated 2018-03-31T14:36:27Z)
parent 239ce1bd6d3855175866412b2d9b8c64ddf80930
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 16:35:56 +0200
support infinite length titles, no buffering
commit 239ce1bd6d3855175866412b2d9b8c64ddf80930 (updated 2018-03-31T14:36:23Z)
parent 202a253cfb5b88919c479d8abb177de9b4ef9925
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 16:32:39 +0200
rename title to grabtitle
commit 202a253cfb5b88919c479d8abb177de9b4ef9925
parent 20cf1608ad4cae4c89101350da8d11c9f23512b1
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 16:31:37 +0200
rename run.sh to example, use required argument
commit 20cf1608ad4cae4c89101350da8d11c9f23512b1
parent fee48ebf0343f68a35c2b65a0f2d82e8ac803725
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 13:43:34 +0200
use string length from xml_entitytostr, save a few lines
commit fee48ebf0343f68a35c2b65a0f2d82e8ac803725
parent 7d6279c8dec086f01bd2355d15292afa630238a4
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 13:34:12 +0200
hide error output
possibly "failed writing body", this is because title.c does not
read the whole output after reading the title tag.
commit 7d6279c8dec086f01bd2355d15292afa630238a4
parent 5c21827b86be877d3d5df7f7a9b810822e4f8e22
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 13:01:57 +0200
fix: don't use entity data for all tags
commit 5c21827b86be877d3d5df7f7a9b810822e4f8e22
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Sat, 31 Mar 2018 12:59:22 +0200
initial insertion