codevoid.de/1/hn/comments_47092006.gph

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
  HTML Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
  HTML   Wikipedia deprecates Archive.today, starts removing archive links
       
       
        realaaa wrote 9 hours 21 min ago:
        Wow! This felt like the end of the story. Here is an LLM summary of
        the timeline, shared as is.
        
        ---------
        
        Here's the chronology that the HN thread id=47092006 is about, based
        on the linked Ars Technica article and related sources.
        
        ---
        
        ## 1. What "started the argument"?
        
        The core dispute starts from a 2023 blog post by engineer Jani
        Patokallio on his site Gyrovague, investigating who is behind
        archive.today. That post, plus later FBI interest, led to:
        
        1. A *GDPR/takedown campaign* against the blog post.
        2. An *apparent DDoS* launched from archive.today's CAPTCHA page
        against his blog.
        3. *Threats* from the archive.today operator ("Nora") to associate
        Patokallio's name with AI porn and other harassment.
        4. *Discovery that archive.today had altered archived pages* to insert
        Patokallio's name.
        5. A *Wikipedia RfC* and decision to deprecate and blacklist
        archive.today links.
        
        The Hacker News thread you referenced is about the final step:
        Wikipedia's decision to remove ~695,000 archive.today links.
        
        ---
        
        ## 2. Timeline of the situation
        
        ```mermaid
        timeline
            title archive.today - Wikipedia controversy chronology
        
            2012-2015 : Site founded as archive.is; later branded archive.today
            2023-08-05 : Patokallio publishes investigation into archive.today's ownership
            2025-10-30 : FBI subpoena to archive.today's registrar (Tucows)
            2025-11-05 : Heise reports FBI subpoena, links to Patokallio's 2023 post
            2026-01-08 : GDPR complaint from "Nora" to Automattic re Patokallio's post
            2026-01-10 : archive.today webmaster emails Patokallio asking for temporary takedown
            2026-01-11 : DDoS from archive.today CAPTCHA page against Gyrovague begins
            2026-01-14 : First public HN report about weird/DDoS behavior from archive.today
            2026-01-21 : gyrovague.com added to DNS blocklists used by ad blockers
            2026-01-25 : Email exchange escalates; "Nora" threatens AI porn, "gay dating app", "Nazi grandfather"
            2026-02-01 : Patokallio publishes detailed timeline and DDoS disclosure
            2026-02-07 : Wikipedia RfC opens on archive.today links
            2026-02-10 : Ars Technica reports on DDoS and Wikipedia considering blacklist
            2026-02-19 : DDoS code still present in archive.today CAPTCHA page (per Wikipedia guidance)
            2026-02-20 : RfC closed; consensus to deprecate/blacklist archive.today
            2026-02-20 to 2026-02-21 : Major outlets report Wikipedia's blacklist; guidance page created
        ```
        
        So, in the terms of your question:
        
        - *What started the argument* was Patokallio's 2023 investigation
        into archive.today's ownership, which later coverage of the FBI
        subpoena amplified.
        - The *direct trigger for Wikipedia's action* was the combination of:
          - The *DDoS* launched from archive.today against his blog.
          - The *threats* (AI porn, harassment) against him.
          - Evidence that the *archive's content had been tampered with*,
        violating Wikipedia's trust in it as a citation source.
       
        _el1s7 wrote 1 day ago:
        Just went into a rabbit hole looking into this, wow, can't tell if this
        is just another drama on the weird wide web or something else.
       
        anovikov wrote 1 day ago:
        It doesn't work properly anyway anymore...
       
        rawling wrote 1 day ago:
        Is it not possible to create a non-repudiable archive of what a website
        served, when, entirely locally i.e. not relying on some third party
        site who might disappear or turn out to be unreliable?
        
        Could you not in theory record the whole TLS transaction? Can it not be
        replayed later and re-verified?
        
        Up until an old certificate leaks or is broken and you can fake
        anything "from back when it was valid", I guess.
       
          arboles wrote 1 day ago:
          I don't know, but archive sites could at least publish hashes of the
          content at archive time. This could be used to prove an archive
          wasn't tampered with later. I'm pretty underwhelmed by the Wayback
          Machine (archive.org), it's no better technically than archive.today.
       
            armchairhacker wrote 10 hours 7 min ago:
            How do you ensure the tampered content isn't re-hashed? Usually
            if you're saving the hash in advance, you can save the whole
            archived page. Otherwise, you can use a regular archive service
            then hash the archived page yourself.
            
            The only way I know to ensure an archive isn't tampered with is to
            re-archive it. If you sent a site to archive.today, archive.org,
            megalodon.jp, and ghostarchive.org, it's unlikely that all will
            be tampered with in the same way.
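
            The cross-checking idea above can be sketched roughly as follows.
            This is only an illustration: it assumes you have already fetched
            each copy and extracted the same region of text from it, and the
            names `normalize`, `digest` and `find_outliers` are made up for
            this example. Real archives wrap pages in their own banners and
            rewritten links, so the raw HTML would never hash identically.

```python
import hashlib
from collections import Counter


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting differences are ignored."""
    return " ".join(text.split()).lower()


def digest(text: str) -> str:
    """SHA-256 of the normalized text, as a hex string."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_outliers(copies: dict[str, str]) -> list[str]:
    """Return the archives whose copy disagrees with the most common digest."""
    digests = {name: digest(body) for name, body in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority)


# Hypothetical extracted snippets, one per archive (not real captures).
copies = {
    "archive.today": "Commenting as:  Alice",
    "archive.org": "commenting as: Bob",
    "megalodon.jp": "Commenting as: Bob",
    "ghostarchive.org": "Commenting as: Bob",
}
print(find_outliers(copies))
```

            A majority vote only tells you which copy is the odd one out. It
            does not prove which version the site actually served.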
       
              arboles wrote 7 hours 7 min ago:
              A list of hashes (tuple of [hashed url+date metadata, hashed
              content]) takes much less disk space than the archive contents
              themselves. Archive websites could publish the list for all their
              content so it can be compared against in the future. People would
              save copies of the list. If you didn't store the list yourself
              ahead of time, and don't trust a third-party to be "the source of
              truth", the archive could've uploaded the hashes to the
              blockchain at archive time:
              
  HTML        [1]: https://gwern.net/timestamping
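
              A minimal sketch of such a hash list, under stated assumptions:
              SHA-256 for both hashes, and a key built from the URL plus an
              ISO timestamp. The function names and the example URL are
              hypothetical, and nothing here reflects how any real archive
              stores its data.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def manifest_entry(url: str, captured_at: str, content: bytes) -> tuple[str, str]:
    """Build the (hashed url+date metadata, hashed content) pair for one capture."""
    key = sha256_hex(f"{url}\n{captured_at}".encode("utf-8"))
    return key, sha256_hex(content)


def verify(manifest: dict[str, str], url: str, captured_at: str, content: bytes) -> bool:
    """Check a later copy against a manifest that was saved at archive time."""
    key, content_hash = manifest_entry(url, captured_at, content)
    return manifest.get(key) == content_hash


# At archive time: publish (or privately save) the manifest.
url, when = "https://example.com/post", "2026-02-19T16:28:23Z"
original = b"<html>commenting as: someone</html>"
key, content_hash = manifest_entry(url, when, original)
manifest = {key: content_hash}

# Later: an unchanged copy verifies, an edited copy does not.
tampered = b"<html>commenting as: someone else</html>"
print(verify(manifest, url, when, original))
print(verify(manifest, url, when, tampered))
```

              This only helps if the manifest was copied somewhere the
              archive cannot rewrite, which is the point of saving the list
              yourself or timestamping it externally.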
       
          justincormack wrote 1 day ago:
          Unfortunately you can't usefully replay TLS and be able to validate
          it, so no that does not work. Best strategy would probably be a
          public transparency log, but websites are pretty variable and dynamic
          so this would be unlikely to work for many.
       
            octoberfranklin wrote 1 day ago:
            Actually you can!  After all, TLS lacks the deniability features of
            more advanced cryptosystems (like OTR or Signal).
            
            The technology for doing this is called a Zero Knowledge Proof TLS
            Oracle: [1] [2]
            
            The 10k-foot view is that you pick the random
            numbers involved in the TLS handshake in a deterministic way, much
            like how zk proofs use the Fiat-Shamir transform.  In other words,
            instead of using true randomness, you use some hash of the
            transcript of the handshake so far (sort of).  Since TLS doesn't do
            client authentication the DH exchange involves randomness from the
            client.
            
            For all the blockchain haters out there: cryptocurrency is the
            reason this technology exists.    Be thankful.
            
  HTML      [1]: https://eprint.iacr.org/2024/447.pdf
  HTML      [2]: https://tlsnotary.org
       
        seanhly wrote 1 day ago:
        Curiously, this isn't the first time archive.today was implicated in a
        DDoS.  An HN post from three years back shows some pasted snippets of
        similar XMLHttpRequest code running on archive.ph (an archive.today
        synonym site).  Post link: [1]. On that occasion, the target of the
        attack was a site named northcountrygazette.org, whose owner seems to
        have never become aware of the attack.    The HN commenter noted when
        they went to the site manually it was incredibly slow, which would
        suggest the DDoS attempt was effective.
        
        I tried to see if there was anything North Country Gazette had
        published that the webmaster of archive.today might have taken issue
        with, and I couldn't find anything in particular.  However, the
        "Gazette" had previously threatened readers with IP logging to
        prosecute paywall bypassers ( [2] ), and also blocks archivers in its
        robots.txt file, indicating it is hostile towards archiving in general.
        
        I can no longer access North Country Gazette, so perhaps it has since
        gone out of business.  I found a few archived posts from its dead
        website complaining of high server fees.  Like the target of this most
        recent DDoS, June Maxam, the lady behind North Country Gazette, also
        appears/appeared to be a sleuth.
        
  HTML  [1]: https://news.ycombinator.com/item?id=38233062
  HTML  [2]: https://news.slashdot.org/story/10/10/27/2134236/pay-or-else-n...
       
        alfiedotwtf wrote 1 day ago:
        It would be nice if there was a non-dynamic snapshot archive as well as
        the page itself. That way, if the loaded JavaScript causes it to
        stop rendering, at least there'll be a static fallback
       
        jl6 wrote 1 day ago:
        Am I reading this right... they tampered with an archived page and then
        changed it back? How do we know? Is there another archive site that has
        before and after proof?
       
          cnst wrote 1 day ago:
          They've changed usernames they use to post under.  That's the only
          "altered" allegation they've been accused of.
          
          BTW, they also alter paywalls and other elements, because otherwise,
          many websites won't show the main content these days.
          
          It kind of seems like "altered" is the new "hacker" today?
       
            Jordan-117 wrote 1 day ago:
            Specifically, they changed a "commenting as: [their alias]" UI
            element to "commenting as: [name of the blogger they were fighting
            with]".
            
            Compare (the changed element is near the very bottom of the page;
            replace the "[dot]" since these URLs seem to trigger spam filters
            for some commenters):
            
            archive [dot] is/gFD6Z
            
            megalodon [dot] jp/2026-0219-1628-23/
            
  HTML      [1]: https://archive.is:443/gFD6Z
       
          Gander5739 wrote 1 day ago:
          See
          
  HTML    [1]: https://en.wikipedia.org/wiki/Wikipedia%3ARequests_for_comme...
       
        dakolli wrote 1 day ago:
        The FBI called out archive.today a couple months ago, there's clearly a
        campaign against them by the USA (4th Reich), which stands principally
        against any information repository they don't control or have influence
        over (it's Russian-owned). This is simply donors of the Trump regime who
        own media companies requesting this because it's the primary way around
        paywalls for most people who know about it.
       
        frenchtoast8 wrote 1 day ago:
        A bit off topic, but are there any self hosted open source archiving
        servers people are using for personal usage?
        
        I think ArchiveBox[1] is the most popular. I will give it a shot, but
        it's a shame they don't support URL rewriting[2], which would be
        annoying for me. I read a lot of blog and news articles that are split
        across multiple pages, and it would be nice if that article's "next
        page" link was a link to the next archived page instead of the original
        URL.
        
  HTML  [1]: https://archivebox.io/
  HTML  [2]: https://github.com/ArchiveBox/ArchiveBox/discussions/1395
       
          kseistrup wrote 1 day ago:
          Omnom comes to mind:
          
          * [1]
          * [2]
          
  HTML    [1]: https://omnom.zone/
  HTML    [2]: https://github.com/asciimoo/omnom
       
          quinncom wrote 1 day ago:
          I like Readeck [1]. Open source. Self hosted or managed. Native
          iOS and Android apps.
          
          Its Content Scripts feature allows custom JS scripts that transform
          saved content, which could be used to do URL rewriting.
          
  HTML    [1]: https://codeberg.org/readeck/readeck
       
        nosamu wrote 1 day ago:
        Has anyone else noticed that some of Archive.today's X/Twitter captures
        [1] are logged in with an account called "advancedhosters" [2], which
        is associated with a web hosting company apparently located in Cyprus?
        The latest post [3] from the account links to a blog post [4] including
        private communications between the webmaster of Archive.today (using
        their previously-known "Volth" alias) and a site owner requesting a
        takedown. Also note that the previous post [5] from the
        "advancedhosters" account was a link to a pro-Russia, anti-Ukraine
        article, archived via Archive.today of course. Seems like an
        interesting lead to untangle.
        
  HTML  [1]: https://archive.today/20240714173022/https://x.com/archiveis/s...
  HTML  [2]: https://x.com/advancedhosters
  HTML  [3]: https://x.com/advancedhosters/status/1731129170091004412
  HTML  [4]: https://lj.rossia.org/users/mopaiv/257.html
  HTML  [5]: https://x.com/advancedhosters/status/1501971277099286539
       
          jeroenhd wrote 1 day ago:
          It could be a donated account. I've noticed archive.whatever also
          bypasses some paywalls by using legitimate account logins but I doubt
          there's one person going around subscribing to every news outlet that
          gets any coverage.
          
          If archive.whatever wasn't so useful to the general public, it'd be
          hard to distinguish from a criminal operation given the way it
          operates, unlike say the Internet Archive who goes through all of the
          proper legal paperwork to be a real nonprofit.
       
          snigsnog wrote 1 day ago:
          Lead to what?
       
            Kiboneu wrote 1 day ago:
            That's what OP wants to find out.
       
              snigsnog wrote 20 hours 23 min ago:
              No, what information is he hoping to find? Does he also want to
              doxx the website owner?
       
        comeonbro wrote 1 day ago:
        There is an enormous amount of stuff that is only on archive.today,
        including stuff that is otherwise gone forever.  A mix of stuff that
        somebody only ever did archive.today on and not archive.org, and stuff
        that could only be archived on archive.today because archive.org fails
        on it.
        
        Anything on twitter post-login-wall for one.  A million
        only-semi-paywalled news articles for others.  But mainly an
        unfathomably long tail.
        
        It was extremely distressing when the admin started(?) behaving badly
        for this reason.  That others are starting to react this way to it is
        understandable.  What a stupid tragedy.
       
        andai wrote 1 day ago:
        Sounds like there's a gap in the market for a "commons" archive...
        maybe powered by something p2p like BitTorrent protocol?
        
        This would have sounded Very Normal in the 2000s... I wonder if we can
        go back :)
       
          PhilipRoman wrote 1 day ago:
          IMO there is actually a very low hanging fruit here, even without P2P
          or DHTs we could have a URI scheme that consists of a domain and
          document hash. It is then up to the user to add alternate mirrors for
          domains. Aside from privacy, it doesn't really matter who answers
          these requests since the documents are self-signing.
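
          A rough sketch of how such a scheme could behave. The `hashdoc://`
          scheme name, the mirror table and the function names are all
          invented for illustration. Documents are treated as
          content-addressed: the hash in the URI is checked against whatever
          a mirror returns, so an untrusted mirror cannot substitute
          different bytes.

```python
import hashlib
from typing import Callable, Optional
from urllib.parse import urlparse

# A mirror is any callable that maps a hex digest to document bytes (or None).
Fetcher = Callable[[str], Optional[bytes]]


def parse_uri(uri: str) -> tuple[str, str]:
    """Split 'hashdoc://<domain>/<sha256-hex>' into (domain, digest)."""
    parts = urlparse(uri)
    if parts.scheme != "hashdoc":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return parts.netloc, parts.path.lstrip("/")


def resolve(uri: str, mirrors: dict[str, list[Fetcher]]) -> bytes:
    """Try each user-configured mirror for the domain; accept only matching bytes."""
    domain, expected = parse_uri(uri)
    for fetch in mirrors.get(domain, []):
        body = fetch(expected)
        if body is not None and hashlib.sha256(body).hexdigest() == expected:
            return body
    raise LookupError(f"no mirror returned a matching document for {uri}")


# Demo with in-memory mirrors: one dishonest, one honest.
doc = b"hello archive"
doc_hash = hashlib.sha256(doc).hexdigest()
store = {doc_hash: doc}


def bad_mirror(digest: str) -> Optional[bytes]:
    return b"tampered"


def good_mirror(digest: str) -> Optional[bytes]:
    return store.get(digest)


mirrors = {"example.com": [bad_mirror, good_mirror]}
print(resolve(f"hashdoc://example.com/{doc_hash}", mirrors))
```

          The dishonest mirror is simply skipped because its response fails
          the hash check. As the comment notes, the mirror still learns what
          you asked for, so privacy is a separate problem.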
       
          bawolff wrote 1 day ago:
          P2P is generally bad for this use case. P2P generally only works for
          keeping popular content around (content gets dropped when the last
          peer that cares disconnects). If the content was popular it wouldn't
          need to be archived in the first place.
       
            andai wrote 1 day ago:
            I think if you take this idea far enough you end up reinventing
            taxes from first principles.
       
            quotemstr wrote 1 day ago:
            Imagine a proof-of-space cryptocurrency that encouraged archiving
            long-tail data.
       
              pwdisswordfishy wrote 19 hours 1 min ago:
              /dev/random as a free money printer? Sign me up.
       
        1vuio0pswjnm7 wrote 1 day ago:
         [1] archive.today is very popular on HN; the opaque, shortened URLs
        are promoted on HN every day
        
        I can't use archive.today.  I tried but gave up.  Too many hassles.  I
        might be in the minority but I know I'm not the only one.  As it
        happens, I have not found any site that I cannot access without it
        
        The most important issue with archive.today though is the person
        running it, their past and present behaviour.  It speaks for itself
        
        Whomever it is, they have lot of info about HN users' reading habits
        given that archive.today URLs are so heavily promoted by HN submitters,
        commenters and moderators
        
  HTML  [1]: https://web.archive.org/web/20260220191245if_/https://arstechn...
       
          wolvoleo wrote 21 hours 29 min ago:
          > Whomever it is, they have lot of info about HN users' reading
          habits given that archive.today URLs are so heavily promoted by HN
          submitters, commenters and moderators
          
          Anyone interested in the reading habits of HN users can just take a
          look at news.ycombinator.com ;)
       
          1vuio0pswjnm7 wrote 23 hours 13 min ago:
          Archive.today wants/needs EDNS subnet
          
          "Geolocation" as a justification is ambiguous
          
          Why a need for geolocation
          
          Geolocation can be used for multiple purposes
          
          "DNS performance" is only one purpose
          
          Other purposes might offer the user no benefit, and might even be
          undesirable for users
          
          As a result, some users don't send EDNS subnet.  It's always been
          optional to send it
          
          Even public resolvers, third party DNS services, like Cloudflare,
          recognise the tradeoffs for users and allow users to avoid sending
          it.  Popular DNS software makes compiling support for EDNS subnet
          optional
          
          Archive.today wants/needs EDNS subnet so badly it tries to gather it
          using a tracking pixel or it tries to block users who don't send it,
          e.g., Cloudflare users
          
          Thus, before one even considers all the other behaviour of this
          website operator, some of which is mentioned in this thread, there is
          a huge red flag for anyone who pays attention to EDNS subnet
          
          As with almost all websites repeated DNS lookups are not an absolute
          requirement for successful HTTP requests
          
          There are some IP addresses for archive.{today,is,md,ph,li,...} that
          have continued to work for years
       
          1vuio0pswjnm7 wrote 1 day ago:
          "archive.today" as used here means the collection of archive.tld
          domains, where .tld could be ".is", ".md", ".ph", etc.
          
          "promoted" as used here means placing an archive.tld URL at the top
          of an HN thread so that many HN readers will 
          follow it, or placing these URLs elsewhere in threads
       
          diath wrote 1 day ago:
          > Whomever it is, they have lot of info about HN users' reading
          habits given that archive.today URLs are so heavily promoted by HN
          submitters, commenters and moderators
          
          It's not promoted, it's just used as a paywall bypass so everyone can
          read the linked article.
       
          bawolff wrote 1 day ago:
          The fact is I can't have a discussion about a paywalled article
          without reading it. Archive.today is popular as a paywall bypass
          because nobody wants HN to devolve into debate based on a headline
          where nobody has rtfa.
       
          fouc wrote 1 day ago:
          you can change the tld of any archive.today link if .today doesn't
          work. for example archive.ph, archive.is, archive.md, etc
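
          For anyone who wants to script the swap, here is a small sketch. The
          TLD list is just the set of domains mentioned in this thread and may
          be incomplete or out of date. The function name and the `/abc12`
          path are made up.

```python
from urllib.parse import urlsplit, urlunsplit

# Domains mentioned in this thread; an assumption, not an authoritative list.
ARCHIVE_TLDS = ("today", "ph", "is", "md", "li")


def alternate_links(url: str) -> list[str]:
    """Return the same archive link on each of the other archive.<tld> domains."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    label, _, tld = host.rpartition(".")
    if label != "archive" or tld not in ARCHIVE_TLDS:
        return []  # not an archive.<tld> link we recognise
    return [
        urlunsplit(parts._replace(netloc=f"archive.{alt}"))
        for alt in ARCHIVE_TLDS
        if alt != tld
    ]


for link in alternate_links("https://archive.ph/abc12"):
    print(link)
```

          Any explicit port in the original link is dropped, and this does
          nothing for the DNS or CAPTCHA problems described elsewhere in the
          thread.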
       
            qingcharles wrote 1 day ago:
            There's a DNS issue between Archive Today and some ISPs which
            causes their domains not to resolve properly, which is why some
            people have a lot of trouble using it.
       
              justincormack wrote 1 day ago:
              It's not "a DNS issue"; they are banned in many countries and there
              are ongoing court cases, so various enforcement mechanisms are
              used.
       
                jdiff wrote 1 day ago:
                There are also, separately, DNS issues where Archive.today
                chooses to block certain providers. For instance:
                
  HTML          [1]: https://news.ycombinator.com/item?id=19828317
       
          belviewreview wrote 1 day ago:
          I use archive.today all the time. How do you access pages, like for
          instance on the economist, without it?
       
            ranger_danger wrote 23 hours 2 min ago:
            For me, all archive.* links just present an endless captcha loop. I
            am not using CF DNS or any proxy/VPN, but even if I do try those
            things, it still doesn't work.
       
            1vuio0pswjnm7 wrote 1 day ago:
            http-request set-header user-agent "Mozilla/5.0 (Linux; Android 14)
            AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.103 Mobile
            Safari/537.36 Lamarr" if { hdr(host) -m end economist.com }
            
            Years ago I used some other workaround that no longer works, maybe
            something like amp.economist.com.  AMP with text-only browser was a
            useful workaround for many sites
            
            Workarounds usually don't last forever.  Websites change from time
            to time.  This one will stop working at some point
            
            There are some people who for various reasons cannot use
            archive.today
       
              gpvos wrote 1 day ago:
              Which utility, extension, tool or language is that?
       
                1vuio0pswjnm7 wrote 1 day ago:
                It's from an haproxy configuration file
                
                This unfamiliarity is why I try to use programs that more HN
                readers are familiar with, like curl or wget, in HN examples. 
                But I find those programs awkward to use.  The examples may
                contain mistakes.  I don't use those programs in real life
                
                For making HTTP requests I use my own HTTP generators, TCP
                clients, and local forward proxies
                
                Given the options (a) run a graphical web browser and enable
                Javascript to solve an archive.today CAPTCHA that contains some
                fetch() to DDoS a blogger or (b) add a single line to a
                configuration file and use whatever client I want, no
                Javascript required, I choose (b)
       
            201984 wrote 1 day ago:
            With the paywall blocker so good it got banned! You can also get it
            on Android.
            
  HTML      [1]: https://gitflic.ru/project/magnolia1234/bypass-paywalls-fi...
       
              jwrallie wrote 1 day ago:
              A Russian domain git website hosting just a readme.md and a copy
              of the MIT license but no source code? Just the extension files?
       
                moho wrote 1 day ago:
                The author got banned from github and gitlab after DMCA
                takedowns. The code used to be available in those, but I guess
                he got tired of starting over?
                
                Anyway, extensions are just signed zip files. You can extract
                them and view the source. BPC sources are not compressed or
                obfuscated. The extension is evaluated and signed by Mozilla
                (otherwise it wouldn't install in release-channel Firefox), if
                you put any stock in that.
       
            ouhamouch wrote 1 day ago:
            for instance on the economist:
            
  HTML      [1]: https://news.ycombinator.com/item?id=46060487
       
            mhitza wrote 1 day ago:
            If dang and tomhow enforced a policy against paywalled content,
            there would be less interest in accessing those pages via third
            parties.
            Most news gets reported by multiple outlets in general, so the same
            discussions would still surface.
       
        tonymet wrote 1 day ago:
        Wikipedia's own page on this topic is much more succinct about the
        context and change in policy
        
  HTML  [1]: https://en.wikipedia.org/wiki/Wikipedia:Archive.today_guidance
       
          cnst wrote 1 day ago:
          > Change the original source to something that doesn't need an
          archive (e.g., a source that was printed on paper), or for which a
          link to an archive is only a matter of convenience.
          
          They're basically recommending changing verifiable references that
          can easily be cross-checked and verified, to "printed on paper"
          sources that could likely never be verified by any other Wikipedian,
          and can easily be used to provide a falsification and bias that could
          go unnoticed for extended periods of time.
          
          Honestly, that's all you need to know about Wikipedia.
          
          The "altered" allegation is also disingenuous.  The reason
          archive.org never works is precisely that it doesn't alter the
          pages enough.  There's no evidence that archive.today has altered any
          actual main content they've archived; altering the hidden fields,
          usernames and paywalls, as well as random presentation elements to
          make the page render properly, doesn't really count as "altered" in
          my book, yet that's precisely what the allegation amounts to.
       
            Jordan-117 wrote 1 day ago:
            The accusation is not that they alter pages at all -- they
            obviously need to in order to make some pages readable/functional,
            bypass paywalls, or hide account names used to do so. The Wayback
            Machine does something similar with YouTube to make old videos
            playable.
            
            The allegation here is that they altered page content not just to
            remove their own alias, but to insert the name of the blogger they
            were targeting. That moves it from a defensible technical change
            for accessibility to being part of their bizarre revenge campaign
            against someone who crossed them.
       
            tonymet wrote 1 day ago:
            this was referenced as the evidence for archive.today modifying
            content
            
  HTML      [1]: https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comme...
       
            tonymet wrote 1 day ago:
            You should add this context to the talk page.  You can do it
            anonymously without login.  I wasn't aware of either side of this
            allegation, and it's helpful to understand this context.
       
              tonymet wrote 1 day ago:
              Are there people who just downvote every comment? How is this a
              bad suggestion?  If people want change on WP, they should
              contribute to the discussion there.
       
        tetris11 wrote 1 day ago:
        Archive.today's domain registrar is Tucows for anyone wondering
       
          ValentineC wrote 19 hours 10 min ago:
          Just curious: is this of any significance?
       
        krick wrote 1 day ago:
        I believe there are multiple options with different degree of
        "half-baked"-ness, but can anyone name the best self-hosted version of
        this service?
        
        Ultimately, what we all use it for is pretty straight-forward, and it
        seems like by now we should've arrived at having approximately one best
        implementation, which could be used both for personal archiving and for
        internet-facing instances (perhaps even distributed). But I don't know
        if we have.
       
          robotnikman wrote 1 day ago:
          I'm wondering the same thing, would be great to have something
          similar for personal use
       
        croes wrote 1 day ago:
        > "I'm glad the Wikipedia community has come to a clear consensus,
        and I hope this inspires the Wikimedia Foundation to look into creating
        its own archival service," he told us.
        
        Hardly possible for Wikimedia to provide a service like archive.today
        given the legal trouble of the latter.
        
        Strangely naive.
       
        TZubiri wrote 1 day ago:
        They seem totally unrelated to the Internet Archive. They probably only
        ever got on Wikipedia by leeching off the IA brand and confusing enough
        people to use them
       
          Onavo wrote 1 day ago:
          The Wayback Machine won't bypass paywalls or pirate content, not to
          mention they are under US jurisdiction. You can't have your cake and
          eat it.
       
            krick wrote 1 day ago:
            Honestly, IMHO archive.today is just so much nicer to use in every
            aspect than IA, that unless they outright start to distribute
            malware (I mean, like, via the page itself; otherwise it's
            pretty much irrelevant), I don't think I'll stop using it.
       
        nubinetwork wrote 1 day ago:
        I noticed I've started being redirected to a blank nginx server for
        archive.is...  but only the .is domain, .ph and .today work just fine. 
        I wonder if they ended up on an adblocker or two.
       
          stephen_g wrote 1 day ago:
          There was some beef the site owner had with Cloudflare where if you
          were using Cloudflare DNS it wouldn't serve anything to you? Is
          that still happening?
          
          Not sure why it would only be on archive.is and not the others but
          "is" loads for me.
       
            nubinetwork wrote 1 day ago:
            Oh maybe...  I don't use cloudflare DNS, but maybe one of my rpz
            zones does something weird...
       
        karel-3d wrote 1 day ago:
        Archive.is is now publishing really weird posts on their Tumblr blog,
        related to the whole thing [1]
        
  HTML  [1]: https://archive-is.tumblr.com/post/806832066465497088/ladies-a...
  HTML  [2]: https://archive-is.tumblr.com/post/807584470961111040/it-seems...
       
          ricardobeat wrote 1 day ago:
          The word salad with ukraine, arms trade, nazis, hunter biden, leaves
          no doubt the operator is from Russia.
       
            karel-3d wrote 1 day ago:
            He says elsewhere he comes from right wing activism. He could be
            some hard right type. But he says elsewhere he is outside of US
            jurisdiction. And the fact that he reacts so violently means that
            the original blogpost is somehow right. So probably Russia
       
          dmix wrote 1 day ago:
          He's probably being purposefully vague which makes for difficult
          reading.
       
        wuschel wrote 1 day ago:
        There is a post describing the possibility of an organised campaign
        against archive.today [1] 
        
        How does the tech behind archive.today work in detail? Is there any
        information out there that goes beyond the Google AI search reply or
        this HN thread [2]? [1] [2]
        
  HTML  [1]: https://algustionesa.com/the-takedown-campaign-against-archive...
  HTML  [2]: https://algustionesa.com/the-takedown-campaign-against-archive...
  HTML  [3]: https://news.ycombinator.com/item?id=42816427
       
          robotnikman wrote 23 hours 22 min ago:
          A big fear of mine is something happening to archive.is
          
          There is so much archived there, to lose it all would be a
          tragedy.
       
          pyuser583 wrote 1 day ago:
          Was that written by AI? It sounds like AI, spends lots of time
          summarizing other posts, and has no listed author. My AI alarm is
          going off.
       
            KennyBlanken wrote 1 day ago:
            Ars was caught recently using AI to write articles when the AI
            hallucinated about a blogger getting harassed by someone using AI
            agents. The article quoted his blog and all the quotes were
            nonsense.
       
              mrweasel wrote 1 day ago:
              Even if something is AI generated the author, and the editor,
              should at least attempt to read back the article. English isn't
              my native language, so that obviously plays in, but very
              frequently I find that articles I struggle to read are AI
              generated, they certainly have that AI feel.
              
              It would be interesting to run the numbers, but I get the feeling
              that AI generated articles may have a higher LIX number. Authors
              are then less inclined to "fix" the text, because longer words
              make them seem smarter.
       
                moron4hire wrote 1 day ago:
                "Should" and "will" are completely different things. My kids
                "should" brush their teeth every night without me having to
                tell them. But they won't.
       
                  mrweasel wrote 1 day ago:
                  Sounds like you're suggesting an RFC for journalists and
                  editors :-)
       
            girvo wrote 1 day ago:
            Yeah nearly certainly.
       
            lambda wrote 1 day ago:
            Yeah, wow. Definitely setting off my AI summary alarm.
       
          bdhcuidbebe wrote 1 day ago:
          They are able to scrape paywalled sites at random, so I'm guessing a
          residential botnet is used.
       
            pingou wrote 1 day ago:
            But how do they bypass the paywall? They can't just pretend to be
            Google by changing the user-agent, this wouldn't work all the time,
            as some websites also check IPs, and others don't even show the
            full content to Google.
            
            They also cannot hijack data with a residential botnet or buy
            subscriptions themselves. Otherwise, the saved page would contain
            information about the logged-in user. It would be hard to remove
            this information, as the code changes all the time, and it would be
            easy for the website owner to add an invisible element that
            identifies the user. I suppose they could have different
            subscriptions and remove everything that isn't identical between
            the two, but that wouldn't be foolproof.
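
            A rough sketch of that "keep only what is identical between the
            two" idea, using Python's difflib. The two page strings and the
            function name common_lines are made up for illustration; nothing
            here is known to be what archive.today actually does.

```python
import difflib


def common_lines(copy_a: str, copy_b: str) -> str:
    """Keep only the lines that appear, in order, in both copies."""
    a = copy_a.splitlines()
    b = copy_b.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    kept = []
    # Each matching block is a run of lines shared by both copies.
    for block in matcher.get_matching_blocks():
        kept.extend(a[block.a:block.a + block.size])
    return "\n".join(kept)


# Hypothetical snapshots of the same article fetched with two accounts.
page_as_alice = "<h1>Headline</h1>\n<span>Signed in as alice</span>\n<p>Article body.</p>"
page_as_bob = "<h1>Headline</h1>\n<span>Signed in as bob</span>\n<p>Article body.</p>"

print(common_lines(page_as_alice, page_as_bob))
```

            The account-specific line is dropped because it differs between
            the copies. Anything that differs for an unrelated reason is
            dropped as well, and a marker that happens to be identical in both
            copies survives, which is why this would not be foolproof.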
       
              wbmva wrote 23 hours 46 min ago:
              On the network layer, I don't know. But on the WWW layer,
              archive.today operates accounts that are used to log into
              websites when they are snapshotted. IIRC, archive.today
              manipulates the snapshots to hide the fact that someone is logged
              in, but sometimes fails miserably: [1] [2] The second shows
              volth's Github notifications. Volth was a major nix-pkgs
              contributor, but his Github account disappeared.
              
  HTML        [1]: https://megalodon.jp/2026-0221-0304-51/https://d914s229q...
  HTML        [2]: https://archive.is/Y7z4E
  HTML        [3]: https://github.com/orgs/community/discussions/58164
       
              rkagerer wrote 1 day ago:
              I thought saved pages sometimes do contain users' IP's? [1] The
              way I (loosely) understand it, when you archive a page they send
              your IP in the X-Forwarded-For header.    Some paywall operators
              render that into the page content served up, which then causes it
              to be visible to anyone who clicks your archived link and Views
              Source.
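
              A minimal sketch of how that kind of leak could happen,
              assuming a site that echoes the forwarded address into its
              HTML. The function render_paywall_notice, the markup and the
              IP addresses (documentation ranges) are all hypothetical.

```python
from html import escape


def render_paywall_notice(headers: dict[str, str], peer_ip: str) -> str:
    """Hypothetical server-side template that echoes the client's apparent IP."""
    # X-Forwarded-For is a comma-separated list; the first entry is
    # conventionally the original client.
    forwarded = headers.get("X-Forwarded-For", "")
    client_ip = forwarded.split(",")[0].strip() or peer_ip
    return f"<!-- served to {escape(client_ip)} --><p>Subscribe to continue reading.</p>"


# Hypothetical request: the archiver connects from 198.51.100.7 and
# forwards the submitter's address, 203.0.113.42, in the header.
snapshot = render_paywall_notice(
    {"X-Forwarded-For": "203.0.113.42"}, peer_ip="198.51.100.7"
)
print(snapshot)
```

              In this sketch the saved markup contains the submitter's
              address rather than the archiver's, so anyone viewing the
              source of the snapshot can read it.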
              
  HTML        [1]: https://www.reddit.com/r/Advice/comments/5rbla4/comment/...
       
              bdhcuidbebe wrote 1 day ago:
              > But how do they bypass the paywall?
              
              I'm guessing by using a residential botnet and using the existing
              credentials of unknowing "victims" by automating their
              browsers.
              
              > Otherwise, the saved page would contain information about the
              logged-in user.
              
              If you read this article, there's plenty of evidence they are
              manipulating the scraped data.
              
              But I'm just speculating here...
       
                pingou wrote 1 day ago:
                But in the article they talk about manipulating users' devices
                to do a DDOS, not scrape websites. And the user going to the
                archive website is probably not gonna have a subscription, and
                anyway I'm not sure that simply visiting archive.today will
                make it able to exfiltrate much information from any other
                third party website since cookies will not be shared.
                
                I guess if they can control a residential botnet more
                extensively they would be able to do that, but it would still
                be very difficult to remove login information from the page,
                the fact that they manipulated the scraped data for totally
                unrelated reasons a few times proves nothing in my opinion.
       
                  notpushkin wrote 1 day ago:
                  They do remove the login information for their own accounts
                  (e.g. the one they use for LinkedIn sign-up wall). Their
                  implementation is not perfect, though, which is how the
                  aliases were leaked in the first place.
       
              seanhly wrote 1 day ago:
              There are some pretty robust browser addons for bypassing article
              paywalls, notably [1] This particular addon is blocked on most
              western git servers, but can still be installed from Russian git
              servers.  It includes custom paywall-bypassing code for pretty
              much every news website you could reasonably imagine, or at
              least those sites that use conditional paywalls (paywalls for
              humans, no paywalls for big search engines).  It won't work on
              sites like Substack that use proper authenticated content pages,
              but these sorts of pages don't get picked up by archive.today
              either.
              
              My guess would be that archive.today loads such an addon with its
              headless browser and thus bypasses paywalls that way.  Even if
              publishers find a way to detect headless browsers, crawlers can
              also be written to operate with traditional web browsers where
              lots of anti-paywall addons can be installed.
              
  HTML        [1]: https://gitflic.ru/project/magnolia1234/bypass-paywalls-...
       
                wuschel wrote 1 day ago:
                Wow, did not know about the regional blocking of git servers!
                Makes me wonder what else is kept from the western audience,
                and for what reason this blocking is happening.
                
                Thanks for sketching out their approach and for the URI.
       
                expedition32 wrote 1 day ago:
                I use this add on. It does get blocked sometimes but they
                update the rules every couple of weeks.
       
                pingou wrote 1 day ago:
                But don't news websites check for ip addresses to make sure
                they really are from Google bots?
       
                  seanhly wrote 1 day ago:
                  Most of them don't check the IP, it would seem.  Google
                  acquires new IPs all the time, plus there are a lot of other
                  search systems that news publishers don't want to
                  accidentally miss out on.  It's mostly just client side JS
                  hiding the content after a time delay or other techniques
                  like that.  I think the proportion of the population using
                  these addons is so low, it would cost more in lost SEO for
                  news publishers to restrict crawling to a subset of IPs.
       
          8cvor6j844qw_d6 wrote 1 day ago:
          archive.today works surprisingly well for me, often succeeding where
          archive.org fails.
          
          archive.org also complies with takedown requests, so it's worth
          asking: could the organised campaign against archive.today have
          something to do with it preserving content that someone wants
          removed?
       
            wolvoleo wrote 1 day ago:
            They preserve a lot of paywalled content so yeah I'm sure there's
            enough financial incentives to bother them :(
       
          ouhamouch wrote 1 day ago:
          There are a number of blog posts like
          
          owner-archive-today . blogspot . com
          
          2 years old, like J.P's first post on AT
       
          leonidasv wrote 1 day ago:
          If they're under an organised defamation campaign, they're not
          helping themselves by DDoSing someone else's blog and editing
          archived pages.
       
            behringer wrote 1 day ago:
            Is that, itself, true or disinformation?
       
              daymanstep wrote 1 day ago:
              I've also noticed archive.today injecting suspicious looking ads
              into archived pages that originally did not have ads.
       
              thefilmore wrote 1 day ago:
              It's true.
              
  HTML        [1]: https://archive-is.tumblr.com/post/808911640210866176/pe...
       
              ndiddy wrote 1 day ago:
              They did edit archived pages. They temporarily did a find/replace
              on their archive to replace "Nora Puchreiner" (an alias the site
              operator uses) with "Jani Patokallio" (the name of the blogger
              who wrote about archive.today's owner). [1] They also tampered
              with their archive for a few of the social media sites (Twitter,
              Instagram, Blogger) by changing the name of the signed in account
              to Jani Patokallio. [2] I think Wikipedia made the right
              decision, you can't trust an archival service for citations if
              every time the sysop gets in a row they tamper with their
              database.
              
  HTML        [1]: https://megalodon.jp/2026-0219-1634-10/https://archive.p...
  HTML        [2]: https://megalodon.jp/2026-0220-0320-05/https://archive.i...
       
                UqWBcuFx6NV4r wrote 1 day ago:
                This is so "early internet beef" quaint. What next? Are
                they going to G-line each other?
       
                  behringer wrote 12 hours 41 min ago:
                  It is utterly stupid when you consider that the host needed
                  to replace their username with something to conceal their
                  user accounts.
       
              stuffoverflow wrote 1 day ago:
              I've not seen any evidence of them editing archived pages BUT the
              DDOSing of gyrovague.com is true and still actively taking place.
              The author of that blog is Finnish, leading archive.today to ban
              all Finnish IPs by giving them endless captcha loops. After
              solving the first captcha, the page reloads and a javascript
              snippet appears in the source that attempts to spam gyrovague.com
              with repeated fetches.
       
                mmooss wrote 1 day ago:
                How do you know that? Did you see it (do you have a Finnish
                IP?)?
       
                  stuffoverflow wrote 1 day ago:
                  Yes I have Finnish IP and just before I wrote that post I
                  tested it to make sure it was still happening.
                  
                  I assume it must be a blanket ban on Finnish IPs as there have
                  been comments about it on Reddit and none of my friends can
                  get it to work either. 5 different ISPs were tried. So at the
                  very least it seems to affect the majority of Finnish residential
                  connections.
       
                    mmooss wrote 1 day ago:
                    > just before I wrote that post I tested it to make sure it
                    was still happening
                    
                    That's awesome. I wish everyone made sure of their facts.
                    Thanks.
       
                  delusional wrote 1 day ago:
                  This is quite an interesting question. For a single
                  datapoint, I happen to have access to a VPN that's supposedly
                  in Finland, and connecting through that didn't make any
                  captcha loop appear on archive.today. The page worked fine.
                  
                  Now it's obviously possible that my VPN was whitelisted
                  somehow, or that the GeoIP of it is lying. This is just a
                  singular datapoint.
       
                    BoredPositron wrote 1 day ago:
                    VPNs usually don't tell you much about residential
                    experiences.
       
                    hnlmorg wrote 1 day ago:
                    It's also pretty common for VPNs to have exit nodes
                    physically located in different countries to where they
                    report those IPs (to GeoIP databases) as having originated
                    from.
       
                    fear-anger-hate wrote 1 day ago:
                    As another datapoint with Finnish IP from Mullvad VPN:
                    CAPTCHA loop and indeed after solving first CAPTCHA this
                    can be found in page source:
                    
                    setInterval(function(){fetch(" [1] ",{
                    referrerPolicy:"no-referrer",mode:"no-cors" });},1400);
                    
  HTML              [1]: https://gyrovague.com/tag/
       
              drum55 wrote 1 day ago:
              It was true and visible when reported, yeah.
       
            ouhamouch wrote 1 day ago:
            it gives them a voice.
       
              duskwuff wrote 1 day ago:
              And that voice is practically shouting, "I AM UNTRUSTWORTHY".
       
                tolerance wrote 1 day ago:
                Or some shrewd sort of tactician.
       
                ouhamouch wrote 1 day ago:
                that is not the worst scream (especially after FBI and Russian
                trail). better to shout anything than to die in silence
       
                  eddythompson80 wrote 1 day ago:
                  What kinda logic is that? If you don't want to die in
                  silence, then shout something sensical. But if you're gonna
                  shout garbage, just die in silence.
       
                    tolerance wrote 1 day ago:
                    People say they want the old weird web back. Well there's
                    this.
       
                    ouhamouch wrote 1 day ago:
                    The property of the medium: no one would repost or discuss
                    "something sensical".
       
          iamnothere wrote 1 day ago:
          There was also the recent news about sites beginning to block the
          Internet Archive. Feels like we are gearing up for the next phase of
          the information war.
       
        ChocMontePy wrote 1 day ago:
        I noticed last year that some archived pages are getting altered.
        
        Every Reddit archived page used to have a Reddit username in the top
        right, but then it disappeared. "Fair enough," I thought. "They want to
        hide their Reddit username now."
        
        The problem is, they did it retroactively too, removing the username
        from past captures.
        
        You can see on old Reddit captures where the normal archived page has
        no username, but when you switch the tab to the Screenshot of the
        archive it is still there. The screenshot is the original capture and
        the username has now been removed for the normal webpage version.
        
        When I noticed it, it seemed like such a minor change, but with these
        latest revelations, it doesn't seem so minor anymore.
       
          palmotea wrote 1 day ago:
          > When I noticed it, it seemed like such a minor change, but with
          these latest revelations, it doesn't seem so minor anymore.
          
          That doesn't seem nefarious, though. It makes sense they wouldn't
          want to reveal whatever accounts they use to bypass blocks, and the
          logged-in account isn't really meaningful content to an archive
          consumer.
          
          Now, if they were changing the content of a reddit post or comment,
          that would be an entirely different matter.
       
            TehCorwiz wrote 1 day ago:
            If it's not nefarious why isn't it documented as part of their
            policies? They're not tracking those changes and making clear it
            was anonymization, why not? If they're not tracking and publishing
            changes to the documents what's to say they haven't edited other
            things? The short answer is that without another archived copy we
            just don't know and that's what's making people uncomfortable. They
            also injected malicious JS into the site. What's to stop them from
            doing that again? Trust and transparency are the name of the game
            with libraries. I couldn't care less about who they are, but their
            actions as steward of a collection for posterity fail to encourage
            my trust.
       
            zymhan wrote 1 day ago:
            Editing what is billed as an archive defeats the purpose of an
            "archive".
       
              maxloh wrote 1 day ago:
              Don't be surprised by this, there are a lot more edits than you
              think. For example, CSS is always inlined so that pages render
              the same as when they were archived.
       
                raincole wrote 1 day ago:
                CSS inlining happens during the process of archiving, no?
                
                The issue here is editing archived pages retroactively.
       
              ajam1507 wrote 1 day ago:
              The relevant part of the page to archive is the content of the
              page, not the user account that visited the page. Most sane
              people would consider two archives of the same page with
              different user accounts at the top, the same page.
       
              palmotea wrote 1 day ago:
              > Editing what is billed as an archive defeats the purpose of an
              "archive".
              
              No, certain edits are understandable and required. Even
              archive.org edits its pages (e.g. sticks banners on them and does
              a bunch of stuff to make them work like you'd expect).
              
              Even paper archives edit documents (e.g. writing sequence numbers
              on them, so the ordering doesn't get lost).
              
              Disclosing exactly what account was used to download a particular
              page is arguably irrelevant information, and may even compromise
              the work of archiving pages (e.g. if it just opens the account to
              getting blocked).
       
        casey2 wrote 1 day ago:
        Anecdotally I generally see archive.is/archive.today links floating
        around "stochastic terrorist" sites and other hate cults.
       
          oytis wrote 1 day ago:
          I see them everywhere where paywalled content is referenced
       
          snigsnog wrote 1 day ago:
          Shows that it's a great archival service if the most censored people
          are able to use it without their archives being censored.
       
        bjourne wrote 1 day ago:
        FYI, archive.today is NOT the Internet Archive/Wayback Machine.
       
          super256 wrote 1 day ago:
          I prefer archive.today because the Internet Archive's Wayback
          Machine allows retrospective removals of archived pages. If a URL has
          already been crawled and archived, the site owner can later add that
          URL to robots.txt and request a re-crawl. Once the crawler detects
          the updated robots.txt, previously stored snapshots of that page can
          become inaccessible, even if they were captured before the rule was
          added.
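
          To illustrate the mechanism described above, here is a small sketch
          of how a robots.txt rule evaluates before and after a host changes
          it, using Python's urllib.robotparser. The rules, the URL and the
          use of "ia_archiver" as the crawler token are assumptions made for
          the example; it says nothing about the Internet Archive's current
          policy.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt at capture time: everything allowed.
old_rules = "User-agent: *\nDisallow:\n"
# Hypothetical robots.txt after the host changed it: archiver excluded.
new_rules = "User-agent: ia_archiver\nDisallow: /\n"
url = "https://example.com/my-first-homepage.html"

print(is_blocked(old_rules, "ia_archiver", url))
print(is_blocked(new_rules, "ia_archiver", url))
```

          If an archive applies the latest rules to snapshots it already
          holds, the second result is what hides a page that was captured
          while the first set of rules was in force.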
          
          Unfortunately this happens more often than one would expect.
          
          I found this out when I preserved my very first homepage I made as a
          child on a free hosting service. I archived it on archive.org, and
          thought it would stay there forever. Then, in 2017 the free host
          changed the robots.txt, closed all services, and my treasured memory
          was forever gone from the internet. ;(
       
            pgalvin wrote 1 day ago:
            This information is now many years out of date - they no longer
            have this policy.
       
              extraduder_ire wrote 1 day ago:
              Any idea when that changed? I've been unable to access historical
              sites in the past because someone parked the domain and had a
              very restrictive robots.txt on it.
       
              snigsnog wrote 1 day ago:
              Even so you can still just request your site to be removed:
              
  HTML        [1]: https://help.archive.org/help/how-do-i-request-to-remove...
       
        rdiddly wrote 1 day ago:
        So toward the end of last year, the FBI was after archive.today,
        presumably either for keeping track of things the current
        administration doesn't want tracked, or maybe for the paywall thing (on
        behalf of rich donors/IP owners). [1] That effort appears to have gone
        nowhere, so now suddenly archive.today commits reputational suicide? I
        don't suppose someone could look deeper into this please?
        
  HTML  [1]: https://gizmodo.com/the-fbi-is-trying-to-unmask-the-registrar-...
       
          ndiddy wrote 1 day ago:
          The archive.today operator claims on his blog that this was nothing
          major: [1] > Regarding the FBI's request, my understanding is that
          they were seeking some form of offline action from us - anything
          from a witness statement ("Yes, this page was saved at
          such-and-such a time, and no one has accessed or modified it
          since") to operational work involving a specific group of users.
          These users are not necessarily associates of Epstein; among our
          users who are particularly wary of the FBI, there are also less
          frequently mentioned groups, such as environmental activists or
          right-to-repair advocates.
          
          > Since no one was physically present in the United States at that
          time, however, the matter did not progress further.
          
          > You already know who turned this request into a full-blown panic
          about "the FBI accusing the archive and preparing to confiscate
          everything."
          
          Not sure who he's talking about there.
          
  HTML    [1]: https://lj.rossia.org/users/archive_today/
       
        anilakar wrote 1 day ago:
        > If you want to pretend this never happened - delete your old
        article and post the new one you have promised. And I will not write
        "an OSINT investigation" on your Nazi grandfather
        
        From hero to a Kremlin troll in five seconds.
       
        xurukefi wrote 1 day ago:
        Kinda off-topic, but has anyone figured out how archive.today manages
        to bypass paywalls so reliably? I've seen people claiming that they
        have a bunch of paid accounts that they use to fetch the pages, which
        is, of course, ridiculous. I figured that they have found an
        (automated) way to imitate Googlebot really well.
       
          cnst wrote 1 day ago:
          It's because it's actively maintained, and bypassing the paywalls is
          its whole selling point, thus, they do have to be good at it.
          
          They bypass the rendering issues by "altering" the webpages.  It's
          not uncommon to archive a page, and see nothing because of the
          paywalls; but then later on, the same page is silently fixed.  They
          have a Tumblr where you can ask them questions; at one point, it's
          been quite common for everyone to ask them to fix random specific
          pages, which they did promptly.
          
          Honestly, you cannot archive a modern page, unless you alter it.  Yet
          they're now being attacked under the pretence of "altering" webpages,
          but that's never been a secret, and it's technologically impossible
          to archive without altering.
       
            Jordan-117 wrote 1 day ago:
            There's a pretty massive difference between altering a snapshot to
            make it archivable/readable and doing it to smear and defame a
            blogger who wrote about you.
       
          Cider9986 wrote 1 day ago:
          I imagine accounts are the only way that archive.today works on sites
          like 404media.co that seem to have server sided paywalls. Similarly,
          twitter has a completely server sided paywall.
       
          jsheard wrote 1 day ago:
          > I figured that they have found an (automated) way to imitate
          Googlebot really well.
          
          If a site (or the WAF in front of it) knows what it's doing then
          you'll never be able to pass as Googlebot, period, because the
          canonical verification method is a DNS lookup dance which can only
          succeed if the request came from one of Googlebot's dedicated IP
          addresses. Bingbot is the same.
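
          A minimal sketch of that DNS lookup dance: reverse-resolve the
          client IP, check that the hostname falls under a Google crawler
          domain, then forward-resolve the hostname and confirm it maps back
          to the same IP. The function name, the injectable lookup parameters
          and the fixture data are assumptions for illustration, and the
          default resolver used here only covers IPv4.

```python
import socket
from typing import Callable

# Hostname suffixes treated as Google crawler domains in this sketch.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = lambda addr: socket.gethostbyaddr(addr)[0],
    forward_lookup: Callable[[str], list[str]] = lambda host: socket.gethostbyname_ex(host)[2],
) -> bool:
    """Reverse-then-forward DNS check; lookups are injectable for offline use."""
    try:
        host = reverse_lookup(ip).rstrip(".")
        if not host.endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # The forward lookup must lead back to the IP that made the request.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False


# Illustrative fixture data so the example runs without network access.
fake_ptr = {
    "66.249.66.1": "crawl-66-249-66-1.googlebot.com",
    "203.0.113.9": "crawl.googlebot.com.evil.example",
    "198.51.100.5": "fake.googlebot.com",
}
fake_a = {"crawl-66-249-66-1.googlebot.com": ["66.249.66.1"]}


def fake_forward(host: str) -> list[str]:
    return fake_a.get(host, [])


for ip in fake_ptr:
    print(ip, is_verified_googlebot(ip, fake_ptr.__getitem__, fake_forward))
```

          The third fixture entry shows why the forward step matters: whoever
          controls the reverse zone for an address can make its PTR record
          claim any hostname, but cannot make Google's zone resolve that
          hostname back to the same address.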
       
            xurukefi wrote 1 day ago:
            There are ways to work around this. I've just tested this: I've
            used the URL inspection tool of Google Search Console to fetch a
            URL from my website, which I've configured to redirect to a
            paywalled news article. Turns out the crawler follows that redirect
            and gives me the full source code of the redirected web site,
            without any paywall.
            
            That's maybe a bit insane to automate at the scale of
            archive.today, but I figure they do something along the lines of
            this. It's a perfect imitation of Googlebot because it is literally
            Googlebot.
       
              Aurornis wrote 1 day ago:
              > which I've configured to redirect to a paywalled news article.
              
              Which specific site with a paywall?
       
              jsheard wrote 1 day ago:
              I'd file that under "doesn't know what they're doing" because the
              search console uses a totally different user-agent
              (Google-InspectionTool) and the site is blindly treating it the
              same as Googlebot :P
              
              Presumably they are just matching on *Google* and calling it a
              day.
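
              To make the difference concrete, here is a sketch of a loose
              substring check next to a stricter product-token check. The two
              user-agent strings are illustrative examples, and the function
              names are made up. Neither check proves anything on its own,
              since the header is trivially spoofable.

```python
import re

# Illustrative user-agent strings for the two Google clients discussed.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
INSPECTION_UA = "Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)"


def loose_match(user_agent: str) -> bool:
    """What a site matching on *Google* would effectively be doing."""
    return "google" in user_agent.lower()


def strict_match(user_agent: str) -> bool:
    """Only accept the Googlebot product token followed by a version."""
    return re.search(r"\bGooglebot/\d", user_agent) is not None


for ua in (GOOGLEBOT_UA, INSPECTION_UA):
    print(loose_match(ua), strict_match(ua), ua)
```

              The loose check accepts both strings, while the strict one
              accepts only the Googlebot string.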
       
                xurukefi wrote 1 day ago:
                Sure, but maybe there are other ways to control Googlebot in a
                similar fashion. Maybe even with a pristine looking User-Agent
                header.
       
          layer8 wrote 1 day ago:
          It's not reliable, in the sense that there are many paywalled sites
          that it's unable to archive.
       
            tonymet wrote 1 day ago:
            no tool is 100% effective.  Archive.today is the best one we've
            seen
       
            xurukefi wrote 1 day ago:
            But it is reliable in the sense that if it works for a site, then
            it usually never fails.
       
          Aurornis wrote 1 day ago:
          >  I've seen people claiming that they have a bunch of paid accounts
          that they use to fetch the pages, which is, of course, ridiculous.
          
          The curious part is that they allow web scraping arbitrary pages on
          demand. So a publisher could put in a lot of arbitrary requests to
          archive their own pages and see them all coming from a single account
          or small subset of accounts.
          
          I hope they haven't been stealing cookies from actual users through a
          botnet or something.
       
            coppsilgold wrote 1 day ago:
            You don't even need active measures. If a publisher is serious
            about tracing traitors there are algorithms for that (which are
            used by streamers to trace pirates). It's called "Traitor Tracing"
            in the literature. The idea is to embed watermarks following a
            specific pattern that would point to a traitor or even a coalition
            of traitors acting in concert.
            
            It would be challenging to do with text, but is certainly doable
            with images - and articles contain those.
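
            A toy sketch of the basic fingerprinting idea: each account gets
            a unique bit pattern, each bit selects one of two
            equivalent-looking variants of an image, and a leaked copy is
            decoded back to the account. All names and file names are
            hypothetical. This version only identifies a single leaker; the
            schemes in the literature use codes designed to survive
            coalitions mixing their copies.

```python
from typing import Optional

# Each slot has two visually equivalent variants (hypothetical file names).
SLOTS = [("hero-v0.jpg", "hero-v1.jpg"), ("chart-v0.png", "chart-v1.png")]


def assign_codewords(accounts: list[str]) -> dict[str, list[int]]:
    """Give every account a distinct bit pattern based on its index."""
    width = max(1, (len(accounts) - 1).bit_length())
    return {
        account: [(index >> bit) & 1 for bit in range(width)]
        for index, account in enumerate(accounts)
    }


def serve_copy(codeword: list[int]) -> list[str]:
    """Pick one variant per slot according to the account's bits."""
    return [SLOTS[i][bit] for i, bit in enumerate(codeword)]


def trace(leaked: list[str], codewords: dict[str, list[int]]) -> Optional[str]:
    """Recover the bit pattern from a leaked copy and look up the account."""
    bits = [SLOTS[i].index(variant) for i, variant in enumerate(leaked)]
    for account, word in codewords.items():
        if word == bits:
            return account
    return None


codewords = assign_codewords(["alice", "bob", "carol"])
leaked_copy = serve_copy(codewords["bob"])
print(leaked_copy, trace(leaked_copy, codewords))
```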
       
              bawolff wrote 1 day ago:
              You need that sort of thing (i.e. watermarking) when people are
              intentionally trying to hide who did it.
              
              In the archive.today case, it looks pretty automated. Surely just
              adding an html comment would be sufficient.
       
                fc417fc802 wrote 1 day ago:
                If they use paid accounts I would expect them to strip info
                automatically. An "obvious" way to do that is to diff the
                output from two separate accounts on separate hardware
                connecting from separate regions. Streaming services commonly
                employ per-session randomized steganographic watermarks to thwart
                such tactics. Thus we should expect major publishers to do so
                as well.
                
                At which point we still lack a satisfactory answer to the
                question. Just how is archive.today reliably bypassing paywalls
                on short notice? If it's via paid accounts you would expect
                they would burn accounts at an unsustainable rate.
       
                  ouhamouch wrote 1 day ago:
                  Watch [1] they post AT-free recipes for many paywalls
                  
  HTML            [1]: https://news.ycombinator.com/threads?id=1vuio0pswjnm...
       
            xurukefi wrote 1 day ago:
            Exactly. If I was an admin of a popular news website I would try to
            archive some articles and look at the access logs in the backend.
            This cannot be too hard to figure out.
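            
            Roughly what I have in mind, as a sketch using only the Python
            standard library. The teaser page, the field names and `serve` are
            my own placeholders; a real site would already have access logs
            with this information.

```python
import http.server

REQUEST_LOG = []  # one dict per request that reached the test page


def record_request(client_ip, path, headers):
    """Store the details that would help identify who fetched the page."""
    entry = {
        "ip": client_ip,
        "path": path,
        "user_agent": headers.get("User-Agent", ""),
        "cookie": headers.get("Cookie", ""),
        "referer": headers.get("Referer", ""),
    }
    REQUEST_LOG.append(entry)
    return entry


class LoggingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        record_request(self.client_address[0], self.path, self.headers)
        body = b"<html><body><p>Teaser only. Subscribe to read more.</p></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default stderr log; REQUEST_LOG is used instead


def serve(port=8080):
    """Run the test site until interrupted (blocking call)."""
    http.server.ThreadingHTTPServer(("", port), LoggingHandler).serve_forever()
```

            Call `serve()`, submit the URL to the archiver, then inspect
            `REQUEST_LOG` for the IPs, user agents and cookies that showed up.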
       
          elzbardico wrote 1 day ago:
          > which is, of course, ridiculous.
          
          Why? In the world of web scraping this is pretty common.
       
            xurukefi wrote 1 day ago:
            Because it works too reliably. Imagine what that would entail.
            Managing thousands of accounts. You would need to make sure to strip
            the account details from archived pages perfectly. Every time the
            website changes its code even slightly you are at risk of losing
            one of your accounts. It would constantly break and would be an
            absolute nightmare to maintain. I've personally never encountered
            such a failure on a paywalled news article. archive.today managed
            to give me a non-paywalled clean version every single time.
            
            Maybe they use accounts for some special sites. But there is
            definitely some automated generic magic happening that manages to
            bypass paywalls of news outlets. Probably something Googlebot
            related, because those websites usually give Google their news
            pages without a paywall, probably for SEO reasons.
       
              wbmva wrote 23 hours 11 min ago:
              Do you know where the doxxed info ultimately originates from? It
              turns out that the archives leaked account names. Try Googling
              what happened to volth on Github.
       
              permo-w wrote 1 day ago:
              I could be wrong, but I think I've seen it fail on more obscure
              sites. But yeah it seems unlikely they're maintaining so many
              premium accounts. On the other hand they could simply be
              state-backed. Let's say there are 1000 likely paywalled sites, 20
              accounts for each = 20k accounts, $10/month => $200k/month =
              $2.4m a year. If I were an intelligence agency I'd happily drop
              that plus costs to own half the archived content on the internet.
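              
              Spelling out that back-of-the-envelope estimate. Every input is a
              guess from this comment, not a measured number, and the function
              name is just for illustration.

```python
def subscription_cost(sites: int, accounts_per_site: int, usd_per_month: int):
    """Return (total accounts, monthly cost in USD, yearly cost in USD)."""
    accounts = sites * accounts_per_site
    monthly = accounts * usd_per_month
    return accounts, monthly, monthly * 12


# 1000 paywalled sites, 20 accounts each, $10 per account per month
accounts, monthly, yearly = subscription_cost(1000, 20, 10)
```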
              
              Surely it wouldn't be too hard to test. Just set up an unlisted
              dummy paywall site, archive it a few times and see what the
              requests look like.
       
                Jordan-117 wrote 1 day ago:
                Interesting theory. It would also be a good way to subtly
                undermine the viability of news outlets, not to mention the
                insidious potential of altering snapshots at will. OTOH, I'd
                expect a state-sponsored effort to be more professional in
                terms of not threatening and smearing some blogger who
                questioned them.
       
                  permo-w wrote 3 hours 59 min ago:
                  If I were an intelligence agency wanting to throw people off
                  my scent, maybe I'd set up or pay off a blogger to track down
                  my site's "owner" and then do some immature shit in response
                  to absolutely confirm forever that the blogger was right.
                  
                  Not saying this is true, just saying it could be
       
              behringer wrote 1 day ago:
              Replace any identifiers like usernames and emails with another
              string automatically.
       
              mikkupikku wrote 1 day ago:
              Using two or more accounts could help you automatically strip
              account details.
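              
              A minimal sketch of what I mean, assuming you already have the
              same page fetched through two different accounts. It compares
              the two copies token by token with the standard `difflib` module
              and blanks whatever differs; `strip_account_specific` and the
              placeholder are names I made up. It would only catch differences
              visible in the text or markup, not watermarks hidden inside
              images.

```python
import difflib


def strip_account_specific(page_a: str, page_b: str, placeholder: str = "[redacted]") -> str:
    """Keep only the tokens both copies share; replace differing runs with a placeholder."""
    a, b = page_a.split(), page_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        else:
            out.append(placeholder)  # account-specific or otherwise unstable content
    return " ".join(out)


copy_a = "Welcome back alice@example.com ! Article text here."
copy_b = "Welcome back bob@example.org ! Article text here."
cleaned = strip_account_specific(copy_a, copy_b)
```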
       
                xurukefi wrote 1 day ago:
                That's actually a really neat idea.
       
          tonymet wrote 1 day ago:
          I'm an outsider with experience building crawlers.  You can get
          pretty far with residential proxies and browser fingerprint
          optimization.  Most of the b-tier publishers use RBC and heuristics
          that can be "worked around" with moderate effort.
       
            quietsegfault wrote 1 day ago:
            .. but what about subscription only, paywalled sources?
       
              tonymet wrote 1 day ago:
              Many publishers offer "first one's free".
              
              For those that don't, I would guess archive.today is using
              malware to piggyback off of subscriptions.
       
        paganel wrote 1 day ago:
        At this point Archive.today provides a better service (all things
        considered) compared to Wikipedia, at least when it comes to current
        affairs.
       
        tl2do wrote 1 day ago:
        Why not show both? Wikipedia could display archive links alongside
        original sources, clearly labeled so readers  know which is which. This
        preserves access when originals disappear while keeping the primary
        source as the  main reference.
       
          AgentME wrote 1 day ago:
          Wikipedia shouldn't allow links to sites which intentionally falsify
          archived pages and use their visitors to perform DDOS attacks.
       
          ranger207 wrote 1 day ago:
          They generally do. Random example, citation 349 on the page of George
          Washington: ""A Brief History of GW"[link]. GW Libraries.
          Archived[link] from the original on September 14, 2019. Retrieved
          August 19, 2019."
       
            Gander5739 wrote 1 day ago:
            This will always be done unless the original url is marked as dead
            or similar.
       
          bawolff wrote 1 day ago:
          The objection is to this specific archive service, not to archiving
          in general.
       
        basch wrote 1 day ago:
        It seems a lot of people haven't heard of it, but I think it's worth
        plugging [1] which is really the appropriate tool for something like
        Wikipedia to be using to archive pages.
        
        More: [2]
        
  HTML  [1]: https://perma.cc/
  HTML  [2]: https://en.wikipedia.org/wiki/Perma.cc
       
          Computer0 wrote 1 day ago:
          I switched to Perma.cc earlier this week and have had a mixed
          experience to say the least. I think image-heavy pages just error out
          completely while still charging me, such as [1], and Reddit
          seemingly blocks their agent. It is open source though.
          
  HTML    [1]: https://www.in.gov/nircc/planning/highway/traffic-data/inter...
       
          jsheard wrote 1 day ago:
          Does Wikipedia really need to outsource this? They already do
          basically everything else in-house, even running their own CDN on
          bare metal, I'm sure they could spin up an archiver which could be
          implicitly trusted. Bypassing paywalls would be playing with fire
          though.
       
            IshKebab wrote 1 day ago:
            Of course they do. If Wikipedia did it themselves they'd
            immediately get DMCA'd and sued into oblivion.
            
            > Bypassing paywalls would be playing with fire though.
            
            That's the only reason archive.today was used. For non-paywalled
            stuff you can use the wayback machine.
       
            raincole wrote 1 day ago:
            > Does Wikipedia really need to outsource this?
            
            I hope so. Archiving is a legal landmine.
       
            toomuchtodo wrote 1 day ago:
            Archive.org is the archiver, rotted links are replaced by
            Archive.org links with a bot. [1]
            
  HTML      [1]: https://meta.wikimedia.org/wiki/InternetArchiveBot
  HTML      [2]: https://github.com/internetarchive/internetarchivebot
       
              snigsnog wrote 1 day ago:
              Archive.org are left wing activists that will agree to censor
              anything other left wing activists or large companies don't want
              online.
       
                AlexeyBelov wrote 12 hours 30 min ago:
                And you're another disruptive "N days old" account. Troll
                somewhere else.
       
                Maken wrote 1 day ago:
                Like what?
       
                  snigsnog wrote 20 hours 27 min ago:
                  Kiwifarms is an example: [1] Anyone can request anything be
                  removed and they may honor the request: [2] they say nothing
                  about only removing things illegal in the US or anything like
                  that, meaning they can and will remove things based on
                  personal judgements about whether it should be archived.
                  
  HTML            [1]: https://old.reddit.com/r/DataHoarder/comments/x95gd5...
  HTML            [2]: https://help.archive.org/help/how-do-i-request-to-re...
       
              jsheard wrote 1 day ago:
              Yeah for historical links it makes sense to fall back on IAs
              existing archives, but going forward Wikipedia could take their
              own snapshots of cited pages and substitute them in if/when the
              original rots. It would be more reliable than hoping IA grabbed
              it.
       
                toomuchtodo wrote 1 day ago:
                Not opposed, Wikimedia tech folks are very accessible in my
                experience, ask them to make a GET or POST to [1] whenever a
                link is added via the Wiki editing mechanism. Easy peasy.
                Example CLI tools are [2] and [3] Shortcut is to consume the
                Wikimedia changelog firehose and make these http requests
                yourself, performing a CDX lookup request to see if a recent
                snapshot was already taken before issuing a capture request (to
                be polite to the capture worker queue).
                
  HTML          [1]: https://web.archive.org/save
  HTML          [2]: https://github.com/palewire/savepagenow
  HTML          [3]: https://github.com/akamhy/waybackpy
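                
                A rough sketch of that "check CDX first, then capture" flow
                using only the Python standard library. The query parameters
                (`output=json`, `fl=timestamp`, `limit=-1`) are how I
                understand the CDX server to work, so check them against the
                CDX documentation; the 30-day freshness window and all
                function names are my own assumptions.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import List, Optional

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def cdx_query_url(target: str) -> str:
    """Build a CDX query asking only for the newest capture timestamp of `target`."""
    params = {"url": target, "output": "json", "fl": "timestamp", "limit": "-1"}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def latest_capture(cdx_rows: List[List[str]]) -> Optional[datetime]:
    """Parse CDX JSON rows (header row first) into the newest capture time, if any."""
    stamps = [row[0] for row in cdx_rows[1:]]
    if not stamps:
        return None
    newest = datetime.strptime(max(stamps), "%Y%m%d%H%M%S")
    return newest.replace(tzinfo=timezone.utc)


def needs_capture(last: Optional[datetime], now: datetime, max_age: timedelta) -> bool:
    """Capture only when there is no snapshot or the newest one is too old."""
    return last is None or now - last > max_age


def archive_if_stale(target: str, max_age: timedelta = timedelta(days=30)) -> bool:
    """Network call: returns True if a capture request was sent."""
    with urllib.request.urlopen(cdx_query_url(target), timeout=30) as resp:
        rows = json.load(resp)
    if not needs_capture(latest_capture(rows), datetime.now(timezone.utc), max_age):
        return False  # a recent snapshot exists; leave the capture queue alone
    urllib.request.urlopen(SAVE_ENDPOINT + target, timeout=120).close()
    return True
```

                A consumer of the change feed would call `archive_if_stale(url)`
                for each newly added link, ideally with rate limiting on top.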
       
                  Gander5739 wrote 1 day ago:
                  This already happens. Every link added to Wikipedia is
                  automatically archived on the wayback machine.
       
                    RupertSalt wrote 1 day ago:
                    [citation needed]
       
                      Gander5739 wrote 1 day ago:
                      Ironic, I know. I couldn't find where I originally heard
                      this years ago, but the InternetArchiveBot page linked
                      above says "InternetArchiveBot monitors every Wikimedia
                      wiki for new outgoing links" which is probably referring
                      to what I said.
       
                  ferngodfather wrote 1 day ago:
                  Why wouldn't Wikipedia just capture and host this themselves?
                  Surely it makes more sense to DIY than to rely on a third
                  party.
       
                    huslage wrote 1 day ago:
                    Why would they need to own the archive at all? The
                    archive.org infrastructure is built to do this work
                    already. It's outside of WMF's remit to internally archive
                    all of the data it has links to.
       
                  RupertSalt wrote 1 day ago:
                  Spammers and pirates just got super excited at that plan!
       
                    toomuchtodo wrote 1 day ago:
                    There are various systems in place to defend against them,
                    I recommend against this, poor form against a public good
                    is not welcome.
       
                  jsheard wrote 1 day ago:
                  I didn't know you can just ask IA to grab a page before their
                  crawler gets to it. In that case yeah it would make sense for
                  Wikipedia to ping them automatically.
       
                    extraduder_ire wrote 1 day ago:
                    There's a /save/ endpoint that archives the page you point
                    it at.
                    
                    You can see a text box for it on the right, if you go on
                    the waybackmachine's homepage. I used it yesterday.
       
          ronsor wrote 1 day ago:
          It costs money beyond 10 links, which means either a paid
          subscription or institutional affiliation. This is problematic for an
          encyclopedia anyone can edit, like Wikipedia.
       
            extraduder_ire wrote 1 day ago:
            This is assuming they can't work out something with wikipedia to
            offer it for free (via a wikiforge tool, or bot) in exchange for
            the exposure of being the most common archive provider/putting a
            "used by Wikimedia" logo on their website.
            
            The major reason archive.today was being used is that it also
            bypassed paywalls, and I don't think perma.cc does that normally.
       
            toomuchtodo wrote 1 day ago:
            Wikimedia could pay, they have an endowment of ~$144M [1] (as of
            June 30, 2024). Perma.cc has Archive.org and Cloudflare as
            supporting partners, and their mission is aligned with Wikimedia
            [2]. It is a natural complementary fit in the preservation
            ecosystem. You have to pay for DOIs too, for comparison [3]
            (starting at $275/year and $1/identifier [4] [5]).
            
            With all of this context shared, the Internet Archive is likely
            meeting this need without issue, to the best of my knowledge. [1]
            [2]  ("Perma.cc was built by Harvard's Library Innovation Lab and
            is backed by the power of libraries. We're both in the forever
            business: libraries already look after physical and digital
            materials - now we can do the same for links.") [3] [4] [5]
            
            (no affiliation with any entity in scope for this thread)
            
  HTML      [1]: https://meta.wikimedia.org/wiki/Wikimedia_Endowment
  HTML      [2]: https://perma.cc/about
  HTML      [3]: https://community.crossref.org/t/how-to-get-doi-for-our-jo...
  HTML      [4]: https://www.crossref.org/fees/#annual-membership-fees
  HTML      [5]: https://www.crossref.org/fees/#content-registration-fees
       
              bawolff wrote 1 day ago:
              > Organizations that do not qualify for free usage can contact
              our team to learn about creating a subscription for providing
              Perma.cc to their users. Pricing is based on the number of users
              in an organization and the expected volume of link creation.
              
              If pricing is so much that you have to have a call with the
              marketing team to get a quote, i think it would be a poor use of
              WMF funds.
              
              Especially because volume of links and number of users that
              wikimedia would entail is probably double their entire existing
              userbase at least.
              
              Ultimately we are mostly talking about a largely static web host.
              With legal issues being perhaps the biggest concern. It would
              probably make more sense for WMF to create their own than to
              become a perma.cc subscriber.
              
              However for the most part, partnering with archive.org seems to
              be going well and already has some software integration with
              wikipedia.
       
              RupertSalt wrote 1 day ago:
              If the WMF had a dollar for every proposal to spend
              Endowment-derived funds, their Endowment would double and they
              could hire one additional grant-writer
       
                Dylan16807 wrote 1 day ago:
                Do you have experience with this?  I'd like to hear more,
                really.  I think this is the first time I've seen a suggestion
                for something new they can spend money on.  I usually just see
                talk about where to spend less.
       
                nine_k wrote 1 day ago:
                If the endowment is invested so that it brings very
                conservative 3% a year, it means that it brings $4.32M a year.
                By doubling that, rather many grant writers could be hired.
       
                  erk__ wrote 1 day ago:
                  Well the last annual report I could find actually says that
                  they got a return of 17.65% so 3% would be pretty bad
                  
  HTML            [1]: https://wikimediaendowment.org/annualreports/2023-20...
       
        RupertSalt wrote 1 day ago:
        "Non-paywalled" ad-free link to archive:
        
  HTML  [1]: https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/A...
       
        shevy-java wrote 1 day ago:
        Does anyone have a short summary of why Archive.today resorted to a
        DDoS, and against whom? Isn't that something done by malicious actors?
        Or did others misuse Archive.today?
       
          zeroonetwothree wrote 1 day ago:
          If you read the linked article it is discussed
       
        celsoazevedo wrote 1 day ago:
        I don't see the point in doxing anyone, especially those providing a
        useful service for the average internet user. Just because you can put
        some info together, it doesn't mean you should.
        
        With this said, I also disagree with turning everyone that uses
        archive[.]today into a botnet that DDoS sites. Changing the content of
        archived pages also raises questions about the authenticity of what
        we're reading.
        
        The site behaves as if it was infected by some malware and the archived
        pages can't be trusted. I can see why Wikipedia made this decision.
       
          luxuryballs wrote 1 day ago:
          this seems like the type of thing that should be on a blockchain,
          with decentralized nodes validating authenticity; it could support
          revisions but not lose originals
       
          Sophira wrote 1 day ago:
          Sites that exist to archive other websites will almost always need to
          dynamically change the content of the HTML that they're serving in
          some way or another. (For example, a link that points to the root of
          the website may need to be changed in order to point to the right
          location.)
          
          So it doesn't necessarily raise questions about whether the content
          has been changed or not. The difference is in whether that change is
          there to make the archive usable - and of course, for archive.today,
          that's not the case.
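          
          To illustrate the benign kind of rewriting, here is a small sketch
          that points `href` and `src` attributes at an archive copy. The
          archive prefix and `rewrite_links` are hypothetical, and a real
          archiver would use a proper HTML parser rather than a regex.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src='...' with either quote style.
ATTR = re.compile(r'''(?P<attr>\b(?:href|src))=(?P<q>["'])(?P<url>.*?)(?P=q)''', re.IGNORECASE)


def rewrite_links(html: str, original_url: str, archive_prefix: str) -> str:
    """Resolve each link against the original page URL, then prefix it with the archive."""
    def repl(match):
        absolute = urljoin(original_url, match.group("url"))
        quote = match.group("q")
        return f'{match.group("attr")}={quote}{archive_prefix}{absolute}{quote}'
    return ATTR.sub(repl, html)


page = '<a href="/">Home</a> <img src="pic.png">'
rewritten = rewrite_links(page, "https://example.com/news/story.html",
                          "https://archive.example/2024/")
```

          Only the link targets change here; the visible text of the page is
          left alone, which is the distinction I'm drawing.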
       
          fluoridation wrote 1 day ago:
          For a very brief time, "doxing" (that is, dropping dox, that is,
          dropping docs, or documents) used to mean something useful. You
          gathered information that was not out in public, for example by
          talking to people or by stealing it, and put it out in the open.
          
          It's very silly to talk about doxing when all someone has done is
          gather information anyone else can equally easily obtain, just given
          enough patience and time, especially when it's information the person
          in question put out there themselves. If it doesn't take any special
          skills or connections to obtain the information, but only the
          inclination to actually perform the research on publicly available
          data, I don't see what has been done that is unethical.
       
            noobermin wrote 1 day ago:
            Did you read the article? They dug deep, they didn't just do a
            google search and leave it at that. They drew links between deleted
            posts and defunct accounts, they compared profile pictures of
            anonymous profiles.
            
            I'm not defending the archive.today webmaster but it's
            unfortunately understandable they are angry. Saying what the
            blogger did was merely point out public information is a gross
            oversimplification.
       
              fluoridation wrote 1 day ago:
              Did you read the comment you're replying to? They didn't use any
              information not publicly available.
       
                noobermin wrote 18 hours 35 min ago:
                That is NOT the line for doxxing at all, I don't know why you
                hang your argument on that aspect. Even institutions that care
                about secrecy like governments state that documents that
                aggregate ostensibly public information can raise the
                classification level of a document above being non-classified.
                The reasons for this are obvious, essentially aggregated
                information can lead one to draw conclusions that otherwise are
                not obvious. That is akin to what the original article by
                Gyrovague does.
       
                  fluoridation wrote 17 hours 33 min ago:
                  >That is NOT the line for doxxing
                  
                  Again, did you read my comment? I know what it means now. My
                  point is about highlighting the change in meaning, not about
                  obstinately denying what the word means.
                  
                  >Even institutions that care about secrecy like governments
                  state [...]
                  
                  A given organization can have whatever policy it wants with
                  regards to which documents it wants to allow to be made
                  public. It could make all documents printed on non-yellow
                  paper classified. That has nothing to do with the ethics of
                  doxing.
                  
                  >The reasons for this are obvious, essentially aggregated
                  information can lead one to draw conclusions that otherwise
                  are not obvious.
                  
                  A secret is not something that's not obvious, it's a datum
                  that's strictly controlled by the people who know it. If I
                  can find some information about your real identity just by
                  searching for it online then it's not a secret; you don't
                  control that piece of information. You've given up that
                  control by divulging the information in a public space where
                  information often remains indefinitely.
       
            lelandbatey wrote 1 day ago:
            Eh, you can find in public data things like "what is someone's
            address" based only on their name by looking up public records of
            mortgage records. That however is quite bad form, and if you did do
            that, I think it would be pretty unethical.
       
            bawolff wrote 1 day ago:
            Call it stalking or harassment if you prefer. Regardless, it's rude
            (sometimes illegal) behaviour.
            
            That's no justification for using visitors to your site to do a
            DDOS.
            
            In the slang of reddit: ESH
       
              wolvoleo wrote 1 day ago:
              In this case archive.today has a lot of influence over the
              information we take in because of the rise in paywalls. They have
              the potential of modifying the news we absorb at scale.
              
              In that context I don't think the question ("actually, who is
              providing all this information to me and what interests drive
              them") is one that's misplaced. Maybe we shouldn't look into a
              gift horse's mouth but don't forget this could be a Trojan horse
              as well.
              
              The article brought to light some ties to Russia but probably not
              ties to its government and its troll farms. Rather an independent
              and pretty rebellious citizen. That's good to hear. And that's
              valuable information. I trust the site more after reading the
              article, not less.
              
              The article could have redacted the names they found but they
              were found with public sources and these sources validate the
              encountered information (otherwise the results could have been
              dismissed)
       
              fluoridation wrote 1 day ago:
              It's neither of those. Stalking refers to persistent, unwanted,
              one-sided interactions with a person such as following,
              surveilling, calling, or sending messages or gifts. Investigating
              a person's past or identity doesn't involve any interaction with
              the physical person. Harassment is persistent attempts to
              interact with someone after having been asked to stop. Again, an
              investigation doesn't require any form of interaction.
       
                JoshTriplett wrote 1 day ago:
                > Harassment is persistent attempts to interact with someone
                
                No, harassment also includes persistent attempts to cause
                someone grief, whether or not they involve direct interactions
                with that person.
                
                From Wikipedia:
                
                > Harassment covers a wide range of behaviors of an offensive
                nature. It is commonly understood as behavior that demeans,
                humiliates, and intimidates a person.
       
                  fluoridation wrote 1 day ago:
                  Doxing in the loose sense could be harassment in certain
                  circumstances, such as if you broadcast a person's home
                  address to an audience with the intent to cause that audience
                  to use that address, even if the address was already out
                  there. In that case, the problem is not the release of
                  information, but the intent you're communicating with the
                  release. It would be the same if you told that audience "you
                  know guys? It's not very difficult to find jdoe's home
                  address if you google his name. I'm not saying anything, I'm
                  just saying." Merely de-pseudonymizing a screen name may or
                  may not be harassment. Divulging that jdoe's real name is
                  John Doe would not have the same implications as if his name
                  was, say, Keanu Reeves.
                  
                  Because the two are distinct, one can't simply replace
                  "doxing" with "harassment".
       
                    JoshTriplett wrote 1 day ago:
                    Generally speaking, every case I've seen of people using
                    the term "doxing" tends to be for the case that
                    specifically is harassment; it has the connotation of using
                    the information, precisely because if you aren't intending
                    to use it there's no good reason for you to have it.
       
                      fluoridation wrote 1 day ago:
                      That's just another way the term is used incorrectly.
       
                        JoshTriplett wrote 1 day ago:
                        Language evolves. Connotation tends to become
                        definition. Not always the only definition, but
                        connotation becomes the "especially" or the "definition
                        2", and can become the primary definition over time.
       
                          allarm wrote 10 hours 30 min ago:
                          > Language evolves
                          
                          That's just another way of saying "words don't have
                          meanings". Yes, it evolves, but to preserve the
                          original meanings, that evolution should be slowed
                          down as much as possible to avoid "black is
                          white" effects.
       
                          fluoridation wrote 1 day ago:
                          That's not what I mean. If we agree that harassment
                          is wrong and that doxing is not harassment (because
                          not all doxing is harassment), then it's incorrect to
                          say that doxing is wrong. For example, the article
                          from the blog, even if we agree that it is doxing,
                          isn't harassment. The person being discussed is
                          presented in a positive light:
                          
                          >I for one will be buying Denis/Masha/whoever a well
                          deserved cup of coffee.
                          
                          Using one term when what is meant is actually the
                          other serves nothing but to sow confusion.
       
                            bawolff wrote 1 day ago:
                            You can harass someone while discussing them in a
                            positive light.
                            
                            And i don't just mean under colloquial definition,
                            i mean under the legal definition of harassment. In
                            fact it's fairly common for unwanted "positive"
                            attention to be harassment - e.g. unwanted sexual
                            advances mostly fit that description.
       
                              fluoridation wrote 1 day ago:
                              You are generalizing an irrelevant point. What I
                              was getting at is that unlike the usual usage of
                              doxing, it was not a call to go bother that
                              person. I didn't think I needed to make that
                              point this explicitly within the context of this
                              subthread.
       
                                bawolff wrote 1 day ago:
                                Which is irrelevant as that is not a
                                requirement for it to be harassment.
                                
                                I get that a call to action is a common feature
                                of doxing and it wasn't present here, but it's
                                not a particularly common feature of harassment
                                outside of the context of doxing and nothing in
                                the definition of harassment requires it.
       
                            grimgrin wrote 1 day ago:
                            update the etymology then on wikipedia with your
                            reference
                            
                             that current etymology is what we're all talking
                            about obv
       
          cardanome wrote 1 day ago:
          As far as I understand the person behind archive.today might face
          jail time if they are found out. You shouldn't be surprised that
          people lash out when you threaten their life.
          
          I don't think the DDOSing is a very good method for fighting back but
          I can't blame anyone for trying to survive. They are definitely the
          victim here.
          
          If that blog really doxxed them out of idle curiosity they are an
          absolute piece of shit. Though I think this is more of a targeted
          campaign.
       
            pibaker wrote 1 day ago:
            One thing they always teach you in Crime University is "don't break
             two laws at the same time." If you have contraband in your car,
             don't speed or run red lights, because it brings attention and
             attention means jail.
            
            In this case, I didn't know that the archive.today people were
            doxxed until they started the ddos campaign and caught attention. I
            doubt anyone in this thread knew or cared about the blogger until
            he was attacked. And now this entire thing is a matter of permanent
            record on Wikipedia and in the news. archive.today's attempt at
            silencing the blogger is only bringing them more trouble, not less.
            
            Barbara_Streisand_Mansion.jpg
       
              stuffoverflow wrote 1 day ago:
              The weird thing is that there was nothing new in that blog post.
              And on top of that it couldn't conclusively say who the owner of
              archive.today is, so no one still knows.
       
              ouhamouch wrote 1 day ago:
              We do not know what was important in that doxx.
              
              Probably nothing and the DDoS hype was intentional to distract
               attention and highlight J.P.'s doxx among the others, making them
              insignificant.
              
              J.P. might be the only one of the doxxers who could promote their
              doxx in media, and this made his doxx special, not the content?
              
               Anyway, it made the haystack bigger while keeping the needle the same.
       
            protimewaster wrote 1 day ago:
            > As far as I understand the person behind archive.today might face
            jail time if they are found out. You shouldn't be surprised that
            people lash out when you threaten their life.
            
            One of the really strange things about all of this is that there is
            a public forum post in which a guy claims to be the site owner. So
            this whole debacle is this weird mix of people who are angry and
            saying "clearly the owner doesn't want to be associated with the
            site" on the one hand, but then on the other hand there's literally
            a guy who says he's the one that owns the site, so it doesn't seem
            like that guy is very worried about being associated with it?
            
            It also seems weird to me that it's viewed as inappropriate to
            report on the results of Googling the guy who said he owns the
            site, but maybe I'm just out of touch on that topic.
       
              arboles wrote 1 day ago:
              > is that there is a public forum post in which a guy claims to
              be the site owner.
              
              Which forum post? The post mentioned by the blogger, the post on
              an F-Secure forum (a company with cybersecurity products) was a
              request for support by the owner of archive.today regarding a
              block of their site. It's arguably not intended as a public
              statement by the owner of the archive, and they were simply
              careless with their username.
       
              RobotToaster wrote 1 day ago:
              I don't see how that contradicts anything?  He's almost certainly
               using a nom de guerre.
       
              ouhamouch wrote 1 day ago:
              There are even YouTube videos (of GamerGate-time, thus before AI
              era) with a guy claiming to be the site owner. A bit difficult to
              OSINT :)
       
            luckylion wrote 1 day ago:
            Somebody who a) directs DDOS attacks and b) abuses random visitors'
            browser for those DDOS attacks is never the victim.
            
            You don't know their motives for running their site, but you do get
            a clear message about their character by observing their actions,
            and you'd do well to listen to that message.
       
              cardanome wrote 1 day ago:
              The character is completely irrelevant to whether they are a
              victim of doxxing.
              
              They might be the worst person ever but that doesn't matter.
              People can be good and bad, sometimes the victim sometimes the
              perpetrator.
              
              Is it morally wrong to doxx someone and cause them to go to jail
              because they are running an archive website? Yes. It is. It
              doesn't matter who the person is. It does not matter what their
              motivations are.
       
                darkwater wrote 1 day ago:
                So, we are back at eye for eye and tooth for tooth?
       
                  cardanome wrote 1 day ago:
                  No. I literally said
                  
                  > I don't think the DDOSing is a very good method for
                  fighting back
                  
                  I am really shocked by the conditional empathy people here
                  are showing. The doxxing isn't less bad just because the
                  reaction to it is bad.
                  
                   It's like justifying bullying because the person "deserves"
                  it.
       
                fc417fc802 wrote 1 day ago:
                Irrelevant to a determination of fact, yes. But very relevant
                to the question of whether or not I care about any of this. Bad
                thing happened to bad person, lots of drama ensued, come
                rubberneck the various internet slapfights, details at 11. In
                other news, water is wet.
       
                AgentME wrote 1 day ago:
                There are plenty of cases where the operator of archive.today
                refused to take down archives of pages with people's
                identifying information, so it's a huge double standard for
                them to insist on others to not look into their identity using
                public information.
       
          ddtaylor wrote 1 day ago:
          Did they actually run the DDoS via a script or was this a case of
          inserting a link and many users clicked it? They are substantially
          different IMO
       
            hexagonwin wrote 1 day ago:
            they silently ran the DDoS script on their captcha page (which is
            frequently shown to visitors, even when simply viewing and not
            archiving a new page)
       
            dunder_cat wrote 1 day ago:
             [1] has the earliest writeup that I know of. It was running it via
            a script and intentionally using cache busting techniques to try to
            increase load on the hosted wordpress infrastructure.
            
  HTML      [1]: https://news.ycombinator.com/item?id=46624740
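            
            For readers unfamiliar with cache busting: a request whose
            query string changes every time usually misses any cache keyed
            on the full URL, so each one reaches the origin server. A
            minimal defensive sketch in Python, assuming a site where only
            an allow-list of query parameters affects the response; the
            `cache_key` helper, the `ALLOWED_PARAMS` set and the example
            URLs are hypothetical and not taken from the linked writeup:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allow-list: the only query parameters assumed to change the response.
ALLOWED_PARAMS = frozenset({"p", "page"})


def cache_key(url, allowed=ALLOWED_PARAMS):
    """Return a normalized cache key that ignores unrecognized query parameters."""
    parts = urlsplit(url)
    # Keep only allow-listed parameters, sorted so that ordering does not matter.
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed
    )
    # Drop the fragment and lower-case the host; an empty path becomes "/".
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


# Two requests that differ only in a throwaway parameter share one cache entry.
first = cache_key("https://blog.example/?nonce=k3j2h1")
second = cache_key("https://blog.example/?nonce=zz91qa")
print(first == second)
```

            With a key like this, requests that differ only in an unknown
            parameter collapse onto one cached entry instead of each
            reaching the origin. Whether a given host or CDN normalizes
            keys this way is an assumption here, not something the thread
            states.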
       
              RobotToaster wrote 1 day ago:
              Given the site is hosted on wordpress.com, who don't charge for
              bandwidth, it seems to have been completely ineffective.
       
                Hamuko wrote 1 day ago:
                The speculation that I saw was that they'd try to get
                Wordpress.com to boot him off for being a burden on the overall
                infrastructure.
       
                  ouhamouch wrote 1 day ago:
                  AT answered why the DDoS and why it is still active
                  
  HTML            [1]: https://lj.rossia.org/users/archive_today/2478.html
       
                    ddtaylor wrote 1 day ago:
                    Seems like they just Streisand Effect themselves and
                    amplify the message of the "attacker"
       
                    viraptor wrote 1 day ago:
                    This is an impressively unhinged take. I still have no idea
                    what the person is trying to achieve. And I'm sad we're
                    likely going to lose that resource in the future.
       
                      noobermin wrote 1 day ago:
                      I understand being mad but no, unfortunately, despite me
                      knowing humans are human and they get angry at times,
                      this response does still leave a bitter taste in the
                      mouth and many people will perceive it that way. Changing
                      the content of the archived pages is the worst thing
                      they've done honestly. The "3 Hz DDoS" is funny perhaps
                      but then if it's so harmless, then why even bother? But
                      regardless, tampering with the archives, that is,
                      tainting the content that people appreciate you for won't
                      sit well with people.
                      
                      I don't know, I feel like everyone loses here.
       
                      walletdrainer wrote 1 day ago:
                      People are now also talking about the weirdo trying to
                       dox him instead of just the operator of the website,
                       doesn't seem like an unreasonable goal.
       
                        viraptor wrote 23 hours 10 min ago:
                         We're talking about both now, at least once a week it
                        seems. Without the DDoS, we'd mostly forget about the
                        blog. I didn't even know about the blog until the DDoS
                        started.
       
                  chrisjj wrote 1 day ago:
                  As if Wordpress.com was that dumb...
       
                    RobotToaster wrote 1 day ago:
                    Mullenweg is dumb, but he seems like the kind of dumb that
                    would try to launch his own attack on archive.today rather
                    than remove the site.
                    
                    (For those who don't know, he's currently trying to destroy
                    one of the largest WP hosting providers with a bunch of
                    lawsuits)
       
                    daedrdev wrote 1 day ago:
                    Are you kidding, it's wordpress
       
              jsheard wrote 1 day ago:
              > It was running
              
               It still is, uBlock's default lists are killing the script now but
              if it's allowed to load then it still tries to hammer the other
              blog.
       
                dunder_cat wrote 1 day ago:
                Ah good to know. My pi-hole actually was blocking the blog
                itself since the ublock site list made its way into one of the
                blocklists I use. But I've been just avoiding links as much as
                possible because I didn't want to contribute.
       
              ddtaylor wrote 1 day ago:
              Thank you this is exactly the information I was looking for.
              
              "You found the smoking gun!"
       
          jsheard wrote 1 day ago:
          It's also kind of ironic that a site whose whole premise is to
          preserve pages forever, whether the people involved like it or not,
          is seeking to take down another site because they are involved and
          don't like it. Live by the sword, etc.
       
            palmotea wrote 1 day ago:
            > It's also kind of ironic that a site whose whole premise is to
            preserve pages forever, whether the people involved like it or not
            
            Oddly, I think archive.today has explicitly said that's not what
             they're there for, and that people shouldn't rely on their links as
            a long-term archive.
       
              eviks wrote 1 day ago:
              Where have they said it?
              
              > Archive.today is a time capsule for web pages!
              > It takes a 'snapshot' of a webpage that will always be online
              even if the original page disappears.
       
                palmotea wrote 1 day ago:
                This reddit post collects some statements:
                
  HTML          [1]: https://old.reddit.com/r/DataHoarder/comments/1i277vt/...
       
              johanyc wrote 1 day ago:
              What are they for then
       
                palmotea wrote 1 day ago:
                Bypassing paywalls? It actually seems like they've got accounts
                at many paywalled sites. Shorter term archiving?
                
                Given the unclear ownership situation, it makes sense not to
                rely on them for anything long term. They could disappear
                tomorrow.
       
          jMyles wrote 1 day ago:
          > Changing the content of archived pages also raises questions about
          the authenticity of what we're reading.
          
          This is absolutely the buried lede of this whole saga, and needs to
          be the focus of conversation in the coming age.
       
        mrguyorama wrote 1 day ago:
         >In emails sent to Patokallio after the DDoS began, "Nora" from
         Archive.today threatened to create a public association between
         Patokallio's name and AI porn and to create a gay dating app with
         Patokallio's name.
        
        Oh good. That's definitely a reasonable thing to do or think.
        
        The raw sociopathy of some people. Getting doxxed isn't good, but this
        response is unhinged.
       
          oytis wrote 1 day ago:
          I mean, the admin of archive.today might face jail time if
          deanonymised, kind of understandable he's nervous. Meanwhile for
          Patokallio it's just curiosity and clicks
       
          jMyles wrote 1 day ago:
          It's a reminder how fragile and tenuous are the connections between
          our browser/client outlays, our societal perceptions of online norms,
          and our laws.
          
          We live at a moment where it's trivially easy to frame possession of
          an unsavory (or even illegal) number on another person's storage
          media, without that person even realizing (and possibly, with some
          WebRTC craftiness and social engineering, even get them to pass on
          the taboo payload to others).
       
          ouhamouch wrote 1 day ago:
          That was private negotiations, btw, not public statements.
          
           It was in response to J.P.'s blog, which had already framed AT as
           a project grown from a carding forum, and to his pushing his
           speculations onto ArsTechnica, whose parent company just
           destroyed 12ft and is on to a new victim. The story is full of
           untold conflicts of interest covered with a soap opera around
           the DDoS.
       
            MBCook wrote 1 day ago:
             Why does it matter that it was a private communication?
             
             It's still a threat isn't it?
       
            Yossarrian22 wrote 1 day ago:
            Can you elaborate on your point?
       
              ouhamouch wrote 1 day ago:
              The fight is not about where it is shown and not about what, not
              about "links in Wikipedia", but about whether News Inc will be
              able to kill AT, as they did with 12FT.
       
                Yossarrian22 wrote 1 day ago:
                 What is News Inc? Are they a funder of Wikipedia (I think
                 Wikipedia didn't have a parent company so they're not
                 owners)?
       
                  ouhamouch wrote 1 day ago:
                  They are owner of ArsTechnica which wrote 3rd (or 4th?)
                  article on AT in a row painting it in certain colors.
                  
                  The article about FBI subpoena that pulled J.P's speculations
                  out of the closet was also in ArsTechnica and by the same
                  author, and that same article explicitly mentioned how they
                  are happy with 12ft down
       
                     Yossarrian22 wrote 1 day ago:
                     ... Ars is owned by Conde Nast?
       
                      ouhamouch wrote 1 day ago:
                      from the Ars article:
                      
                      ---
                      US publishers have been fighting web services designed to
                      bypass paywalls. In July, the News/Media Alliance said it
                       secured the takedown of paywall-bypass website 12ft.io.
                       "Following the News/Media Alliance's efforts, the
                       webhost promptly locked 12ft.io on Monday, July 14th,"
                       the group said. (Ars Technica owner Condé Nast is a
                       member of the alliance.)
                      ---
       
        alsetmusic wrote 1 day ago:
        I will no longer donate to Wikipedia as long as this is policy.
       
          Larrikin wrote 1 day ago:
          About how much had you previously donated over the years?
       
          jraph wrote 1 day ago:
          Why? The decision seems reasonable at first sight.
       
            chrisjj wrote 1 day ago:
            Second sight is advisable in such cases. Fact is, archives are
            essential to WP integrity and there's no credible alternative to
            this one.
            
            I see WP is not proposing to run its own.
       
              prmoustache wrote 1 day ago:
              > there's no credible alternative to this one.
              
              But this one is not credible either so...
       
              huslage wrote 1 day ago:
              What exactly is credible about archive.today if they are willing
              to change the archive to meet some desire of the leadership?
              That's not credible in the least.
       
                chrisjj wrote 1 day ago:
                A lot more credible than archive.org that lets archives be
                changed and deleted by the archive targets.
                
                What's your better idea?
       
                  josephcsible wrote 1 day ago:
                  Does archive.org really let its archives be changed? That's
                  very different than letting them be deleted from a
                  credibility perspective.
       
                    ouhamouch wrote 1 day ago:
                    Yes.
                    
                    Archive.org snapshots may load javascript from external
                    sites, where the original page had loaded them. That script
                    can change anything on the page. Most often, the domain is
                    expired and hijacked by a parking company, so it just
                    replaces the whole page with ads.
                    
                    Example: [1] ----
                    
                    And another example: [2] The page "got changed" every
                    second. It is easy to make an archived page which would
                    show different content depending on current time or whether
                     you have a Mac or Windows, or your locale, or browser
                     fingerprint, or be tailored for you personally.
                    
  HTML              [1]: https://web.archive.org/web/20140701040026/http://...
  HTML              [2]: https://web.archive.org/web/20260219005158/https:/...
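                    
                    One way to check a saved snapshot for this is to list
                    every script it loads from a host other than the
                    archive itself. A minimal sketch using only the Python
                    standard library; the `external_scripts` helper, the
                    `archive_host` default and the sample markup are
                    assumptions for illustration, not part of any
                    archive's API:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src and are skipped
                self.sources.append(src)


def external_scripts(html, snapshot_url, archive_host="web.archive.org"):
    """Return absolute URLs of scripts served from outside archive_host."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    external = []
    for src in collector.sources:
        absolute = urljoin(snapshot_url, src)  # resolve relative and //host forms
        if urlsplit(absolute).hostname != archive_host:
            external.append(absolute)
    return external


# Hypothetical snapshot markup: one rewritten script, one left pointing off-site.
sample = (
    '<script src="/web/2014/js/app.js"></script>'
    '<script src="https://expired.example/ads.js"></script>'
    "<script>inline()</script>"
)
print(external_scripts(sample, "https://web.archive.org/web/2014/http://site.example/"))
```

                    Anything the function returns is code that the
                    current owner of that outside domain controls, which
                    is the case described above. It only inspects static
                    `src` attributes, so scripts injected at runtime are
                    out of its reach.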
       
                      josephcsible wrote 1 day ago:
                      I don't think it's fair to equate running JS that can
                      change the rendered output with the archive server
                      actually changing the HTML it sends back.
       
                        ouhamouch wrote 1 day ago:
                        I agree, JS is much worse. Because anyone could create
                        an "untrustworthy" page on archive.org, no hack or
                        admin assistance is required.
       
                          chrisjj wrote 1 day ago:
                           Much worse indeed. This is why one should be deeply
                          sceptical of the handful of WP users seeking to
                          replace archive.today by archive.org. AT allows
                          tampering by the archive operator; IA allows
                          tampering by half the planet... including WP editors
                          who'd love that replacement.
       
                  RupertSalt wrote 1 day ago:
                  > the archive targets
                  
                  Isn't there a substantial overlap with the copyright holders?
       
                    chrisjj wrote 1 day ago:
                    Overlap?
       
              Jordan-117 wrote 1 day ago:
              Did you not read the article? They not only directed a DDOS
              against a blogger who crossed them, but altered their own
              archived snapshots to amplify a smear against them. That
              completely destroys their trustworthiness and credibility as a
              source of truth.
       
                ouhamouch wrote 1 day ago:
                Altered snapshots = hide Nora name?
                
                 ArsTechnica just did the same - removed Nora from older
                 articles. How can you trust ArsTechnica after that?
       
                  Jordan-117 wrote 1 day ago:
                  They didn't just remove her name, but replaced it with the
                  target's name.
                  
                  I don't know what you're talking about re: Ars removing her
                  name from old articles.
       
                    Jordan-117 wrote 1 day ago:
                    Follow-up: maybe you're confusing Ars Technica with
                    Wikipedia, whose admins did redact Nora's last name from
                    discussions? If so, that's a weird equivalence to draw,
                    since the change was disclosed and done to protect personal
                    information, not attack someone else in the process. (Also,
                    "Nora [redacted]" itself seems to be a name lifted from an
                    unrelated person who had merely contacted Archive.today
                    with a takedown request.)
       
                      Smartchat wrote 1 day ago:
                      1. I can't post links (I've already tried), my comments
                      with links are getting shadowbanned. Check out Jon
                      Brodkin's article on Ars about AT, not today's, but the
                      previous one, 6 days ago. Nora's name was there, but now
                      it's silently gone.
                      
                      2. We learned about Nora's involvement from Patokallio.
                      We learned about Nora's non-involvement... also from
                      Patokallio. They could have reached a settlement with AT
                      that includes hiding Nora's name.
                      
                      3. Regardless of who Nora is, it is interesting to see
                      the extent of this censorship: so far only gyrovague.com
                      and arstechnica.com, but not tomshardware.com and not
                      tech.yahoo.com. This shows which sites are working
                      closely with the AT defamation campaign, and which are
                      simply copywriting the news feed.
       
                        Jordan-117 wrote 1 day ago:
                        Silently? It tells you right there in the article:
                        "Nora [last name redacted]". Maybe they could add a
                        more fulsome explanation in an editor's note but it
                        seems pretty obvious in context.
                        
                        If AT is appropriating some random person's name as an
                        alias, it seems helpful to report on that publicly in
                        order to expose the practice and help clear up the
                        misinformation.
       
                          Smartchat wrote 1 day ago:
                          Silently. Last article. Not today's.
                          
                          One with title 'Archive.today CAPTCHA page executes
                          DDoS; Wikipedia considers banning site'
                          
                          I'll try to add the link with comment edit:
                          
                          This has Nora's name [1] The current version has not
                          
  HTML                    [1]: https://web.archive.org/web/20260210195502/h...
       
                            Jordan-117 wrote 1 day ago:
                            Even if they did, so what? There's nothing wrong
                            with a news article removing personal information
                            as a precaution. It's light-years away from
                            altering the content of an archival snapshot in
                            order to target someone else.
       
                              Smartchat wrote 1 day ago:
                              Well, that's the only name they removed, even
                              though it didn't stand out among the other names
                              in the investigation. Secondly, it's ironic to do
                              so in an article tagged "Streisand Effect" so
                              perhaps we're witnessing part of the performance.
                              And thirdly, it's strange to blame AT for
                              removing... the same name, and not blame Ars.
                              Immediately accusing... AT of double standards
                              and hypocrisy.
                              
                              I am lost here. It is definitively an organized
                               defamation campaign.
                               
                               "You are guilty simply because I am hungry"
       
                                Jordan-117 wrote 1 day ago:
                                Seems more like Ars trying to avoid piling more
                                attention on the name of a person that isn't
                                actually involved.
                                
                                And again, the accusation against Archive.today
                                isn't just that they removed their "Nora" alias
                                from a snapshot, but that they replaced it with
                                the name of the blogger they were quarreling
                                with. There's no defensible reason to do that
                                outside of petty revenge (which tracks with the
                                emails and public statements from the
                                Archive.today maintainer).
       
                                  Smartchat wrote 1 day ago:
                                  > Ars trying to avoid piling more attention
                                  on the name of a person that isn't actually
                                  involved.
                                  
                                  Oh, yes, by removing the name in the context
                                  of "Streisand Effect".
                                  
                                  > petty revenge
                                  
                                   How is it "revenge"? Was it a porn page? Or
                                  something bad?
                                  
                                  It is likely to be just a funny placeholder
                                  name of the same length to come in mind.
                                  
                                  --
                                  
                                  We could find good and bad motives for both
                                  AT and Ars.
                                  
                                   The bias against AT was here a priori.
                                  Paywall-story for CondeNast, russophobia for
                                  the rest.
       
                                    Jordan-117 wrote 1 day ago:
                                    They apparently did a find + replace across
                                    their database to change the Nora alias to
                                    the blogger's name. So any archives of
                                    content referencing her would instead point
                                    to him, muddying the waters and blaming him
                                    for anything she was accused of. Like I
                                    said, petty.
                                    
                                    The porn smear threats came later, via
                                    email.
       
                chrisjj wrote 1 day ago:
                Sure I read it. But I don't believe everything I read on the
                internet.
       
                  creatonez wrote 1 day ago:
                  The proof is right there for you to see. Denying it is rather
                  wacky.
       
              throw0101a wrote 1 day ago:
              > Fact is, archives are essential to WP integrity and there's no
              credible alternative to this one.
              
               Yes, they are essential, and that was the main reason for not
               blacklisting Archive.today. But Archive.today has shown they do
               not actually provide such a service:
               
               > "If this is true it essentially forces our hand,
               archive.today would have to go," another editor replied. "The
               argument for allowing it has been verifiability, but that of
               course rests upon the fact the archives are accurate, and the
               counter to people saying the website cannot be trusted for that
               has been that there is no record of archived websites themselves
               being tampered with. If that is no longer the case then the
               stated reason for the website being reliable for accurate
               snapshots of sources would no longer be valid."
              
              How can you trust that the page that Archive.today serves you is
              an actual archive at this point?
       
                chrisjj wrote 1 day ago:
                > If ... If ...
                
                Oh dear.
                
                > How can you trust that the page that Archive.today serves you
                is an actual archive at this point?
                
                Because no one has shown evidence that it isn't.
       
                  rufo wrote 1 day ago:
                  The quote uses ifs because it was written before this was
                  verified, but the Wikipedia thread in question has links to
                  evidence of tampering occurring.
       
                    chrisjj wrote 1 day ago:
                    Let's see them, then.
       
                      kay_o wrote 1 day ago:
                      Are they referring to [1]?
                      
  HTML                [1]: https://en.wikipedia.org/wiki/Wikipedia:Requests...
       
                        chrisjj wrote 1 day ago:
                        > Are they referring to [1] ... ?
                        
                        Wikipedia does not have a project page with this exact
                        name.
                        
                        I assume that is weasel words for 404 Not Found.
                        
  HTML                  [1]: https://en.wikipedia.org/wiki/Wikipedia:Reques...
       
                          Gander5739 wrote 1 day ago:
                          You seem to have truncated the link; it appears in
                          full for me in kay_o's comment.
       
                            chrisjj wrote 1 day ago:
                            I did not. The link was subsequently edited.
                            
                            To [1]. I read that up to the first "proof", [2].
                            It lands on "503 Service Unavailable: No server is
                            available to handle this request."
                            
  HTML                      [1]: https://en.wikipedia.org/wiki/Wikipedia:Re...
  HTML                      [2]: https://web.archive.org/web/20260218135501...
       
                              Gander5739 wrote 1 day ago:
                              Apologies, then. The Wayback link works just fine
                              for me, no errors.
       
              that_lurker wrote 1 day ago:
              The operator(s) of archive.today (and the other domains) are
              doing shady things and the links are not working, so why keep the
              site around when, for example, the Internet Archive's Wayback
              Machine works as an alternative to it?
       
                snigsnog wrote 1 day ago:
                No it doesn't. You can just request content be removed from
                Archive.org and they will honor this: [1] Nonstarter for
                anything that you actually want to be preserved, especially
                anything controversial.
                
  HTML          [1]: https://help.archive.org/help/how-do-i-request-to-remo...
       
                  chrisjj wrote 1 day ago:
                  No request is needed. Just robots.txt to deliver a bulk
                  removal.
       
                chrisjj wrote 1 day ago:
                What archive.today links are not working?
                
                > Internet archives wayback machine works as alternative to it.
                
                It is appallingly insecure. It lets archives be altered by page
                JS and deleted by the page domain owner.
       
                  that_lurker wrote 1 day ago:
                  Currently, as far as I know, at least archive.today and
                  archive.is both have the same DDoS code on the main page.
                  For more details, see [1].
                  
  HTML            [1]: https://gyrovague.com/2026/02/01/archive-today-is-di...
       
                    chrisjj wrote 1 day ago:
                    Is that what you call "not working"?
       
              mook wrote 1 day ago:
              Wouldn't it be precisely because archives are important that
              using something known to modify the contents would be avoided?
       
                chrisjj wrote 1 day ago:
                Obviously not, since archive.org is encouraged.
       
                esseph wrote 1 day ago:
                > something known to modify the contents would be avoided?
                
                Like Wikipedia?
       
                  beej71 wrote 1 day ago:
                  No, not like that. There's a difference between a site that:
                  
                  1) provides a snapshot of another site for archival purposes.
                  2) provides original content.
                  
                  You're arguing that since encyclopedias change their content,
                  the Library of Congress should be allowed to change the
                  content of the materials in its stacks.
                  
                  By modifying its archives, archive.today just flushed its
                  credibility as an archival site. So what is it now?
       
                    esseph wrote 1 day ago:
                    > You're arguing that since encyclopedias change their
                    content, the Library of Congress should be allowed to
                    change the content of the materials in its stacks.
                    
                    As an end user of Wikipedia there are occasions where
                    content has been scrubbed and/or edits hidden. Admins can
                    see some of those, but end users cannot (with various
                    justifications, some excellent/reasonable and some..
                    nebulous). That's all I'm saying, nothing about Congress or
                    such other nonsense. It seems like an occasion of the pot
                    calling the kettle names from this side of the fence.
       
                      beej71 wrote 1 day ago:
                      But Wikipedia promises you that it will modify its
                      content. They're transparent about that promise.
                      
                      An archival site (by default definition) promises you
                      that it will not modify its content. And when it does,
                      it's no longer an archival site.
                      
                      Wikipedia has never been an archival site and it never
                      will be. archive.today was an archival site, but now it
                      never will be again.
       
                        ouhamouch wrote 1 day ago:
                        This is your imaginary archive from the world of pink
                        ponies.
                        
                        Meanwhile their IMA on Reddit: no promises, no
                        commitment. Just like Microsoft EULA :)
                        
  HTML                  [1]: https://old.reddit.com/r/DataHoarder/comments/...
       
                          beej71 wrote 1 day ago:
                          What I don't see on that page is where they
                          explicitly don't promise to not modify anything in
                          the archive.
       
                            chrisjj wrote 1 day ago:
                            > What I don't see on that page is where they
                            explicitly don't promise to not modify anything in
                            the archive.
                            
                            I'm quoting all of that because it lacks an
                            explicit promise of non-modification /i
                            
                            Meanwhile seriously, if you were disappointed not
                            to see e.g. "We explicitly don't promise not to
                            modify", then perhaps you should consider why,
                            regardless, this site was trusted enough to get a
                            gazillion links in Wikipedia... and HN.
       
                              beej71 wrote 1 day ago:
                              > I'm quoting all of that because it lacks an
                              explicit promise of non-modification.
                              
                              And I'm quoting all of that because it lacks an
                              explicit (or implicit) promise of modification.
                              :)
                              
                              It was (emphasis on past-tense) so-trusted
                              because it advertises itself as an archival site.
                              (The linked disclaimer is all about it not being
                              a "long-term" archival site. It says it archives
                              pages for latecomers. There is an implication
                              here that it archives them accurately. What use
                              is a site for latecomers if they change the
                              content to be something else?) If they'd said or
                              indicated they would be changing the content to
                              no longer reflect the original site, Wikipedia
                              would not have linked to them because they
                              wouldn't be a credible source.
                              
                              In any case, now I can't use them to share or use
                              links since we can no longer trust those archives
                              to be untampered. When I share a link to nyt
                              content on archive.today or copy and paste
                              content into email, I'm putting my name on that
                              declaring "nyt printed this". If that's not true,
                              it's my reputation.
                              
                              Just like it was archive.today's.
       
                                esseph wrote 20 hours 30 min ago:
                                > When I share a link to nyt content on
                                archive.today or copy and paste content into
                                email, I'm putting my name on that declaring
                                "nyt printed this". If that's not true, it's my
                                reputation.
                                
                                What if the nyt article itself is the problem?
                                How does that square?
       
        chrisjj wrote 2 days ago:
        > an analysis of existing links has shown that most of its uses can be
        replaced.
        
        Oh? Do tell!
       
          eviks wrote 1 day ago:
          > the community should figure out how to efficiently remove links to
          archive.today
          
          You're part of the community! Prove him right!
       
            chrisjj wrote 1 day ago:
            :)
            
            But seriously, removal is simple but replacement is not.
       
          that_lurker wrote 1 day ago:
            I would be surprised if archive.today had something that was not in
            the Wayback Machine.
       
            layman51 wrote 1 day ago:
            I know that sometimes the behavior of each archiver service is a
            bit different. For example, it's possible that both Archive.today
            and the Internet Archive say they have a copy of a page, but then
            when you open up the IA version, you might see that it renders
            completely differently or not at all. That might happen because
            the webpage has something like two scrollbars, or maybe there's a
            redirect that happens when a link to the page is loaded. I notice
            this seems to happen on documentation pages that are hosted by
            Salesforce. It can be a bit of a pain if you want to save a backup
            copy online of a release note or something like that for everyone
            to easily reference in the future.
       
              chrisjj wrote 1 day ago:
              > it's possible that both Archive.today and the Internet Archive
              say they have a copy of a page, but then when you open up the IA
              version, you might see that it renders completely differently or
              not at all
              
              AT archives the page as seen, even including a screenshot.
              
              IA archives the page as loaded, then, when you view it,
              hamfistedly injects its header bar and executes the source JS. As
              you'd expect, the result is often wrecked - or tampered with.
       
            zahlman wrote 1 day ago:
            Trying to search the Wayback machine almost always gives me their
            made-up 498 error, and when I do get a result the interface for
            scrolling through dates is janky at best.
       
            chrisjj wrote 1 day ago:
            Archive.today has just about everything the archived site doesn't
            want archived. Archive.org doesn't, because it lets sites delete
            archives.
       
            bombcar wrote 1 day ago:
            Wayback Machine removes archives upon request, so there’s
            definitely stuff they don’t make publicly available (they may
            still have it).
       
              super256 wrote 1 day ago:
              You don't even need to do requests if you are the owner of the
              URL. robots.txt changes are applied retroactively, which means you
              can disallow crawls to /abc, request a re-crawl, and all
              snapshots from the past which match this new rule will be
              removed.
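              
              A minimal sketch of the matching described above, not the
              Wayback Machine's actual code. It uses Python's standard
              urllib.robotparser; the helper name snapshots_hidden_by, the
              agent string "archive-bot", and the example.com URLs are all
              made up for illustration.
              
              ```python
              from urllib.robotparser import RobotFileParser
              
              
              def snapshots_hidden_by(robots_txt, snapshot_urls, agent="archive-bot"):
                  """Return the snapshot URLs that a new robots.txt would disallow.
              
                  Hypothetical helper: it only models the claim that a rule added
                  today is also applied to snapshots taken in the past.
                  """
                  parser = RobotFileParser()
                  # parse() takes the robots.txt content as a list of lines.
                  parser.parse(robots_txt.splitlines())
                  return [url for url in snapshot_urls if not parser.can_fetch(agent, url)]
              
              
              # A rule added after the snapshots were taken (assumed example).
              new_robots = "User-agent: *\nDisallow: /abc\n"
              
              old_snapshots = [
                  "https://example.com/abc/page1",
                  "https://example.com/about",
                  "https://example.com/abcdef",
              ]
              
              hidden = snapshots_hidden_by(new_robots, old_snapshots)
              print(hidden)
              ```
              
              Note that Disallow rules are prefix matches in this parser, so
              /abcdef is caught by /abc as well, while /about is not.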
       
            ribosometronome wrote 1 day ago:
            Accounts to bypass paywalls? The audacity to do it?
       
              that_lurker wrote 1 day ago:
              Oh yeah those were a thing. As a public organization they can't
              really do that.
              
              I personally just don't use websites that paywall important
              information.
       
          nobody9999 wrote 1 day ago:
          >> an analysis of existing links has shown that most of its uses can
          be replaced.
          
          >Oh? Do tell!
          
          They do.  In the very next paragraph in fact:
          
             The guidance says editors can remove Archive.today links when the
             original source is still online and has identical content; replace
             the archive link so it points to a different archive site, like
             the Internet Archive, Ghostarchive, or Megalodon; or “change the
             original source to something that doesn’t need an archive (e.g.,
             a source that was printed on paper)”
       
       
   DIR <- back to front page