#Mastodon #archiving #privacy 

Cultural clashes re archiving of posts in the Mastodon fediverse may heat up soon. Here's a post from someone noting a traditional aversion here to crawling: mastodon.social/@Stoori@polygl

And here's one noting (approvingly) that the Internet Archive was saving their posts without being asked: mastodon.social/@petersuber@fe

I suspect expectations may differ by instance. robots.txt is one way to tell crawlers to go away. Do archive-averse instances tend to say if they use it?
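
For anyone curious what that looks like from the crawler's side, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the bot names (`ExampleArchiveBot`, `SomeOtherBot`), and the `example.social` URL are all made up for illustration; they are not taken from any real instance or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an archive-averse instance: turn away everyone.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that turns away one named crawler only.
BLOCK_ONE = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.social/@someone"  # assumed URL, illustration only
    print(may_fetch(BLOCK_ALL, "ExampleArchiveBot", url))
    print(may_fetch(BLOCK_ONE, "ExampleArchiveBot", url))
    print(may_fetch(BLOCK_ONE, "SomeOtherBot", url))
```

Worth keeping in mind that robots.txt is advisory: it only works for crawlers that choose to check it, which is part of why stated instance norms may matter as much as the file itself.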

The Financial Times has printed an obvious error on its "Market Data" pages every day for the last 18 months and I appear to be the only one of their 112k+ subscribers who has noticed. Maybe it's time to stop printing these pages? mako.cc/copyrighteous/the-fina

“Why don’t we, the academic community, just build our own citation database / academic search tool?”

It has been a decade since I worked in this space, so some of what I am about to write may be out of date, if not outright misremembered. That said, access to the primary sources is a massive and perhaps insurmountable obstacle.

Back then, you couldn’t even get the citation graph. I think this may have improved. But if you want full academic search, you need full text of the documents.

from earlier this spring, along the I-90 corridor. I love the white+green+red that time of year

thinking of fall meals, this was my birthday dinner for 2022: (veggie) squab pie, mashed, and mushy peas. carrot cake for dessert, 'Port of Spain' cocktail

bluesky / atproto 

I wrote a blog post about atproto and "big world" social media use cases: bnewbold.net/2022/atproto_thou

I am way out of practice writing and feel like this is poorly edited, but have to start again somewhere!

hi friends, there will be a memorial for Peter Eckersley at the Internet Archive in SF on March 4 2023: facebook.com/events/2366523486. please spread the word to his friends who aren't on Facebook!

twitter migration 

gotta get depthsofwikipedia over here. and depthsofinternetarchive!

via: twitter.com/depthsofwiki/statu

IA did a rare public event at the East Bay warehouse last month. Really glad more folks got a chance to see the space; even many staff never get out there. Loads of books, servers, digitization equipment, side projects. The scale (and the frugality) of the whole endeavor is clearer, and this isn't even a large physical storage site for us these days.

And great to see it all spiffed up, a lot of work!

Still thinking about these beautiful fall hills near Salt Lake City a few weeks back

For those wondering about the availability (now and in the future) of archived Twitter data: The Library of Congress has the full record of the first 12 years of Twitter data, but they stopped full collection in 2017. Since then, they have only archived tweets deemed to be of historical significance. Here’s a white paper from them describing why they stopped full archival collection, and what they still collect.


CSVconf is happening again. This time in Buenos Aires - April 19-20, 2023. It’s a great community conference focused on elegant approaches to hacking/interoperability…and how simple data projects can solve societal problems. Please consider submitting a proposal. And please spread the word. csvconf.com

Twitter! 🐥☠️ Archiving! 💾🗄️ 

As interest builds in archiving parts of Twitter, it’s a good time to check out Documenting the Now, a set of tools and guiding policies for archiving social media ethically, with guidance by communities at risk. It’s a years-long project led by archivists, who are specially trained to understand the ethical and practical implications of preservation. docnow.io

Quadratic Equation, in Braille. Via Visual impairment in MSOR by Emma Jane Rowlett and Peter James Rowlett (2010)

