usually that sort of thing would be specified in the guide: https://guide.fatcat.wiki/entity_container.html
but in this case it is sort of subjective and used mostly for analytics. it is inferred from other metadata fields. code is, obscurely, here: https://github.com/internetarchive/chocula/blob/master/chocula/database.py#L580
"One site, scholar.archive.org, has PDFs going back to the 18th century. It’s empowering to look for this stuff instead of waiting for it to be socially discovered and jammed into my brain."
(h/t @jstogdill for the reference)
@albertcardona @MarcusStensmyr @textfiles @giorgiogilestro whoops! we have the "editor" metadata in the catalog: https://fatcat.wiki/release/fyescyq5lncvxmzlrnspa3umce/contribs
but it doesn't come through in search results. how would you expect this to display in search results? "(Ed.)" after the editor name? PubMed seems not to display it; PLOS (publisher) shows it in a separate metadata box
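to make the "(Ed.)" option concrete, here is a rough sketch of how a result line could mark editors. the record shape (dicts with "raw_name" and "role" keys) and the function name are assumptions for illustration only, not the actual search code or a confirmed schema:

```python
# Hypothetical contributor records: the "raw_name" and "role" keys are
# assumptions for illustration, not a confirmed catalog schema.
def format_contribs(contribs):
    """Join contributor names, appending "(Ed.)" to anyone with the editor role."""
    names = []
    for contrib in contribs:
        name = (contrib.get("raw_name") or "").strip()
        if not name:
            continue  # skip records with no usable display name
        if contrib.get("role") == "editor":
            name += " (Ed.)"
        names.append(name)
    return ", ".join(names)


# Made-up sample data, not taken from a real release.
sample = [
    {"raw_name": "A. Author", "role": "author"},
    {"raw_name": "E. Editor", "role": "editor"},
]
print(format_contribs(sample))
```

the other option (a separate metadata box, like PLOS) would keep the name list untouched and render editors in their own field instead.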
@melissaekline @textfiles @ORCID_Org we don't have any plans to take on author profile pages ourselves, though somebody could certainly build something like that on top of the catalog API. getting human names and de-duplication correct is hard and can be harmful if done wrong.
the fatcat catalog does have a concept of "creators" (eg, authors and editors), and can be edited by anybody to update author/paper linkages. it contains a lot of ORCID-based records, but is not very complete
by default we only display results that we have an accessible full text copy of. you can change the "availability" filter to "all records" to override this.
if you know of an open copy of a paper we don't have, you can click through to 'fatcat' and submit a "Save Paper Now" request for us to crawl and index it
we have no current intention of building high-quality author profile pages, or computing "leaderboard" style citation metrics summaries, which are features of Google Scholar.
one difference is that all of our biblio metadata and code are openly available for reuse, and we have an open search API
For reasons I can't fathom, Internet Archive Scholar got attention today, a mass of it, painting it as a "new" service. Actually, it has been out there for about a year. BUT....
If beautifully structured access to academic citations by the millions is your bag or desperately needed tool, especially ones that are ONLY left in the Wayback Machine, you are in LUCK. And this will be your favorite day. Try it.
𐑴 𐑢𐑬, 𐑥𐑲 𐑐𐑱𐑡 𐑯 𐑚𐑻𐑛𐑕𐑲𐑑 𐑩𐑒𐑬𐑯𐑑 𐑜𐑧𐑑 𐑩 𐑖𐑬𐑑𐑬𐑑 𐑦𐑯 𐑩 𐑮𐑰𐑕𐑩𐑯𐑑 𐑨𐑒𐑩𐑛𐑧𐑥𐑦𐑒 𐑸𐑑𐑦𐑒𐑩𐑤 𐑪𐑯 𐑦𐑙𐑜𐑤𐑦𐑗 𐑕𐑐𐑧𐑤𐑦𐑙 𐑮𐑦𐑓𐑹𐑥. ·𐑦𐑯𐑑𐑼𐑯𐑧𐑑 𐑸𐑒𐑲𐑝 𐑕𐑒𐑪𐑤𐑼 𐑦𐑟 𐑩 𐑜𐑮𐑱𐑑 𐑮𐑦𐑟𐑹𐑕 𐑑 𐑥𐑱𐑒 𐑩𐑝𐑱𐑤𐑩𐑚𐑩𐑤 𐑞 𐑓𐑮𐑵𐑑𐑕 𐑝 𐑨𐑒𐑩𐑛𐑧𐑥𐑦𐑒 𐑮𐑦𐑕𐑻𐑗.
→ Oh wow, my page and birdsite account get a shoutout in a recent academic article on English spelling reform. Internet Archive Scholar is a great resource to make available the fruits of academic research.
Pushed a fresh snapshot of fatcat metadata last week:
Hundreds of millions of paper, file, and journal records. More info about these dumps, and schema, at https://guide.fatcat.wiki/bulk_exports.html
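For anyone wanting to poke at a dump, here is a minimal sketch of streaming records without loading a whole file into memory. It assumes the file is gzip-compressed JSON with one record per line; the guide linked above is the place to check the actual format and schema. The "ident" and "title" keys and the in-memory sample are made up for illustration.

```python
import gzip
import io
import json


def iter_records(fileobj):
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines stream."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


# Tiny in-memory stand-in for a dump file (made-up records, not real data).
fake_dump = io.BytesIO(
    gzip.compress(
        b'{"ident": "aaa", "title": "first"}\n'
        b'{"ident": "bbb", "title": "second"}\n'
    )
)
titles = [record["title"] for record in iter_records(fake_dump)]
print(titles)
```

With a real file, the same function should work on a path opened in binary mode, e.g. `iter_records(open(path, "rb"))`, where `path` is wherever the dump was saved.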
Scholar contains content that we have crawled from open sources, such as OA publishers, repositories, and national libraries. Most of our Japanese metadata and content comes via the JALC DOI registrar and JSTAGE hosting site.
There is also digitized print content in archive.org, and some of that ends up in scholar. I don't know of any specific Japanese research collections there.
There are more details in "the guide": https://guide.fatcat.wiki/
Scholar is built on an open, editable bibliographic catalog: https://fatcat.wiki
Most of the records are automatically imported from our wonderful upstream sources, but any human can directly submit corrections and additions through the web interface or API. These submissions are then reviewed in the open before merging. The entire catalog is versioned and can be downloaded in bulk or synchronized using a "changelog" feed.
You can learn more about editing at:
"his way" / bunch of dudes
Hemingwayesque: 104 hits https://scholar.archive.org/search?q=Hemingwayesque
Kiplingesque: 340 hits https://scholar.archive.org/search?q=Kiplingesque
Turneresque: 358 hits https://scholar.archive.org/search?q=Turneresque
Kafkaesque: 2,423 hits https://scholar.archive.org/search?q=Kafkaesque
"his way" / bunch of dudes
Sinatraesque: 3 hits https://scholar.archive.org/search?q=Sinatraesque
Cocteauesque: 8 hits https://scholar.archive.org/search?q=Cocteauesque
Bowiesque: 7 hits https://scholar.archive.org/search?q=Bowiesque
Ramboesque: 20 hits https://scholar.archive.org/search?q=Ramboesque
Bergmanesque: 31 hits https://scholar.archive.org/search?q=Bergmanesque
Pynchonesque: 24 hits https://scholar.archive.org/search?q=Pynchonesque
Escheresque: 36 hits https://scholar.archive.org/search?q=Escheresque
Felliniesque: 48 hits https://scholar.archive.org/search?q=Felliniesque
McCarthyesque: 42 hits https://scholar.archive.org/search?q=McCarthyesque
Daliesque: 57 hits https://scholar.archive.org/search?q=Daliesque
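The search links above all follow the same pattern, so a list like this is easy to generate. A small sketch, using only the `search?q=` URL form visible in the links themselves; the function name is made up, and nothing here fetches results or hit counts:

```python
from urllib.parse import urlencode

# Base URL taken from the search links above.
SEARCH_BASE = "https://scholar.archive.org/search"


def search_url(term):
    """Build a scholar search link for a single query term."""
    return f"{SEARCH_BASE}?{urlencode({'q': term})}"


for term in ["Kafkaesque", "Escheresque"]:
    print(search_url(term))
```

`urlencode` takes care of escaping, so multi-word queries come out with `+` in place of spaces.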
trafilatura (https://github.com/adbar/trafilatura) is a nice Python library that we use to extract article full text from HTML documents for indexing in scholar. It has good accuracy and recall, works with "old" HTML (eg from web archives), and pulls out metadata like title, author, and date. There are lots of similar tools, mostly focused on news articles, and trafilatura is an improvement on them.
Thanks to Adrien Barbaresi for maintaining it!
Scholars continue to publish papers in Latin, well into the twenty-first century! Here is a snippet of Dennis Toscano's Master's thesis from the University of Kentucky (2016), contextualizing an anonymous poem, itself in Latin, from 1741:
Opus cui titulus est "Carthago Indiarum obsessa sed non expugnata" est carmen divulgatum sine nomine auctoris saeculo duodevicesimo ad celebrandam victoriam quam Hispani a Britannis Carthagenae Indiarum anno...
Search engine for tens of millions of preserved research papers.
An @internetarchive project: free software, open metadata, open API, non-profit, ad-free, privacy respecting.
A Mastodon Server for Internet Archive employees and Role Accounts (Announcements)