101010.pl is one of the many independent Mastodon servers you can use to participate in the fediverse.
101010.pl, the oldest Polish Mastodon server. Posts can be up to 2048 characters long.

#FileFormats

Kate Murray<p>New blog post about the Library of Congress 2025-2026 updates to the Recommended Formats Statement (RFS). A few changes to note related to Design and 3D formats (mostly just rescoping), updates to email-related metadata to better align with EA-PDF and continued work to document digital accessibility support in file formats listed as "acceptable" in the RFS. See the Change Log for all updates. Comments welcome! <a href="https://digipres.club/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a> <a href="https://digipres.club/tags/digipres" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digipres</span></a> <a href="https://blogs.loc.gov/thesignal/2025/07/rfs-updates-2025-2026/?loclr=eadpb" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">blogs.loc.gov/thesignal/2025/0</span><span class="invisible">7/rfs-updates-2025-2026/?loclr=eadpb</span></a></p>
RodolfoRG<p>I'm wondering if there is an image file format that allows embedding translated text.<br>Useful for a lot of stuff, including comics.</p><p><span class="h-card" translate="no"><a href="https://framapiaf.org/@davidrevoy" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>davidrevoy</span></a></span> do you know any?</p><p><a href="https://mastodon.online/tags/comics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>comics</span></a> <a href="https://mastodon.online/tags/i18n" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>i18n</span></a> <a href="https://mastodon.online/tags/FileFormats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FileFormats</span></a></p>
Karl Voit :emacs: :orgmode:<p>Hilarious <a href="https://graz.social/tags/SMBC" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SMBC</span></a> on the <a href="https://graz.social/tags/plural" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>plural</span></a> of <a href="https://graz.social/tags/files" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>files</span></a>:<br><a href="https://www.smbc-comics.com/comic/plural-2" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="">smbc-comics.com/comic/plural-2</span><span class="invisible"></span></a></p><p><a href="https://graz.social/tags/fun" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fun</span></a> <a href="https://graz.social/tags/file" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>file</span></a> <a href="https://graz.social/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a> <a href="https://graz.social/tags/PIM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PIM</span></a></p>
Ross Spencer<p><strong><b>File format building blocks: primitives in digital preservation</b></strong></p><p>by <a rel="nofollow noopener" class="u-url mention" href="https://digipres.club/@beet_keeper" target="_blank">@<span>beet_keeper</span></a></p><p>A primitive in software development can be described as:</p><p>a fundamental data type or code that can be used to build more complex software programs or interfaces.</p><p>– via <a href="https://www.capterra.com/glossary/primitive/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">capterra.com/glossary/primitiv</span><span class="invisible">e/</span></a> (also Wiki: language primitives)</p><p>Like bricks and mortar in the building industry, or oil and acrylic for a painter, primitives help a software developer create increasingly complex software, from shell scripts to entire digital preservation systems.</p><p>Primitives also help us create file formats. As we’ve seen with the Eyeglass example I presented previously, a file format is, at its most fundamental level, a representation of a data structure as a binary stream: code writes the data structure out to disk and, likewise, reads it back from disk into a data structure.</p><p>As file format developers, we have at our disposal all of the primitives that software developers have, and, like them, we also have “file formats” (as we tend to understand them in digital preservation terms) that serve as our primitives too.</p> <p><a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/archives/" target="_blank">#Archives</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/digipres/" target="_blank">#digipres</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" 
href="https://exponentialdecay.co.uk/blog/tag/digital-preservation/" target="_blank">#DigitalPreservation</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/digital-preservation-essentialism/" target="_blank">#DigitalPreservationEssentialism</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/diplomatics/" target="_blank">#diplomatics</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/eyeglass/" target="_blank">#eyeglass</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/eygl/" target="_blank">#eygl</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/file-formats/" target="_blank">#FileFormats</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/information-records-management/" target="_blank">#InformationRecordsManagement</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/irm/" target="_blank">#IRM</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/json/" target="_blank">#JSON</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/open-data/" target="_blank">#OpenData</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/open-source/" target="_blank">#OpenSource</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/rdm/" target="_blank">#RDM</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/research-data/" target="_blank">#ResearchData</a> <a rel="nofollow noopener" class="hashtag 
u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/research-data-management/" target="_blank">#ResearchDataManagement</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/xml/" target="_blank">#XML</a></p>
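<p>The post’s central idea, that a file format is a data structure written out as a binary stream and read back again, can be sketched with Python’s standard <code>struct</code> module. The record layout, the <code>DEMO</code> magic number, and the function names below are hypothetical, chosen only to show how primitive types (integers, a float, a fixed run of bytes) combine into a format:</p>

```python
import struct
from dataclasses import dataclass

# Hypothetical layout for illustration only: a 4-byte magic number, then a
# little-endian unsigned 16-bit version, an unsigned 32-bit count and a
# 64-bit float. The "<" prefix means little-endian with no alignment padding.
MAGIC = b"DEMO"
LAYOUT = struct.Struct("<4sHId")


@dataclass
class Record:
    version: int
    count: int
    value: float


def to_bytes(rec: Record) -> bytes:
    """Data structure -> binary stream (what gets written to disk)."""
    return LAYOUT.pack(MAGIC, rec.version, rec.count, rec.value)


def from_bytes(data: bytes) -> Record:
    """Binary stream (read back from disk) -> data structure."""
    magic, version, count, value = LAYOUT.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not a DEMO record: unexpected magic number")
    return Record(version, count, value)
```

<p>Writing to and reading from disk is then just <code>Path("example.demo").write_bytes(to_bytes(rec))</code> and <code>from_bytes(Path("example.demo").read_bytes())</code> with <code>pathlib.Path</code>. The fixed leading bytes are the kind of sequence a signature-based identification tool would match on.</p>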
Thorsted<p>The latest version of <a href="https://digipres.club/tags/PRONOM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PRONOM</span></a>, v120 has been released! Identification for 30 new PUIDs, 32 new signatures and 32 updates. <a href="https://digipres.club/tags/digipres" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digipres</span></a> <a href="https://digipres.club/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a> <a href="https://digipres.club/tags/digitalpreservation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digitalpreservation</span></a> <a href="https://www.nationalarchives.gov.uk/aboutapps/pronom/release-notes.xml" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">nationalarchives.gov.uk/abouta</span><span class="invisible">pps/pronom/release-notes.xml</span></a></p>
ComradeVlast<p>Looking for some more advanced techies to help me out here. I was browsing the files of old abandonware (as one does) and came across the .zym file format in a game called Gubble 2. Does anyone have any idea what this file format is? Is it something proprietary made by the Gubble devs? Something that just isn't used anymore? The only thing Google turned up for .zym was some mods for Quake.</p><p><a href="https://mastodon.social/tags/tech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tech</span></a> <a href="https://mastodon.social/tags/techquestions" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>techquestions</span></a> <a href="https://mastodon.social/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a> <a href="https://mastodon.social/tags/abandonware" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>abandonware</span></a> <a href="https://mastodon.social/tags/windows98" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>windows98</span></a> <a href="https://mastodon.social/tags/techie" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>techie</span></a> <a href="https://mastodon.social/tags/oldtech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>oldtech</span></a> <a href="https://mastodon.social/tags/retrocomputing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>retrocomputing</span></a></p>
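<p>A common first step with an unfamiliar extension like .zym is to look at the file’s first bytes, since many formats begin with a fixed "magic number", and signature-based tools such as DROID and Siegfried match on byte sequences like these. A minimal sketch; the helper name is hypothetical and it says nothing about what .zym actually is:</p>

```python
def header_hex(data: bytes, length: int = 16) -> str:
    """Show the first `length` bytes as hex plus printable ASCII.

    Non-printable bytes are shown as "." in the ASCII column, in the style
    of a classic hex dump, so a magic number stands out at a glance.
    """
    head = data[:length]
    hex_part = " ".join(f"{b:02X}" for b in head)
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in head)
    return f"{hex_part}  |{ascii_part}|"
```

<p>For example, <code>print(header_hex(Path("some_file.zym").read_bytes()))</code> (hypothetical file name) prints the header, which can then be compared against known signatures in a registry such as PRONOM.</p>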
Kate Murray<p>SUPER excited to announce a new FDD, the FIRST ONE ever from Liz Caringola, on PAR (Parity Volume Set File Format Family - fdd634). We/Liz even submitted it for a <a href="https://digipres.club/tags/PUID" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PUID</span></a> in <a href="https://digipres.club/tags/PRONOM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PRONOM</span></a> and we'll do this as part of our workflow from now on. This format was a popular discussion topic with lots of community input. Comments always welcome, but mostly if they say Go Liz! Great job! Woot! <a href="https://www.loc.gov/preservation/digital/formats/fdd/fdd000634.shtml" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">loc.gov/preservation/digital/f</span><span class="invisible">ormats/fdd/fdd000634.shtml</span></a> <a href="https://digipres.club/tags/digipres" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digipres</span></a> <a href="https://digipres.club/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a></p>
Ross Spencer<p><strong><b>A year in file formats 2024</b></strong></p><p>by <a rel="nofollow noopener" class="u-url mention" href="https://digipres.club/@beet_keeper" target="_blank">@<span>beet_keeper</span></a></p><p>A great write-up from Francesca at TNA about the past year for PRONOM, via Georgia at the OPF.</p><p>It’s great to see the work continuing, including the vital translation of guides into other languages. Francesca includes a couple of shout-outs to pieces I contributed in my spare time this year, including a collaborative workshop with Francesca, David, and Tyler at iPRES2024.</p> <p><a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/archives/" target="_blank">#Archives</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/conferences/" target="_blank">#Conferences</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/digipres/" target="_blank">#digipres</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/digital-preservation/" target="_blank">#DigitalPreservation</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/droid/" target="_blank">#DROID</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/file-format/" target="_blank">#FileFormat</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/file-formats/" target="_blank">#FileFormats</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/ipres2024/" target="_blank">#ipres2024</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/outreach/" 
target="_blank">#outreach</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/pronom/" target="_blank">#PRONOM</a></p>
Kate Murray<p>Our semi-annual blog post is out to recap recent work with <a href="https://digipres.club/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a> and <a href="https://digipres.club/tags/digitalpreservation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digitalpreservation</span></a> at the Library of Congress. Highlights include new format research on <a href="https://digipres.club/tags/finale" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>finale</span></a> music notation formats, participation in <a href="https://digipres.club/tags/iPres2024" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>iPres2024</span></a>, and <span class="h-card" translate="no"><a href="https://digipres.club/@anj" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>anj</span></a></span>'s Registries of Practice project.</p><p>w/ Elizabeth M. Caringola, Genevieve Havemeyer-King, Liz Holdzkom and Marcus Nappier</p><p><a href="https://blogs.loc.gov/thesignal/2024/12/file-format-research-roundup/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">blogs.loc.gov/thesignal/2024/1</span><span class="invisible">2/file-format-research-roundup/</span></a></p>
Daniel E. Weeks<p>“as a result of these choices, we now have three incompatible FASTQ variants that cannot be reliably distinguished.“</p><p><a href="https://fediscience.org/tags/bioinformatics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>bioinformatics</span></a> <a href="https://fediscience.org/tags/genetics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>genetics</span></a> <a href="https://fediscience.org/tags/genomics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>genomics</span></a> <a href="https://fediscience.org/tags/FileFormats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FileFormats</span></a> </p><p>RE: <a href="https://neuromatch.social/@jonny/113479546922095889" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">neuromatch.social/@jonny/11347</span><span class="invisible">9546922095889</span></a></p>
#Digital ⚓️ #Vagabond 🦈<p>Huh, looking through some old links from 2015. Forgot about fileformat.info, especially the screeds of updated articles on the front page.</p><p><a href="https://www.fileformat.info/index.htm" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="">fileformat.info/index.htm</span><span class="invisible"></span></a></p><p><a href="https://digipres.club/tags/digipres" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digipres</span></a> <a href="https://digipres.club/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a></p>
jonny (good kind)<p>literally who hurt genomics to make you all encode one specific kind of number as the ASCII characters from ! to ~ as an integer input to some logarithm function, but then others of you changed the function but kept encoding it as a single ASCII character ranging from -5 to 62 (???), and then later they decided that -5 to 62 was silly and so they changed that to 0-62, throwing away half the original range for no reason, except actually it's 0-40 by convention.</p><p>did anyone consider "encoding it as a number"</p><p><a href="https://doi.org/10.1093/nar/gkp1137" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">doi.org/10.1093/nar/gkp1137</span><span class="invisible"></span></a></p><p><a href="https://neuromatch.social/tags/DataStandards" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataStandards</span></a> <a href="https://neuromatch.social/tags/FileFormats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FileFormats</span></a> <a href="https://neuromatch.social/tags/Genomics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Genomics</span></a></p>
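<p>The encodings complained about above can be decoded in a few lines. A minimal Python sketch, assuming the conventions described in the linked paper: Sanger FASTQ stores Phred scores as ASCII with an offset of 33 (so <code>!</code> to <code>~</code> is 0 to 93), while Solexa and Illumina 1.3+ use an offset of 64, with Solexa scores defined as log-odds rather than Phred. The function and constant names are hypothetical:</p>

```python
import math

# ASCII offsets: Sanger uses '!' (33) as zero; Solexa and Illumina 1.3+
# use '@' (64) as zero, which is why Solexa's -5 is stored as ';' (59).
SANGER_OFFSET = 33
ILLUMINA_1_3_OFFSET = 64


def decode_quality(quality: str, offset: int = SANGER_OFFSET) -> list[int]:
    """Turn each ASCII character of a FASTQ quality line into an integer score."""
    return [ord(ch) - offset for ch in quality]


def phred_error_probability(q: float) -> float:
    """Phred: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def solexa_to_phred(q_solexa: float) -> float:
    """Solexa scores are log-odds: Q = -10 * log10(p / (1 - p)).

    Rewriting p in terms of Q and substituting into the Phred definition
    gives Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)
```

<p>The same quality score of 40 is stored as <code>I</code> in Sanger FASTQ but as <code>h</code> in Illumina 1.3+ FASTQ, which is one reason the variants cannot be told apart reliably when a file only uses the overlapping range of characters.</p>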
Ross Spencer<p><strong>simpledroid: completing the circle</strong></p><p>It’s nearing the end of 2024, and that must mean a PRONOM <a href="https://www.dpconline.org/news/wdpd2024-pronom-hackathon" rel="nofollow noopener" target="_blank">hackathon</a> as part of World Digital Preservation Day (#WDPD2024).</p><p>My contribution is a follow-up to my work earlier in the year to produce a valid DROID signature file from Wikidata in <a href="https://exponentialdecay.co.uk/blog/making-droid-work-with-wikidata/" rel="nofollow noopener" target="_blank">wddroidy</a>.</p><p><a href="https://github.com/ffdev-info/simpledroid" rel="nofollow noopener" target="_blank">simpledroid</a> is available on GitHub and generates a simple DROID signature file from PRONOM itself, providing a scripted pathway to a signature file built from official PRONOM data that doesn’t require the current PRONOM database and its legacy stored procedures.</p><p>It also does away with much of the excess data in the current DROID signature file, which previously served as an optimization for its Boyer-Moore-Horspool search algorithm, as described by <a href="https://groups.google.com/g/droid-list/c/EtKJ48ZYHEM/m/MmVg9J0lAAAJ" rel="nofollow noopener" target="_blank">Matthew Palmer</a>.</p><p>The primary reason for simpledroid was to complete the circle on my previous efforts and to prove that it was possible to create a simplified signature file that works with DROID. 
The result is about 80-90% there, with only a few <a href="https://exponentialdecay.co.uk/blog/the-problem-with-comprehensive-test-suites/" rel="nofollow noopener" target="_blank">skeleton files</a> that remain unidentified – it should only require a small amount of forensic research to determine why.</p><p><a href="https://exponentialdecay.co.uk/blog/wp-content/uploads/2024/11/2024-11-10-droid-looiking-good.png" rel="nofollow noopener" target="_blank"></a></p><p>The output simplifies the signature file generation process, opening up opportunities to create alternative versions or to filter what’s already there, e.g. removing any signatures that aren’t explicitly for image identification in a digitization workflow.</p><p>It may provide another way into PRONOM data for those who look at DROID first, as well as opening up different ways to modify and test signatures.</p><p>The <a href="https://github.com/ffdev-info/simpledroid/blob/5ad408f6d2838165acb73d55d8a93cb8375c6e6e/reference/DROID_SignatureFile_Simple_2024-11-11T12-29-22Z.xml" rel="nofollow noopener" target="_blank">reference output</a> shows that the signatures are much easier to understand in this simplified DROID file.</p><p><a href="https://exponentialdecay.co.uk/blog/wp-content/uploads/2024/11/droid-xml.png" rel="nofollow noopener" target="_blank"></a></p><p>simpledroid outputs a file with a smaller footprint than the current one:</p><p><code>1.2M DROID_SignatureFile_Simple_2024-11-11T12-29-22Z.xml</code><br><code>3.4M DROID_SignatureFile_V118.xml</code></p><p>It also contains all of the file classification data, e.g. 
<code>FormatType="Video"</code> from PRONOM that will be added to DROID in a future release (and is already available in Siegfried).</p><p>Unlike the wddroidy work, priorities have also been added to the signature file, so its mechanics are pretty close to the official version. (DROID uses the signature sequence and offsets to identify a file, but then uses priorities to decide which results to display to the user where a file would otherwise positively match both a format and another format built on top of it, e.g. XML as the basis of SVG or XHTML.)</p><p>It might be possible to remove some of the minimum and maximum offset data from the new file, after discovering that DROID’s simplified syntax requires curly bracket syntax at the beginning and end of sequences to mimic the same behavior, e.g.</p><p>With a <code>BOFoffset</code>, <code>min_offset = 2</code>, and signature = <code>BADF00D1</code>, the signature needs to become <code>{2}BADF00D1</code> to work.</p><p>The code is pretty straightforward and uses a few tricks to output XML sensibly without having to build the document’s tree (DOM) in a more verbose way. There are probably a few other shortcuts I’d fix with time if the code ever proved useful, including improving variable naming and adding tests.</p><p>I’m not sure this code will ever be needed, or used by anyone, but as a quick hack and proof of concept, it felt good to put it out there. Maybe someone will look at this or the wddroidy work and see a way to federate different sources of signature information into something DROID can use. 
Or it might be a useful demonstration to the DROID team of how they could simplify PRONOM’s database and output mechanisms in a way that remains compatible with existing tools.</p><p><strong>Previous research week work</strong></p><p>My previous work for PRONOM research week includes a <a href="https://exponentialdecay.co.uk/blog/pronom-release-statistics/" rel="nofollow noopener" target="_blank">dashboard and API</a> for getting more information out of PRONOM, including listings of records still requiring descriptions or signatures. You may find that work interesting; it is available at <a href="https://pronom.ffdev.info" rel="nofollow noopener" target="_blank">https://pronom.ffdev.info</a> and <a href="https://api.pronom.ffdev.info" rel="nofollow noopener" target="_blank">https://api.pronom.ffdev.info</a>.</p><p>And if you want to get in on the signature development work, signature development utility 2.0 (<a href="https://ffdev.info" rel="nofollow noopener" target="_blank">https://ffdev.info</a>) was another of my efforts, for <a href="https://openpreservation.org/blogs/pronom-research-week-signature-development-utility-2-0-ffdev-info/" rel="nofollow noopener" target="_blank">research week 2020</a>, and will hopefully also benefit from outputting DROID’s simplified syntax.</p><p><strong>A week of file formats</strong></p><p>Of course, with World Digital Preservation Day, file formats were pretty popular.</p><p>Andrew Jackson attempted to <a href="https://www.dpconline.org/blog/wdpd/meeting-the-file-format-challenge" rel="nofollow noopener" target="_blank">calculate how many distinct formats</a> might be out there using methods borrowed from the estimation of ecological diversity.</p><p>Amanda Tome <a href="https://www.dpconline.org/blog/wdpd/blog-amanda-tome-wdpd2024" rel="nofollow noopener" target="_blank">described the scope of their work</a> and shared a number of useful resources, including links to the PRONOM starter pack and the PRONOM drop-in 
sessions.</p><p>You might also find out a bit more <em>about yourself</em> by playing this <a href="https://openpreservation.org/blogs/wdpd2024-gamification-of-digital-preservation/?q=1" rel="nofollow noopener" target="_blank">File Format Dating Game</a> from Lotte Wijsman and colleagues: <em>Susanne van den Eijkel, Anton van Es, Elaine Murray, Francesca Mackenzie, Ellie O’Leary, and Sharon McMeekin.</em> (I ended up on a date with <a href="http://justsolve.archiveteam.org/wiki/FASTA_and_FASTQ" rel="nofollow noopener" target="_blank">FASTA</a> (<a href="https://www.loc.gov/preservation/digital/formats/fdd/fdd000622.shtml" rel="nofollow noopener" target="_blank">FDD000622</a>) on my first play-through!)</p><p>Not specifically for WDPD, but in the same week I also enjoyed this presentation from Ange Albertini looking at different ways of identifying file formats. One big takeaway for me was thinking about how to get more forensic information out of a file format identification. DROID doesn’t tell us a lot, but <a href="https://exponentialdecay.co.uk/blog/hacking-the-droid-signature-file-for-characterization/" rel="nofollow noopener" target="_blank">is there a world in which one day it could</a>?</p><p>Let me know if you find any of this work useful, and good luck with your file format endeavors this week.</p> <p><a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/digipres/" target="_blank">#digipres</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/digital-preservation/" target="_blank">#DigitalPreservation</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/droid/" target="_blank">#DROID</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/file-formats/" target="_blank">#FileFormats</a> <a rel="nofollow 
noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/pronom/" target="_blank">#PRONOM</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/python/" target="_blank">#Python</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/siegfried/" target="_blank">#siegfried</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/skeleton-test-corpus/" target="_blank">#SkeletonTestCorpus</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/wdpd/" target="_blank">#WDPD</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/wdpd2024/" target="_blank">#WDPD2024</a></p>
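<p>The <code>{2}BADF00D1</code> rewrite described in the simpledroid post, together with the idea of emitting XML without building a DOM, can be sketched in a few lines of Python. This is not simpledroid’s code: the function names are hypothetical, the <code>Offset="0"</code> attribute simply mirrors the simplified <code>ByteSequence</code> shown in the wddroidy post, and the <code>{min-max}</code> form for a variable offset is an assumption based on PRONOM’s regular-expression syntax rather than something the post confirms:</p>

```python
from typing import Optional
from xml.sax.saxutils import quoteattr


def fold_offset(sequence: str, reference: str, min_offset: int,
                max_offset: Optional[int] = None) -> str:
    """Move an offset into the sequence using curly-bracket gap syntax.

    BOF signatures get the gap in front, EOF signatures at the end.
    A fixed offset becomes {n}; a range becomes {min-max} (assumption).
    """
    if max_offset is None or max_offset == min_offset:
        gap = f"{{{min_offset}}}" if min_offset else ""
    else:
        gap = f"{{{min_offset}-{max_offset}}}"
    return sequence + gap if reference == "EOFoffset" else gap + sequence


def byte_sequence_xml(sequence: str, reference: str) -> str:
    """Template one simplified <ByteSequence> element with an f-string.

    No DOM is built; quoteattr() escapes and quotes each attribute value.
    """
    return (f'<ByteSequence Reference={quoteattr(reference)} '
            f'Sequence={quoteattr(sequence)} Offset="0" />')
```

<p>For the post’s example, <code>byte_sequence_xml(fold_offset("BADF00D1", "BOFoffset", 2), "BOFoffset")</code> yields an element whose <code>Sequence</code> is <code>{2}BADF00D1</code>. Templating strings like this keeps the output readable, at the cost of having to escape attribute values by hand, which is what <code>quoteattr</code> is for.</p>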
Thorsted<p><a href="https://digipres.club/tags/PRONOM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PRONOM</span></a> hackathon is still in full swing! Want to contribute? Adding a description is an easy way to help build on this amazing resource! <a href="https://digipres.club/tags/digipres" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digipres</span></a> <a href="https://digipres.club/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a> <a href="https://github.com/digital-preservation/PRONOM_Research/blob/main/PRONOM_Research_Week/Research_Week_2024.md#descriptathon" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/digital-preservatio</span><span class="invisible">n/PRONOM_Research/blob/main/PRONOM_Research_Week/Research_Week_2024.md#descriptathon</span></a></p>
Thorsted<p>Join me in celebrating <a href="https://digipres.club/tags/WDPD2024" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WDPD2024</span></a> by contributing to the PRONOM registry Hackathon (7th - 14th November 2024). <a href="https://digipres.club/tags/digipres" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digipres</span></a> <a href="https://digipres.club/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a> <a href="https://github.com/digital-preservation/PRONOM_Research/blob/main/PRONOM_Research_Week/Research_Week_2024.md" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/digital-preservatio</span><span class="invisible">n/PRONOM_Research/blob/main/PRONOM_Research_Week/Research_Week_2024.md</span></a></p>
Ross Spencer<p>Wikidata is a good service; Wikibase (on which Wikidata is built) is a better platform.</p><p>I have <a href="https://exponentialdecay.co.uk/blog/talk-using-a-custom-wikibase-with-siegfried/" rel="nofollow noopener" target="_blank">spoken before</a> about its potential to be added to the file-format registry ecosystem in a federated model.</p><p>If we are to use it as a registry that can perhaps complement the pipelines going into PRONOM, e.g. in vendors’ digital preservation platforms such as the Rosetta Format Library, a Wikibase should be able to output different signature file serializations for tools such as Siegfried, DROID or FIDO.</p><ul><li>Siegfried ✅: <a href="https://github.com/richardlehane/siegfried/wiki/Wikidata-identifier" rel="nofollow noopener" target="_blank">https://github.com/richardlehane/siegfried/wiki/Wikidata-identifier</a></li><li>Fido ❌: I’ll need to revisit this!</li></ul><p>And what about DROID?</p><p><strong>Conversion to DROID</strong></p><p>It’s not straightforward to say to a Wikibase/Wikidata Query Service, <em>“output XML in the shape of a DROID signature file”</em>, but <strong>it is</strong> straightforward to write a converter script.</p><p>I had this very thought last week while presenting with colleagues at a <a href="https://ipres2024.pubpub.org/pub/ogp3o0yb/release/1" rel="nofollow noopener" target="_blank">File Format Workshop</a> at iPRES in Ghent.</p><p>It dawned on me that the conversion script would actually be simple thanks to a change in DROID’s signature file format: DROID can now process signatures itself, whereas previously they had to be pre-processed. It’s a long story; the short version is that DROID no longer requires DROID byte-code to record information about an identification pattern, and can instead store signatures as-is in an attribute of a byte sequence element, i.e. 
a PRONOM-formatted regular expression from PRONOM itself or from Wikidata.</p><p>This realization resulted in my writing a conversion script (it took just over half a day) during some down-time on the train home this past weekend.</p><p>The script is called wddroidy (after WD-40 🙄🥁) and can be found <a href="https://github.com/ross-spencer/wddroidy/" rel="nofollow noopener" target="_blank">here</a>.</p><p><strong>Results</strong></p><p>Using the skeleton suite from Richard Lehane’s <a href="https://github.com/richardlehane/builder/releases" rel="nofollow noopener" target="_blank">Builder</a>, we can see that files are positively identified with the new signature file.</p><p><a href="https://exponentialdecay.co.uk/blog/wp-content/uploads/2024/09/results.png" rel="nofollow noopener" target="_blank"></a><a href="https://exponentialdecay.co.uk/blog/wp-content/uploads/2024/09/results-xfmt.png" rel="nofollow noopener" target="_blank"></a></p><p>Links can also be made to work with Wikidata identifiers by modifying the PUID URL pattern in the DROID configuration, e.g. to:</p><p><code>http://wikidata.org/entity/%s</code></p><p>The screenshot below shows where that setting is in the dialog:</p><p><a href="https://exponentialdecay.co.uk/blog/wp-content/uploads/2024/09/pattern.png" rel="nofollow noopener" target="_blank"></a></p><p><strong>Reference signature file</strong></p><p>A reference signature file can be found in the wddroidy repository <a href="https://github.com/ross-spencer/wddroidy/blob/46f31febe01530dc2cf74521cede79f183c10d37/reference/DROID_SignatureFile_WDQS_2024-09-22T07-48-41Z.xml" rel="nofollow noopener" target="_blank">here</a>. It lists approximately 8,119 file formats and 8,195 signatures for them.</p><p><em><strong>NB.</strong> We know there are various issues with Wikidata, including how to identify a “format” and the quality of the signatures. 
We capture some of these in a global repository: <a href="https://github.com/ffdev-info/wikidp-issues/issues" rel="nofollow noopener" target="_blank">https://github.com/ffdev-info/wikidp-issues/issues</a></em></p><p><strong>DROID simplified format</strong></p><p>The real headline here might be how easy it was to create the output using the DROID simplified format.</p><p>I have spoken about it briefly before, but not in any detail.</p><p>In short, DROID no longer needs its own byte-code encoding, which included strange terms such as <code>DefaultShift</code>, <code>Shift Byte</code>, and <code>SubSequence</code> (<a href="https://groups.google.com/g/droid-list/c/EtKJ48ZYHEM/m/MmVg9J0lAAAJ" rel="nofollow noopener" target="_blank">instructions to DROID about how to perform a Boyer-Moore-Horspool search</a>). See below, and note especially how the bytes are split across <code>Shift Byte</code> attributes and elements:</p><pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;FFSignatureFile xmlns="http://www.nationalarchives.gov.uk/pronom/SignatureFile" Version="1" DateCreated="2024-09-23T18:16:09+00:00"&gt;
  &lt;InternalSignatureCollection&gt;
    &lt;InternalSignature ID="1" Specificity="Specific"&gt;
      &lt;ByteSequence Reference="BOFoffset"&gt;
        &lt;SubSequence MinFragLength="0" Position="1" SubSeqMaxOffset="0" SubSeqMinOffset="0"&gt;
          &lt;Sequence&gt;255044462D312E34&lt;/Sequence&gt;
          &lt;DefaultShift&gt;9&lt;/DefaultShift&gt;
          &lt;Shift Byte="25"&gt;8&lt;/Shift&gt;
          &lt;Shift Byte="50"&gt;7&lt;/Shift&gt;
          &lt;Shift Byte="44"&gt;6&lt;/Shift&gt;
          &lt;Shift Byte="46"&gt;5&lt;/Shift&gt;
          &lt;Shift Byte="2D"&gt;4&lt;/Shift&gt;
          &lt;Shift Byte="31"&gt;3&lt;/Shift&gt;
          &lt;Shift Byte="2E"&gt;2&lt;/Shift&gt;
          &lt;Shift Byte="34"&gt;1&lt;/Shift&gt;
        &lt;/SubSequence&gt;
      &lt;/ByteSequence&gt;
    &lt;/InternalSignature&gt;
  &lt;/InternalSignatureCollection&gt;
  &lt;FileFormatCollection&gt;
    &lt;FileFormat ID="1" Name="Development Signature" PUID="dev/1" Version="1.0" MIMEType="application/octet-stream"&gt;
      &lt;InternalSignatureID&gt;1&lt;/InternalSignatureID&gt;
      &lt;Extension&gt;ext&lt;/Extension&gt;
    &lt;/FileFormat&gt;
  &lt;/FileFormatCollection&gt;
&lt;/FFSignatureFile&gt;</pre><p>The updated format was made possible by Matt Palmer’s <a href="https://github.com/nishihatapalmer/byteseek" rel="nofollow noopener" target="_blank">ByteSeek work</a>, and DROID can now accept a standard PRONOM-formatted regular expression (regex) in an attribute of the <code>ByteSequence</code> element. Here is a signature file equivalent to the one above:</p><pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;FFSignatureFile xmlns="http://www.nationalarchives.gov.uk/pronom/SignatureFile" Version="1" DateCreated="2024-09-23T18:16:09+00:00"&gt;
  &lt;InternalSignatureCollection&gt;
    &lt;InternalSignature ID="1" Specificity="Specific"&gt;
      &lt;ByteSequence Reference="BOFoffset" Sequence="255044462D312E34" Offset="0" /&gt;
    &lt;/InternalSignature&gt;
  &lt;/InternalSignatureCollection&gt;
  &lt;FileFormatCollection&gt;
    &lt;FileFormat ID="1" Name="Development Signature" PUID="dev/1" Version="1.0" MIMEType="application/octet-stream"&gt;
      &lt;InternalSignatureID&gt;1&lt;/InternalSignatureID&gt;
      &lt;Extension&gt;ext&lt;/Extension&gt;
    &lt;/FileFormat&gt;
  &lt;/FileFormatCollection&gt;
&lt;/FFSignatureFile&gt;</pre><p>The format is much easier to read, and after spending a bit of time with the DROID signature file format you realize it is fairly easy to generate as well. I use some very rudimentary templates in wddroidy, built with Python’s <a href="https://realpython.com/python-f-strings/" rel="nofollow noopener" target="_blank">f-strings</a>.</p><p>This means other sources of PRONOM-encoded signatures can output much simpler signature files that DROID can use.
I still need to add it to the <a href="https://github.com/ffdev-info/signature-development-utility/issues/3" rel="nofollow noopener" target="_blank">signature development utility</a> myself; this would allow the utility to run standalone on anyone’s PC.</p><p>One next step might be to confirm that this approach works entirely as expected by extracting all of PRONOM’s own signatures and mapping them to the simplified format; if those match all the skeleton files in the latest Builder release, we should be looking good!</p><p><strong>Priorities</strong></p><p>I am always reminded about priorities, but I always forget them! Priorities are part of how DROID resolves a file to a single identifier. For example, where a file matches both SVG and XML, we usually want the more specific format returned, so a priority ranks SVG above XML, giving the DROID user a single unambiguous identification. It manifests in the signature file as:</p><pre>&lt;FileFormat ID="634" MIMEType="image/svg+xml" Name="Scalable Vector Graphics" PUID="fmt/91" Version="1.0"&gt;
  &lt;InternalSignatureID&gt;24&lt;/InternalSignatureID&gt;
  &lt;Extension&gt;svg&lt;/Extension&gt;
  &lt;HasPriorityOverFileFormatID&gt;638&lt;/HasPriorityOverFileFormatID&gt;
&lt;/FileFormat&gt;</pre><p>More work needs to be done with Wikidata to understand whether priorities can be properly applied to a DROID signature file. They are not written into the reference signature file above.</p><p><strong>Using the results</strong></p><p>The results can be used for two things:</p><ol><li>There are (probably) more patterns in the Wikidata output than in PRONOM. If you have a file that remains unidentified, you can try the reference file for clues as to what it may be. I’d just use caution: investigate the exact byte sequence used for a match and understand its properties.
I’d also check that the mapping looks accurate; I’ve tried one or two runs using the identifier and it looks good, but there may still be mistakes.</li><li>Improving the quality of the sources in Wikidata. As you can see from the skeleton suite results, there are a lot of gaps. We a) have a rough idea of what these are, and b) know that identification doesn’t work for them via Wikidata. Why is that? Is the signature in Wikidata simply not good enough? Are patterns missing? Is there another error or issue we can help with, given our expertise in file format identification?</li></ol><p><strong>Hacking wddroidy</strong></p><p>You can hack wddroidy. Currently it allows you to limit the number of results returned, and to change the ISO language code used by the tool. You can see this in the command line arguments:</p><pre>python wddroidy.py --help
usage: wddroidy [-h] [--definitions DEFINITIONS] [--wdqs] [--lang LANG]
                [--limit LIMIT] [--output OUTPUT] [--output-date]
                [--endpoint ENDPOINT]

create a DROID compatible signature file from Wikidata

options:
  -h, --help            show this help message and exit
  --definitions DEFINITIONS
                        use a local definitions file, e.g. from Siegfried
  --wdqs, -w            live results from Wikidata
  --lang LANG, -l LANG  change Wikidata language results
  --limit LIMIT, -n LIMIT
                        limit the number of resukts
  --output OUTPUT, -o OUTPUT
                        filename to output to
  --output-date, -t     output a default file with the current timestamp
  --endpoint ENDPOINT, -url ENDPOINT
                        url of the WDQS

for more information visit https://github.com/ross-spencer/wddroidy</pre><p>The SPARQL query itself can be edited manually in the <a href="https://github.com/ross-spencer/wddroidy/blob/46f31febe01530dc2cf74521cede79f183c10d37/src/wddroidy/wddroidy.py#L61-L89" rel="nofollow noopener" target="_blank">src folder</a>, e.g. to limit the query by format, family, or classification.
I provide some more inspiration in the <a href="https://github.com/richardlehane/siegfried/wiki/Wikidata-identifier#trid-example-sparql" rel="nofollow noopener" target="_blank">Siegfried Wiki</a>.</p><p><strong>Let me know if it’s useful!</strong></p><p>This is really just a quick hack, and it needs a lot more testing to improve the quality of the output. Most issues can be dealt with on the Wikidata side, I am sure, but some might need fixing in the tool. If it’s useful, reach out, and let’s discuss what can be changed or how it can be used in your work.</p><p><strong>Data quality</strong></p><p>It will quickly become apparent that the data quality isn’t at the level of PRONOM’s, and that is why a curated and authoritative service such as PRONOM is always going to be needed. As mentioned in previous talks, this can in theory be complemented with downstream data in federated databases. This might mean curating Wikidata better using some of the tools available, or curating data into a Wikibase (the platform Wikidata is built upon). Both options bring different advantages, such as creating a bigger tent of signature developers on Wikidata, or more <a href="https://groups.google.com/g/droid-list/c/OgeJX6oEaAc/m/E01waCKSAwAJ" rel="nofollow noopener" target="_blank">expressive signatures</a> being made available via federated Wikibases.</p><p><strong>And a word on Wikiba.se</strong></p><p>A reminder, too, that setting up a Wikibase can take some effort (I was once running three at the same time 😬), but a service called <a href="https://wikiba.se/" rel="nofollow noopener" target="_blank">https://wikiba.se/</a> exists.
wikiba.se could form an excellent scratch pad to begin thinking about mapping PRONOM-like data to a Wikibase, and to begin solving some of the other issues around <a href="https://docs.google.com/presentation/d/1DcB68a2u6dpCjAAbPXqPHizHiu7VgKAmT9Z6dv_C3bU/edit?usp=sharing" rel="nofollow noopener" target="_blank">mapping container signatures</a> and outputting those in a way that is compatible with DROID. Let me know if you give it a whirl, or want to collab on any of that.</p><p>Otherwise, thanks in advance! And enjoy wddroidy!</p> <p><a href="https://exponentialdecay.co.uk/blog/making-droid-work-with-wikidata/" class="" rel="nofollow noopener" target="_blank">https://exponentialdecay.co.uk/blog/making-droid-work-with-wikidata/</a></p><p><a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/code/" target="_blank">#Code</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/coding/" target="_blank">#Coding</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/digipres/" target="_blank">#digipres</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/digital-preservation/" target="_blank">#DigitalPreservation</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/droid/" target="_blank">#DROID</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/file-format/" target="_blank">#FileFormat</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/file-formats/" target="_blank">#FileFormats</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/open-data/" target="_blank">#OpenData</a> <a rel="nofollow noopener"
class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/pronom/" target="_blank">#PRONOM</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/siegfried/" target="_blank">#siegfried</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/software-development/" target="_blank">#SoftwareDevelopment</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://exponentialdecay.co.uk/blog/tag/wikidata/" target="_blank">#wikidata</a></p>
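<p>To illustrate how little templating the simplified DROID format needs, here is a minimal Python sketch that writes a signature file in the shape of the simplified example in the post, using f-strings. It is written for illustration only and is not wddroidy’s own code: the <code>Signature</code> record and the helper function names are hypothetical, every sequence is assumed to be BOF-anchored with a fixed offset, IDs are simply numbered by position, and priorities (<code>HasPriorityOverFileFormatID</code>) are left out, as they are in the reference file.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

NAMESPACE = "http://www.nationalarchives.gov.uk/pronom/SignatureFile"


@dataclass
class Signature:
    """Hypothetical input record: one format with one BOF-anchored sequence."""
    puid: str
    name: str
    sequence: str  # PRONOM-style pattern, written as-is into the Sequence attribute
    extension: str
    version: str = ""
    mimetype: str = "application/octet-stream"
    offset: int = 0


def internal_signature(sig_id: int, sig: Signature) -> str:
    # quoteattr() escapes the value and wraps it in quotes, keeping the XML well-formed.
    return (
        f'<InternalSignature ID="{sig_id}" Specificity="Specific">'
        f'<ByteSequence Reference="BOFoffset" Sequence={quoteattr(sig.sequence)} '
        f'Offset="{sig.offset}" />'
        "</InternalSignature>"
    )


def file_format(fmt_id: int, sig: Signature) -> str:
    # Each format points at the internal signature that shares its positional ID.
    return (
        f'<FileFormat ID="{fmt_id}" Name={quoteattr(sig.name)} PUID={quoteattr(sig.puid)} '
        f'Version={quoteattr(sig.version)} MIMEType={quoteattr(sig.mimetype)}>'
        f"<InternalSignatureID>{fmt_id}</InternalSignatureID>"
        f"<Extension>{escape(sig.extension)}</Extension>"
        "</FileFormat>"
    )


def signature_file(signatures: list[Signature], created: Optional[datetime] = None) -> str:
    """Render a complete simplified DROID signature file as a single string."""
    created = created or datetime.now(timezone.utc).replace(microsecond=0)
    numbered = list(enumerate(signatures, start=1))
    internal = "".join(internal_signature(i, s) for i, s in numbered)
    formats = "".join(file_format(i, s) for i, s in numbered)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<FFSignatureFile xmlns="{NAMESPACE}" Version="1" DateCreated="{created.isoformat()}">'
        f"<InternalSignatureCollection>{internal}</InternalSignatureCollection>"
        f"<FileFormatCollection>{formats}</FileFormatCollection>"
        "</FFSignatureFile>"
    )


if __name__ == "__main__":
    # The development signature from the post's example.
    dev = Signature(puid="dev/1", name="Development Signature",
                    sequence="255044462D312E34", extension="ext", version="1.0")
    print(signature_file([dev]))
```

<p>Routing attribute values through <code>quoteattr</code> means format names containing quotes or ampersands still produce well-formed XML. With real Wikidata output, the <code>PUID</code> would presumably hold the Wikidata identifier, so that it pairs with the PUID URL pattern described in the post.</p>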
Thorsted<p>Got my first PDF 2.0 file into my repository. Didn’t know Word was saving to the new format! <a href="https://digipres.club/tags/digipres" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digipres</span></a> <a href="https://digipres.club/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a> <a href="https://digipres.club/tags/wtfPDF" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>wtfPDF</span></a></p>
Thorsted<p>My paper is available for an early read. <a href="https://ipres2024.pubpub.org/pub/frnya0ft/release/1" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">ipres2024.pubpub.org/pub/frnya</span><span class="invisible">0ft/release/1</span></a> <a href="https://digipres.club/tags/digipres" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digipres</span></a> <a href="https://digipres.club/tags/Macintosh" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Macintosh</span></a> <a href="https://digipres.club/tags/fileformats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>fileformats</span></a></p>
#Digital ⚓️ #Vagabond 🦈<p><span class="h-card" translate="no"><a href="https://digipres.club/@anj" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>anj</span></a></span> because of character limits, I've responded in blog form. Hopefully there are some useful pointers here. Also, for whatever reason, some context seems to be missing from your writing (especially given your current responsibilities); I feel it is important context, so I've elaborated on it here.</p><p><a href="https://exponentialdecay.co.uk/blog/we-love-our-fiefdoms-the-dpc-and-wikidata/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">exponentialdecay.co.uk/blog/we</span><span class="invisible">-love-our-fiefdoms-the-dpc-and-wikidata/</span></a></p><p><a href="https://digipres.club/tags/digipres" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digipres</span></a> <a href="https://digipres.club/tags/FileFormats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FileFormats</span></a> <a href="https://digipres.club/tags/FileFormatFriday" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FileFormatFriday</span></a></p>
petersuber<p>Update. Again this year <span class="h-card" translate="no"><a href="https://bird.makeup/users/librarycongress" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>librarycongress</span></a></span> recommends <a href="https://fediscience.org/tags/XML" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>XML</span></a> and <a href="https://fediscience.org/tags/EPUB" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>EPUB</span></a> over <a href="https://fediscience.org/tags/PDF" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PDF</span></a>. (Scroll to I.ii.B.)<br><a href="https://www.loc.gov/preservation/resources/rfs/text.html" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">loc.gov/preservation/resources</span><span class="invisible">/rfs/text.html</span></a> </p><p><a href="https://fediscience.org/tags/FileFormats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FileFormats</span></a> <a href="https://fediscience.org/tags/Formats" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Formats</span></a></p>