#awk

Since Pocket is shutting down soon, yesterday evening I downloaded my Pocket data and, because I have weird compulsions, decided to use #awk to parse the very simple CSV file into a very simple Markdown file.

And then, because my previous, equally niche, AWK post had been oddly popular, I wrote about this new script too: blog.sgawolf.com/post/2025-05-

Simon Wolf's Blog · Pocket Parser

Earlier today I received an email telling me that Mozilla was shutting down Pocket, a service which allowed you to save links to web pages you wanted to read later. They have a data export facility so I could get a copy of the data I had stored and, after downloading the file, I realised it was a very simple CSV file containing the page title, the URL, when it was saved, the category(s) assigned to it and the read state. Because it seemed oddly popular last time I wrote about it, I decided to parse the file in AWK and generate a simple Markdown file containing the links, grouped by the category.

The Output I Wanted

One of the joys of getting older is the lack of guilt about becoming more regressive in my approach to technology, and I'm using plain text file formats more and more as I realise the value of "owning" my own data. I'm not sure if it is a realistic long-term solution, but my immediate thought was to simply store the data I had added to Pocket in a single Markdown file which listed the links in groups based on the category:

# Category

* [Page Title](URL) on Date

The Source Data

As mentioned above, the export from Pocket is remarkably straightforward, and shows how a fairly complex service for grabbing page information via browser plugins and then allowing the pages to be managed and reviewed in smartphone apps and a web application essentially boils down to five fields in a CSV file. The first line of the file contains the column or field headings and then each line after that contains the details for a page:

title,url,time_added,tags,status
Pocket,https://getpocket.com/home,1661766509,,unread

Considerations

There were a few things I needed to take account of before digging into my AWK script:

- CSV files can contain commas inside fields and, whilst the CSV format handles that by wrapping the field in quotes, AWK would just react to the commas and treat them as delimiters.
- The script needed to ignore the first line in the file, which contains the heading information.
- The date and time when each link was saved are UNIX timestamps (the number of seconds since 1 January 1970 in UTC) and would need to be converted into something readable.
- The category, if one exists, is all lowercase. The category name would ideally have the initial character turned into uppercase.

CSV File Handling

This is where I was lucky. For the first 46 years of AWK this would involve using a third-party solution or creating my own. But in 2023 CSV support was added to AWK and gawk. Parsing a CSV file is now as easy as invoking AWK with either the -k or --csv options. And this also automatically sets the delimiter to a comma.

awk --csv -f pocket_parser.awk pocket-parser-download-data.csv

Skipping The First Line

The first line of the file can be skipped by only parsing lines where the record (line) number, NR, is greater than one:

{
    if (NR > 1) {
        ...
    }
}

Formatting The Date

The date and time when the link was originally saved is stored in the third ($3) field, and AWK has a strftime function which allows UNIX timestamps to be formatted into a string value. I wanted to use a format so that dates and times would be formatted as 2025-05-22 at 15:37:04:

{
    if (NR > 1) {
        title = $1
        url = $2
        formatted_date = strftime("%Y-%m-%d at %H:%M:%S", $3)
        ...

Fixing The Category Name

The category is stored in the fourth ($4) field. If a category is not set then I will use "Undefined" just so that there is a reasonable way to group the links.
If there is a category name then I can combine the toupper and tolower functions with the substr function to extract the required parts of the name:

- substr($4,1,1) gets the first character from the $4 field. toupper will then convert it to uppercase.
- substr($4,2) gets the second character onwards from the $4 field. tolower will then convert it to lowercase.

{
    if (NR > 1) {
        ...
        if ($4 == "") {
            category = "Undefined"
        } else {
            category = toupper(substr($4,1,1)) tolower(substr($4,2))
        }
        ...

This might be very specific to how I used Pocket, with single category names and them being all lowercase.

Note that string concatenation in AWK doesn't have an operator; you just write expressions next to each other. This is why the toupper and tolower functions can have zero, one or more spaces between them but the output will still be "joined". The following would all result in something like Example if the fourth field contained example:

toupper(substr($4,1,1))tolower(substr($4,2))
toupper(substr($4,1,1)) tolower(substr($4,2))
toupper(substr($4,1,1))  tolower(substr($4,2))

Collating The Data

Now that the page title, URL, formatted date and category have been gathered I needed to store the Markdown formatted version of it all. I am going to show the URLs as a bullet list so there is an asterisk at the start of each line. I also need to add a newline (\n) at the end of each entry too. In Markdown the links look like this:

* [A Page Title](https://page_url.html) on 2025-05-23 at 15:37:04

In AWK, using field references and the formatted_date variable we created, it looks like:

"* [" $1 "](" $2 ") on " formatted_date "\n"

The page links need to be stored in an array, which I called arr. This is an associative array (very similar to a map or dictionary in other languages) where the category names are used as the subscripts:

arr[category]=...

The value stored against each subscript is just a string which contains the Markdown for each page link:

...
arr[category] = arr[category] "* [" $1 "](" $2 ") on " formatted_date "\n"
...

Eh?

Just in case that's not entirely clear, a less convoluted example of what is going on can be created from a list of pets:

cat,Binky
cat,Mr Tibbles
dog,Fido
cat,Slippers
dog,Bruno
dog,Shep

In each record $1 is the animal type and $2 is the pet's name. The idea is to end up with an array of animal types which each contain a string of names, each on their own line:

cat = "Binky
       Mr Tibbles
       Slippers"

dog = "Fido
       Bruno
       Shep"

To create this, a simplified version of the code from above is:

...
arr[$1] = arr[$1] $2 "\n"
...

And the arr variable would end up containing:

arr["cat"]="Binky\nMr Tibbles\nSlippers\n"
arr["dog"]="Fido\nBruno\nShep\n"

Step-by-Step

The step-by-step process is as follows…

arr is empty.

The first record is "cat,Binky". $1 is "cat" and $2 is "Binky". arr["cat"] does not exist so arr[$1] = arr[$1] $2 "\n" concatenates "" with Binky and \n and then stores that against arr["cat"]:

arr["cat"]="Binky\n"

The second record is "cat,Mr Tibbles". arr["cat"] does exist so this time the existing data is concatenated with the new data: Binky\n is concatenated with Mr Tibbles and \n and then stored against arr["cat"]:

arr["cat"]="Binky\nMr Tibbles\n"

The third record is "dog,Fido". arr["dog"] does not exist so we concatenate "" with Fido and \n and store it against arr["dog"]:

arr["cat"]="Binky\nMr Tibbles\n"
arr["dog"]="Fido\n"

And so on…
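If you want to see this in action, a quick sketch (the pets.csv file name is just for illustration) is to run the simplified rule over the pet list above, using a comma as the field separator:

awk -F, '{ arr[$1] = arr[$1] $2 "\n" } END { for (a in arr) printf "%s:\n%s", a, arr[a] }' pets.csv

One thing to note: the order of the two groups in the output isn't guaranteed, because for (a in arr) visits the subscripts in an arbitrary order.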
arr["cat"]="Binky\nMr Tibbles\n" arr["dog"]="Fido\n" And so on… Outputting The Markdown The Markdown file I want to generate should contain the category headings and a list of the links below it: # Category One * Link One * Link Two # Category Two * Link Three * Link Four ... This output is generated in an END block in AWK since the output is only generated once all of the file’s lines have been parsed, not as each is parsed. END { for (a in arr) printf "# %s\n\n%s\n", a, arr[a] } This will be clearer using the cat and dog data. The data has been tweaked from the earlier example to add an asterisk before each pet’s name so that the Markdown would be formatted as a bulleted list: arr["cat"]="* Binky\n* Mr Tibbles\n* Slippers" arr["dog"]="* Fido\n* Bruno\n* Shep" The code loops through each “grouping name” (“cat” and “dog”) in arr. For each grouping name it will output # , the grouping name, two newlines (\n\n) the text stored in the grouping, and then a final newline. Replacing the newline characters makes this a bit clearer: # [Grouping Name] [Grouping Contents] The end result would be: # Cat * Binky * Mr Tibbles * Slippers # Dog * Fido * Bruno * Shep The Complete Script { if (NR > 1) { formatted_date = strftime("%Y-%m-%d at %H:%M:%S", $3) if ($4 == "") { category = "Undefined" } else { category = toupper(substr($4,1,1)) tolower(substr($4,2)) } arr[category] = arr[category] "* [" $1 "](" $2 ") on " formatted_date "\n" } } END { for (a in arr) printf "# %s\n\n%s\n", a, arr[a] } Checking It Out The script is available for download as pocket-parser.awk and a sample data file as pocket-parser-download-data.csv. An example of the output is available as pocket-parser-output.md. You can run the script yourself via awk --csv -f pocket_parser.awk pocket-parser-download-data.csv > pocket-parser-output.md.

Hey beloved #noGooMe users,

A quick reminder that you *must* have the image proxy option unchecked when using this #SearxNG instance. If you previously had it enabled, you can disable it by browsing to /preferences, selecting the "privacy" tab and unchecking the "image proxy" switch.

It is disabled by default, so any first connection to this instance already has the proper setting.

If you do have it enabled, you’ll quickly be blocked at the firewall level by our #awk AI agent 🤖

My reason for disabling this feature is that it generates loads of requests from the instance IP to the external engines. And this makes those engines block us really fast.

This means that the engines will have *your IP* logged when you search for images, and only then. Classic text searches are still masqueraded behind our instance IP.

I am in urgent job search mode, so I'm gonna throw this out here and see if anything comes of it.

I am a #Canadian, fluent in both #English and #French. I have experience with several programming languages. My strongest proficiency is with #Haskell and #C. I also have a reasonable grasp of #HTML, #JavaScript, #SQL, #Python, #Lua, #Linux system administration, #bash scripting, #Perl, #AWK, some #Lisp (common, scheme, and emacs), and probably several others I've forgotten to mention.

I am not necessarily looking for something in tech. I just need something stable. I have done everything from software development, to customer support, to factory work, though my current circumstances make in-person work more difficult than remote work. I have been regarded as a hard worker in every job I have ever held.

Doing some table generation in awk, and is there an easier way to do "all fields n to NF" than this?

function combine(combn, combs) {
    for (; combn <= NF; ++combn) {
        combs = combs "\t" $(combn)
    }
    return combs
}

I could use printf on the fragments, but the annoyance is that loop, instead of
(string-join (cddr fields) "\t")
or whatever.
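There is no built-in equivalent of string-join in awk, so a loop is about as short as it gets, but it can at least be packaged as a reusable helper. A rough sketch, assuming plain POSIX awk (the join name and signature below are just an illustration; the gawk manual's library chapter has a similar join() for array elements):

function join(first, last, sep,    out, i) {
    # concatenate fields $first .. $last with sep between them;
    # the parameters after sep are the usual awk idiom for local variables
    out = $first
    for (i = first + 1; i <= last; i++)
        out = out sep $i
    return out
}

Calling join(3, NF, "\t") would then give the tab-joined tail of the record, without the leading tab that combine() above produces.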

If you want to run a #Linux command, displaying only the header line and lines which match a pattern, you can turn to #awk

Example:

sudo ss -tlunp | awk 'NR == 1 { print; } /dnsmasq/ { print; }' | less -NSiRJ
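Since a pattern with no action prints the matching record by default, the same command can also be written as a single rule; a small bonus is that, unlike the two-rule version, it won't print the header twice if the header itself happens to match the pattern:

sudo ss -tlunp | awk 'NR == 1 || /dnsmasq/' | less -NSiRJ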

@holger grep, awk and sed. Available on pretty much every distribution. Not leaking info, either.

Honestly - steep learning curves, but once you're even slightly proficient you can find all the needles in all the haystacks you want, neatly.

Add in jq for conversion to JSON for a more transportable format.
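As a rough sketch of that last point (the file and the field names here are just an example): awk can emit tab-separated values and jq's raw-input mode can lift them into JSON objects:

awk -F: '{ print $1 "\t" $7 }' /etc/passwd |
  jq -R -s 'split("\n") | map(select(length > 0) | split("\t") | {user: .[0], shell: .[1]})'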

hey you AI, help me to create this #AWK script...
forget it, I can do it by myself.
#bash? same thing...
you get much better results with #python or others, but you are definitely in trouble if you do not understand how to do it or how to troubleshoot it (which can take longer than just doing it yourself)

Oh masters of the terminal.
I need your wisdoms

If I have tab as the field separator in awk,
and let's say field $3 contains "whitespace"-separated values.

Is that even possible?
I have a super messy log I am trying to look into
#awk #terminal
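It is possible, at least as a sketch (the messy.log name and the field position are just placeholders): set the field separator to a tab, then use split() to break the third field up on whitespace yourself:

awk -F'\t' '{
    n = split($3, parts, /[ ]+/)    # split field 3 on runs of spaces
    for (i = 1; i <= n; i++)
        print NR, i, parts[i]       # line number, sub-field index, sub-field value
}' messy.log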