• 0 Posts
  • 18 Comments
Joined 2 years ago
cake
Cake day: July 1st, 2023

help-circle




  • A classic use for them is spam filtering.

    Suppose you have a set of spam detection systems/rules which are somewhat expensive to execute, eg a ML model or keyword blocklist. Spam tends to come in waves, and frequently it can be as simple as reposting the same message dozens of times.

    Once your systems determine a piece of content is spam (or you manually flag content), it’s a good idea to insert the content into a bloom filter. This means that future posts of the identical content will be flagged without needing to execute the expensive checks, especially if there’s a surge of content stressing your systems.

    Since it’s probabilistic, you can’t use this unless you have some sort of manual reviewing queue or system, as it’s possible for false positives to be flagged. However, you can also run more intensive checks once you’ve flagged content, to detect false positives.

    The false positives can also be a feature, not a bug: with careful choice of hash functions, your bloom filter can actually detect slightly modified content, since most of the hashes may still be the same.

    I’ve worked at companies which use this strategy so it’s very real world.







  • They probably know what it is, but it’s a bad point if they’re trying to paint DAGs as esoteric CS stuff for the average programmer. I needed to use a topological sort for work coding 2 weeks ago, and any time you’re using a build system, even as simple as Make, you’re using DAGs. Acting like it’s a tough concept makes me wonder why I should accept the rest of the argument.

    Can’t say I have a strong feeling about Gradle though 🤷‍♀️


  • You might be even more concerned to find that your Fedora package manager, DNF, is also written in Python: https://github.com/rpm-software-management/dnf

    Fact of the matter is that Python is a language that gets used all the time for system level things, and frequently you just don’t know it because there is no “.py” extension.

    I’m not sure I understand your concerns about python…

    1. Performance is worse than C, yes. But writing performance sensitive code in Python is quite silly, it’s common to put that in a C library and use that within python to get the best of both worlds. DNF does this with libdnf.
    2. “It feels like an extension of proprietary hardware planned obsolescence and manipulation.” This is very confusing to me. There has been one historic version change (2->3) which broke compatibility in a major way, and this version change had a literal decade of help and resources and parallel development. The source code for every Python interpreter version is freely available to build and tweak if you’re unhappy with a particular version. Most python scripts are written and used for ages without any changes.
    3. “i don’t consider programs written in Python to have permanence or long term value because their toolchains become nearly impossible to track down from scratch.” Again, what? As I said, every Python version is available to download, build, and install, and tweak. It’s pretty much impossible for python code to every become unusable.

    Anyway, people like the Fedora folks working on anaconda choose a language that makes sense for their purpose. Python absolutely makes sense for this purpose compared to C. It allows for fast development and flexibility, and there’s not much in an installer program that needs high performance.

    That’s not to say C isn’t a very important language too. But it’s important to use the best tool for the job.



  • It’s a cathartic, but not particularly productive vent.

    Yes, there are stupid lines of time.sleep(1) written in some tests and codebases. But also, there are test setUp() methods which do expensive work per-test, so that the runtime grew too fast with the number of tests. There are situations where there was a smarter algorithm and the original author said “fuck it” and did the N^2 one. There are container-oriented workflows that take a long time to spin up in order to run the same tests. There are stupid DNS resolution timeouts because you didn’t realize that the third-party library you used would try to connect to an API which is not reachable in your test environment… And the list goes on…

    I feel like it’s the “easy way out” to create some boogeyman, the stupid engineer who writes slow, shitty code. I think it’s far more likely that these issues come about because a capable person wrote software under one set of assumptions, and then the assumptions changed, and now the code is slow because the assumptions were violated. There’s no bad guy here, just people doing their best.


  • The reason is simple: in order to be a signed piece of secure boot software, the kernel needs to do everything possible to prevent unsigned code from running at the kernel’s privilege level, or risk its signing key getting revoked by Microsoft.

    I assume your kernel is from Fedora and is signed. If your kernel, once loaded, allowed the loading of unsigned kernel modules, then any attacker could use it as part of an exploit that allows them to break secure boot. They would simply include a copy of the Fedora kernel, and then write a custom kernel module which takes control of the machine and continues their attack. The resulting exploit could be used on any system to bypass and defeat secure boot. In essence, secure boot is only as secure as the weakest signed implementation out there.

    So, Linux distributors need to demonstrate to Microsoft that they don’t allow unsigned kernel code execution. Linux contains a feature called lockdown, which implements this idea. In order to be effective, lockdown must be automatically enabled by the kernel if secure boot is enabled. Interestingly, Linus flat out refuses to include the code to do that, I guess he disagrees with it. So a little discussed reality of secure boot is that, all Linux kernels which are signed have this extra patch included in order to enable lockdown during secure boot.

    And that is why you can’t load an unsigned module when secure boot is enabled.


  • I use two monitors, and also KDE’s virtual desktops for work. A killer feature for me is that KDE has a window manager option to “pin” specific windows so that they are present on every desktop. This means I can have my terminal and slack client split across one screen and pinned, and then the other screen can contain my “main focus” on each of the virtual desktops - browser, editor, or email. I always can see the chat/terminal but can easily swap the desktop to get to a different focus.

    I know that I could just have everything on one desktop and use the alt-tab to change that main window. But the alt tab is slow and non-deterministic. I may have to cycle between five things before I get to the browser, for example. With virtual desktops, I know where each focus is geometrically, and I can always swap over quickly with my key shortcuts.


  • the_sisko@startrek.websitetoLinux@lemmy.ml*Permanently Deleted*
    link
    fedilink
    English
    arrow-up
    2
    ·
    edit-2
    2 years ago

    If you can’t remember or don’t know the syntax well you can still understand a systemd timer, but that is much hard for the crontab.

    I will agree that it is easier to read a timer than a Cron entry, especially if you’ve seen neither of them before.

    Granted, crontab uses fewer characters, but if you only set up either once in a blue moon you’ll need the docs to write either for a long time.

    This is where I disagree. I very rarely setup a Cron job, but when I do, I don’t need to look anywhere for docs. I run crontab -e and the first line of the editor contains a comment which annotates each column of the Cron entry (minute, hour, dom, mon, dow). All that’s left is to put in the matching expressions, and paste my command.

    Compare that to creating a new timer, where I need to Google a template .service and .timer file, and then figure out what to put in what fields from the docs. That’s probably available in the manual pages, but I don’t know which one. It’s just not worth it unless I need the extra power from systemd.

    This is from somebody who has several systemd timers and also a few Cron jobs. I’m not a hater, just a person choosing the best and easiest choice for the job.


  • the_sisko@startrek.websitetoLinux@lemmy.ml*Permanently Deleted*
    link
    fedilink
    English
    arrow-up
    18
    arrow-down
    1
    ·
    2 years ago

    Cron may be old but I don’t think it’s “legacy” or invalid. There’s plenty of perfectly good, modern implementations. The interface is well established, and it’s quite simple to schedule something and check it. What’s more, Cron works on new Linux systems, older non-systemd ones, and BSD and others. If all you need is a command run on a schedule, then Cron is a great tool for the job.

    Systemd services and timers require you to read quite a bit more documentation to understand what you’re doing. But of course you get more power and flexibility as a result.