DISPATCH 0002 · · Fhloston Paradise · SIGNAL ██░░░

i made this

but did you really?

I used to build a lot of interesting and niche things. Even my music I consider to be pretty niche / unique. There are definitely times I’ve created a track and realized it was derivative and archived it.

I think that’s one of the curses of listening to so many genres and playing so many different instruments and styles over the years. You begin to replicate what you’ve heard. I was listening to a lot of music from the Middle East for a while, and fast forward a few months, the Persian scale is surfacing randomly throughout my compositions.

Creating

I firmly believe we inherited the ability to create new and wonderful things from our Creator. I think our motivation has stunted that. I like to use religion as an example. Institutions abound that purport to ‘show you the way,’ and many have vast resources to help humanity, but plenty are just social clubs with financials.

The US is a major hub for R&D of all types, but our economic motivators almost guarantee a stunted or warped output.

Personally, I wanted to live in the Star Trek timeline, where the Vulcans notice our post-World War III attempts at faster-than-light travel and come to engage and enlighten us.

Deriving

Everything is derived really. Cars were earlier known as horseless carriages for a reason.

I hear newer songs and recognize riffs and turns lifted almost verbatim from older songs, and no, they’re not sampled. That randomly reminds me of the people who sit around and just wait to sue over patent or copyright infringement. That’s the environment we operate in here in the US of A.

Large Language Models are the ultimate tool to guarantee status quo. We were training ML models at a company I worked at back in 2015, and the whole point was to build a thing that can predict something. There’s a lot of work that goes in up front:

  • data ingestion, preprocessing
  • exploratory data analysis, statistics
  • feature engineering - i.e. what things matter in this data and what do/might they indicate
  • finding the right training method(s) / ensembles
  • training and validating the model, improving the performance metrics
  • deploying and serving for inference at scale
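The steps above can be sketched end to end in a few lines. This is a minimal, purely illustrative toy (one feature, made-up numbers, a nearest-centroid "model"), not a real pipeline, but it shows the shape of the work:

```python
import statistics

# Hypothetical toy dataset of (feature, label) pairs -- purely illustrative
raw = [(1.0, 0), (1.2, 0), (0.9, 0), (3.1, 1), (2.8, 1), (3.3, 1)]

# ingestion / preprocessing: normalize the feature to zero mean, unit stddev
values = [x for x, _ in raw]
mu, sigma = statistics.mean(values), statistics.stdev(values)

# "feature engineering" here is trivial: a single normalized feature
data = [((x - mu) / sigma, y) for x, y in raw]

# training: a nearest-centroid model -- learn the mean feature per class
centroids = {
    label: statistics.mean(x for x, y in data if y == label)
    for label in {y for _, y in data}
}

# inference: predict whichever class centroid is closest
def predict(x_raw):
    x = (x_raw - mu) / sigma
    return min(centroids, key=lambda c: abs(x - centroids[c]))

# validation: accuracy on the training set (a real pipeline holds data out)
accuracy = sum(predict(x) == y for x, y in raw) / len(raw)
```

Swap in real ingestion, real feature work, a real estimator, and a serving layer, and you have the list above.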

LLMs aren’t too different at all, really. Lots of data went in, and the patterns exist within the weights, reconstructed as each token is selected from the probability distribution.

The pattern was already there

I asked Claude just now to give me an example of Composition vs Inheritance in Python and it gave me the following. It was not hard to find examples almost identical to this on the web.

# Inheritance approach (fragile)
class Animal:
    def breathe(self): ...

class FlyingAnimal(Animal):
    def fly(self): ...

class SwimmingAnimal(Animal):
    def swim(self): ...

# What about a duck? Multiple inheritance gets messy fast.
class Duck(FlyingAnimal, SwimmingAnimal): ...

# Composition approach (flexible)
class Flyer:
    def fly(self): return "flying"

class Swimmer:
    def swim(self): return "swimming"

class Quacker:
    def quack(self): return "quack"


class Duck:
    def __init__(self):
        self.flyer = Flyer()
        self.swimmer = Swimmer()
        self.quacker = Quacker()

    def fly(self): return self.flyer.fly()
    def swim(self): return self.swimmer.swim()
    def quack(self): return self.quacker.quack()


class Penguin:
    def __init__(self):
        self.swimmer = Swimmer()  # reuse, no fly

    def swim(self): return self.swimmer.swim()

Think about it in terms of Legos

I use LLMs a bit differently from many, I think. I don’t think I’m alone or unique in this regard, but I think there are fewer of us operating this way. I look at LLMs as a giant Rubbermaid container of Legos, just like the one my son has (and I had at his age).

When I’m doing analysis on data, I already know many of the tools and methodologies to dig into the information and surface insights. Since I’m more of an engineer than a data scientist (although I’ve done the studies and held interim DS roles), using an LLM helps me quickly build a reasonable Jupyter notebook to explore and discover.

My queries are very much “what about __” or “what if we try ___” or “I know that the signal in this dataset is sparse so why don’t we look at some playbooks that are good at dealing with sparse information.”

The only gotcha here is that we are all asking the same or similar questions and getting essentially the same answers. Sure, you can tweak parameters like temperature or top-k and get some variety, interesting maybe, not necessarily good, but still not novel.
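For the curious, here is what those two knobs actually do to the distribution. A minimal sketch with hypothetical logits: temperature rescales the logits before softmax (above 1 flattens the distribution, below 1 sharpens it), and top-k simply masks everything outside the k most likely tokens before renormalizing. Neither adds anything new to the distribution; they only reshape what was already there.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits

# temperature: divide logits before softmax; t > 1 flattens, t < 1 sharpens
def with_temperature(logits, t):
    return softmax([x / t for x in logits])

# top-k: keep the k highest logits, mask the rest, renormalize
def top_k(logits, k):
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    masked = [x if i in keep else float("-inf") for i, x in enumerate(logits)]
    return softmax(masked)  # exp(-inf) is 0.0, so masked tokens get no mass
```

Sharpen the temperature and the top token takes even more of the mass; raise it and the tail gets a chance, but the tail was baked in at training time either way.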

If you make something, really make it. Put in the time to at least look at what you’ve generated or put together with help from LLMs and ask “Is this novel?” or “Is this even good?” When in doubt, let an actual human expert review it for novelty.

Look what I made.

Did you?