I haven't been able to try out the OP's link yet (I'm also on mobile right now), but for your current approach of splitting formulas across lines, I've used this tool a lot to do that for me: https://www.excelformulabeautifier.com
Neat! I think I've done a similar thing in Jujutsu VCS, which enables you to start a new commit and add a message (description) to it well before you make any actual changes. As you described, it's a really useful way of keeping on track.
You could open source it and keep working on it as a side project and the centerpiece of your portfolio to demonstrate your skills.
And over time you could even offer a hosted/paid SaaS option, if there seems to be interest in that and you have more time and resources available to sustain it.
From the perspective that you'll keep refining this project over time (as you already have), I don't think you should worry too much about how the code and architecture look right now, or about people's reactions to their current state. It'll grow, change, and improve as you do. And others' reactions can help you grow.
> I spent the last 6 months rewriting the architecture for multi-tenancy, overhauling the UI/UX, and adding productivity and compliance algorithms.
Separately, I think it could be really good for you and your career to document some of your thinking/decisions in these areas somewhere online (even just a blog). It doesn't need to be too formal; just lay out your thinking.
Your post here is a pretty great example of your communication skills (which I've heard are highly appreciated/valued in CS), so you've got the skills; it'd be great for you to have more public proof of that.
Thank you for your feedback. I'd love your take on something: does it make sense to do that blog series to, for example, build a reputation for an agency or a company?
Like, "we at ACME changed X on Y because 123," and so on.
You can still relate it to the concerns, business requirements, or processes that agencies might have, which you considered and factored in when making design decisions about the code, architecture, or workflows. But since this is a project you built on your own and one that you own, it makes more sense to me to present it as "I changed X on Y because 123" and so on.
I meant trying to use it to produce content to kick-start my own agency, even if it's just me in the beginning, to build up some reputation or credibility.
I've suspected that, too, while looking at DAWs and how people make music with them. It seems a bit boring to me.
To kinda get away from that, or even just to experiment, I was interested in the possibility of writing music with code and either injecting randomness in places or at least varying the intervals between beats/etc. using some other functions (which would again just be layering in patterns, but in a more subtle way).
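A minimal sketch of what I mean, purely to illustrate; the tempo, jitter, and swing values are made up, and it only prints onset times rather than driving any actual synth:

```python
import random

BPM = 100
BASE_INTERVAL = 60.0 / BPM  # seconds per beat at a perfectly steady tempo

def jittered_beats(n_beats, jitter=0.03, swing=0.1):
    """Yield onset times (seconds) where each interval gets a small random
    nudge plus a simple alternating 'swing' offset - still patterns, but subtler."""
    t = 0.0
    for i in range(n_beats):
        yield t
        interval = BASE_INTERVAL
        interval += swing * BASE_INTERVAL if i % 2 else 0.0  # layered pattern
        interval += random.uniform(-jitter, jitter)          # injected randomness
        t += interval

if __name__ == "__main__":
    for onset in jittered_beats(8):
        print(f"{onset:.3f}s")
```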
> Just as a simple example, I have a once-in-a-while newsletter+blog on a niche topic, and I could get way more eyeballs if I'd just rephrase things as a Reddit post, but I'm nostalgic about it living its own life on the Free-ish Web. Or, I suppose, this comment right here, which could just as well be on a personal blog with a "backlink" to yours.
I've been thinking lately about still posting things to various places like here and Reddit, but compiling them later on and posting them to my website (likely with a link to where I posted them originally). That seems like a good middle ground for me, and it would let me build up a decent resource for myself and others, if I'm up for cleaning up the texts to add or remove context as necessary.
Much of this idea comes from a desire to archive and pull together more of the stuff I've put effort into that's spread all over the web.
May I ask what techniques you're either using or would recommend for similarity clustering? I looked into topic modeling, but it seemed a long way off from reliably bundling together stories the way Techmeme does.
(I'm working on basic blog and video aggregators like Planet Python.)
For similarity, it's important to consider the dimensionality of your embeddings. The larger the text you want to compare, the bigger each embedding should be (to my limited understanding).
So a paragraph might be fine as a 384-dim vector, but if you have 1,000 words you might want a 768-dim embedding (if not higher). Embedding models are slightly more or less accurate depending on the training data they're fed, but higher dimensionality definitely gives better results, to a great extent. If you have an extremely long piece of text, it's easier to chunk it into pieces and create a separate embedding for each chunk. You do have to stitch them back together manually and do some cleanup when displaying results, but it works.
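To make that concrete, here's a rough sketch of the chunk-and-embed step; the chunk size and model choice are just assumptions on my part (sentence-transformers' all-MiniLM-L6-v2 happens to output 384-dim vectors, and all-mpnet-base-v2 gives 768-dim ones if you want the larger embeddings):

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Model choice is an assumption: all-MiniLM-L6-v2 -> 384-dim vectors.
# Swap in "all-mpnet-base-v2" for 768-dim embeddings on longer texts.
model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_text(text, max_words=200):
    """Naive word-based chunking; something sentence-aware would be better."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed_document(text):
    """Return (chunk, vector) pairs; stitching the chunks back together for
    display is the manual cleanup step mentioned above."""
    chunks = chunk_text(text)
    vectors = model.encode(chunks, normalize_embeddings=True)
    return list(zip(chunks, vectors))
```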
Once you have embeddings for all your data, the rest is just cosine similarity; play around with the min_similarity threshold. You will need to build good indexes in Postgres, but that's basically all you need.
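An in-memory sketch of that last step; the 0.75 threshold and the greedy grouping are just illustrative choices on my part, not anything prescribed above:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity; with normalized vectors this is just a dot product."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_stories(items, min_similarity=0.75):
    """Greedy single-pass clustering: attach each (title, vector) item to the
    first cluster whose seed is similar enough, otherwise start a new cluster."""
    clusters = []
    for title, vec in items:
        for cluster in clusters:
            if cosine_sim(vec, cluster["seed"]) >= min_similarity:
                cluster["titles"].append(title)
                break
        else:
            clusters.append({"seed": vec, "titles": [title]})
    return clusters
```

At Postgres scale the same idea is usually handled with something like the pgvector extension, where `embedding <=> query` is cosine distance and an HNSW or IVFFlat index on the embedding column keeps lookups fast (again, just one way to do it).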