ebfe1's comments

More stats - https://play.clickhouse.com/play?user=play#c2VsZWN0IGFjdG9yX...

There were 18 accounts involved - 14 of them are now deleted/deactivated and 4 are still active (possibly compromised accounts).

It seems GitHub did take action and these comments are disappearing :)


It seems GitHub spambots are at it again... I'm surprised all these automated accounts aren't caught by GitHub. It's quite obvious something is up just from looking at the times these issue comments were made, which you can see in the ClickHouse playground database...
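A rough sketch of the kind of timing check I mean, against the playground's github_events table (the account names below are placeholders, not the real ones):

```
-- bucket the suspect accounts' issue comments by minute;
-- near-identical timestamps across accounts are a strong hint of automation
select toStartOfMinute(created_at) as minute, actor_login, count() as comments
from github_events
where event_type = 'IssueCommentEvent'
  and actor_login in ('suspect-account-1', 'suspect-account-2')  -- placeholders
group by minute, actor_login
order by minute
```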


ClickHouse has a pretty good github_events dataset on its playground that folks can use to do some research - some info on the dataset: https://ghe.clickhouse.tech/

Example of what this user JiaT75 did so far:

https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...

Pull requests in the last 60 days mentioning xz and 5.6 (excluding downgrades) or mentioning the CVE:

https://play.clickhouse.com/play?user=play#U0VMRUNUIGNyZWF0Z...
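For reference, the linked query is roughly along these lines (the exact filters behind the link may differ; this is just a sketch of the idea):

```
-- pull requests from the last 60 days whose titles mention xz and 5.6
-- but not "downgrade", plus anything mentioning the xz backdoor CVE
select created_at, repo_name, actor_login, title
from github_events
where event_type = 'PullRequestEvent'
  and created_at > now() - interval 60 day
  and (
        (title ilike '%xz%' and title ilike '%5.6%' and title not ilike '%downgrade%')
        or title ilike '%CVE-2024-3094%'
      )
order by created_at desc
limit 100
```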


Yeah. It would be interesting to see who adopted the compromised versions and how quickly, compared to how quickly they normally adopt new versions (not bots pulling upgrades, but how quickly maintainers approve and merge them).

If there were a bunch of people who adopted it abnormally fast compared to usual, that might point to there being more "bad actors" in this operation (said at the risk of sounding paranoid if this turns out to be a state-run thing).


Sad to see so many people getting scammed these days. One idea I have, and wish someone could take the time to implement, is a self-registration platform where you can declare that your information was previously stolen and used in a scam. The system would hash this information in multiple ways, without storing the raw values, so that banks, financial institutions, or even mobile phone providers (think SIM-swap attacks) could submit a user's information and the system would come back with a result based on matching hashes (e.g. first name + DOB + address, or last name + DOB + social security number). Ideally, a match would result in the bank doing a more rigorous identity check, like actually seeing the person face to face, rather than taking everything submitted over some web form at face value.
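A minimal sketch of the matching idea in ClickHouse SQL (table names, column combinations, and sample values are all made up for illustration; a real system would need keyed hashes, agreed normalization rules, and a proper privacy review):

```
-- hypothetical registry: only hashes of normalized attribute combinations are stored
create table stolen_identity_hashes
(
    combo_hash FixedString(64)
)
engine = MergeTree
order by combo_hash;

-- self-registration: hash a few attribute combinations, never the raw values
insert into stolen_identity_hashes
select hex(SHA256(lower('john|1980-01-01|123 main st')))     -- first name | dob | address
union all
select hex(SHA256(lower('doe|1980-01-01|078-05-1120')));     -- last name | dob | ssn

-- check by a bank/telco: does any combination for this applicant match?
select count() > 0 as previously_reported
from stolen_identity_hashes
where combo_hash in (
    hex(SHA256(lower('john|1980-01-01|123 main st'))),
    hex(SHA256(lower('doe|1980-01-01|078-05-1120')))
);
```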

I have a family member whose identity was stolen and misused for many years, and it kept going - it's super frustrating.


One idea: you can play with the GitHub dataset in the ClickHouse playground.

This is just a quick SQL query to look for the DOI number pattern mentioned in any comment across repositories:

https://play.clickhouse.com/play?user=play#c2VsZWN0IHJlcG9fb...

```
select repo_name, event_type, body
from github_events
where event_type in ('IssueCommentEvent', 'IssuesEvent', 'PullRequestEvent', 'PullRequestReviewCommentEvent')
  and match(body, '.10\.\d{4,9}\/[-\._;()\/:A-Z0-9]+.')
limit 10
```

perhaps you can extend on that :)
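For example, one possible extension (an untested sketch) is to group the matches to see which repositories and accounts are hit the most:

```
select repo_name, actor_login, count() as c
from github_events
where event_type in ('IssueCommentEvent', 'IssuesEvent', 'PullRequestEvent', 'PullRequestReviewCommentEvent')
  and match(body, '.10\.\d{4,9}\/[-\._;()\/:A-Z0-9]+.')
group by repo_name, actor_login
order by c desc
limit 20
```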


Lovely! I only learnt today from my colleagues at ClickHouse that clickhouse has a git-import tool. So if you also want to give it a go:

Download clickhouse: curl https://clickhouse.com/ | sh

Check out documentation for git-import: ./clickhouse git-import --help

Then the tool can be run directly inside the git repository. It will collect data like commits, file changes, and changes of every line in every file for further analysis. It works well even on the largest repositories like Linux or Chromium.

Example of a trivial query:

SELECT author AS k, count() AS c FROM line_changes WHERE file_extension IN ('h', 'cpp') GROUP BY k ORDER BY c DESC LIMIT 20

Example of a non-trivial query - a matrix of authors, showing how much of one author's code is removed by another:

SELECT
    k,
    written_code.c,
    removed_code.c,
    round(removed_code.c * 100 / written_code.c) AS remove_ratio
FROM
(
    SELECT author AS k, count() AS c
    FROM line_changes
    WHERE sign = 1 AND file_extension IN ('h', 'cpp') AND line_type NOT IN ('Punct', 'Empty')
    GROUP BY k
) AS written_code
INNER JOIN
(
    SELECT prev_author AS k, count() AS c
    FROM line_changes
    WHERE sign = -1 AND file_extension IN ('h', 'cpp') AND line_type NOT IN ('Punct', 'Empty') AND author != prev_author
    GROUP BY k
) AS removed_code USING (k)
WHERE written_code.c > 1000
ORDER BY c DESC
LIMIT 500


> Download clickhouse: curl https://clickhouse.com/ | sh

Does this check the user agent to change the response? Clicking that link shows their home page.


That is exactly what it does ;) If you don't feel comfortable with curl | sh, you can download the ClickHouse binary from the releases page here: https://github.com/ClickHouse/ClickHouse/releases



Changing the content from an HTML page to a shell script based on the user agent is a pretty bad abuse of HTTP. Why not at least require `-H 'Accept: text/x-shellscript'`? Or be more basic and give the script its own URL.


Based on what reasoning? (Honestly curious)


If I want to download your homepage with curl to read offline, I get a script? If I use a tool your server doesn't recognize as needing the installer, do I end up executing HTML?

If I run curl on Windows, do I get this script? A PowerShell version?

Why not make it https://clickhouse.com/linux-installer?


These are totally legit concerns. The site has behaved this way for quite some time, and many existing ClickHouse installation scripts may rely on it, so we will keep it for backward compatibility, but we will add the usual install.sh URL and start sharing that more often.

(The pull request is in ... it should be deployed on Monday, and then you can use https://clickhouse.com/install.sh ). Love the feedback, please keep it coming!


Because someone may want to preview the script in the browser.

Because someone may not have curl and use another tool your server doesn't know.


To what resource does this URL (Uniform Resource Locator) refer? A web page or a script?


Curious - how many rows are you ingesting daily?

This would be a really nice tool to reference when people share their findings about malicious Python packages being uploaded to PyPI... take this one as an example: https://clickpy.clickhouse.com/dashboard/cobo-python-api

After looking through some of these malicious packages using your tool, I noticed a trend... the malicious packages usually target high-download-count packages and create something with a similar name, so the moment they are uploaded they start getting a high number of downloads... perhaps that could be used as an indicator.
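A rough sketch of that indicator, assuming a downloads table shaped like pypi_downloads(project, date, downloads) - the real ClickPy schema may differ, and editDistance needs a recent ClickHouse version:

```
-- flag recently seen projects whose names are one edit away from a top-1000 package
with top_packages as
(
    select project, sum(downloads) as total_downloads
    from pypi_downloads
    group by project
    order by total_downloads desc
    limit 1000
)
select candidates.project as suspect, top_packages.project as lookalike_of
from (select distinct project from pypi_downloads where date > today() - 30) as candidates
cross join top_packages
where candidates.project != top_packages.project
  and editDistance(candidates.project, top_packages.project) = 1  -- name one typo away
limit 100
-- note: the cross join is expensive; fine as a sketch, not for production use
```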


It's about a billion rows a day. Nice idea - we could probably add a visual for possible malicious packages.


Another theory is that they have an efficient way to brute-force the victim's password-reset token, and/or a way to bypass Facebook's rate limiting... Regardless, I would make sure 2FA is enabled as an extra precaution... or maybe just take a break from Facebook :)


100% this is what I experienced... I was fine for months while taking the supplement, then I went away on holiday, forgot my supplement, and after some time the leg cramps at night came back. When I came home, I started taking it again and the cramps were gone. Just from that experience, I'm convinced magnesium deficiency was the root cause.


From what I know, nuts (almonds and cashews) are rich in magnesium, if that helps.

