kjqgqkejbfefn's comments

Here's a study by a researcher at the Kellogg School of Management on this very subject:

https://www.jcr-admin.org/preprints/chernev-preprint.pdf

>Semantic Anchoring in Sequential Evaluations of Vices and Virtues

>How do people evaluate sequentially presented items? Prior research suggests that sequential evaluations are subject to anchoring biases, such that the values of subsequently evaluated alternatives are assimilated toward the initially considered option. The present research argues, however, that sequential valuations often lead to contrast rather than assimilation effects, whereby values of the subsequently estimated alternatives are distanced from the initially evaluated option. These contrast effects are attributed to semantic anchoring, which stems from evaluating conceptually related options classified into opposing categories (e.g., vices and virtues).


This is called the anchoring effect:

https://en.wikipedia.org/wiki/Anchoring_effect

More specifically, sequential anchoring:

>Consider the following scenario: A reviewer evaluates many unqualified applicants for a university program successively, and the next applicant to be reviewed is an average (borderline admit-reject) applicant. Because the evaluator is influenced, or anchored, by recently made decisions, this borderline applicant might be admitted to the program. On the other hand, when an evaluator is anchored by having reviewed many qualified applicants, the same or a similarly borderline applicant might be rejected (Figure 1). In this scenario, individual fairness, stating that individuals with similar characteristics should be treated similarly [8], is impaired, and wrong or inconsistent decisions can have a consequential impact.

Source: https://dl.acm.org/doi/fullHtml/10.1145/3491102.3517443
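A toy model of the scenario in that quote (my own illustration, not from the paper): the evaluator's acceptance threshold drifts toward the quality of recently reviewed applicants, so the same borderline applicant can be admitted after a weak batch and rejected after a strong one. The `drift` parameter and the linear blend are assumptions for the sketch.

```python
def decide(applicant_score, recent_scores, drift=0.3, base_threshold=0.5):
    """Admit if the applicant clears a threshold that is pulled
    toward the running mean of recently reviewed applicants."""
    anchor = sum(recent_scores) / len(recent_scores)
    threshold = (1 - drift) * base_threshold + drift * anchor
    return applicant_score >= threshold

borderline = 0.5
print(decide(borderline, [0.2, 0.3, 0.25]))  # after weak batch: True (admitted)
print(decide(borderline, [0.8, 0.9, 0.85]))  # after strong batch: False (rejected)
```

The same score flips outcome purely because of the evaluation context, which is exactly the individual-fairness violation the paper describes.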


Disabling the GIL can unlock true multi-core parallelism for multi-threaded programs, but it requires code to be restructured for safe concurrency, which, it seems, isn't that difficult:

> When we found out about the “nogil” fork of Python it took a single person less than half a working day to adjust the codebase to use this fork and the results were astonishing. Now we can focus on data acquisition system development rather than fine-tuning data exchange algorithms.

https://peps.python.org/pep-0703/


>We frequently battle issues with the Python GIL at DeepMind. In many of our applications, we would like to run on the order of 50-100 threads per process. However, we often see that even with fewer than 10 threads the GIL becomes the bottleneck. To work around this problem, we sometimes use subprocesses, but in many cases the inter-process communication becomes too big of an overhead. To deal with the GIL, we usually end up translating large parts of our Python codebase into C++. This is undesirable because it makes the code less accessible to researchers.
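A minimal sketch of the workload pattern those quotes describe: pure-Python, CPU-bound work fanned out over threads. Under the GIL, only one thread executes Python bytecode at a time, so this gains nothing from extra cores; on a free-threaded ("nogil") build the same unmodified code can scale across them.

```python
import threading

def count_primes(limit):
    # deliberately naive, CPU-bound pure-Python work
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

results = {}

def worker(idx, limit):
    results[idx] = count_primes(limit)  # each thread writes its own key

threads = [threading.Thread(target=worker, args=(i, 20_000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

With the GIL, wall-clock time here is roughly 4x a single run; without it, the four threads can run concurrently. The per-key dict writes keep the threads from stepping on each other in either mode.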


Maybe they should look into translating parts of their codebase to Shed Skin. It compiles (a subset of) Python to C++.


How's it different from Cython, which compiles a subset of Python to C or C++?


Shed Skin enforces stricter (implicitly static) typing, and claims roughly 10-100x the performance of Cython.
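To make the typing point concrete, here's the kind of function that fits the implicitly statically typed subset Shed Skin targets: every variable keeps one concrete type, containers are homogeneous, and no dynamic features are used. The same file runs under CPython and, if Shed Skin accepts it, compiles to C++ (invoked roughly as `shedskin <file>.py`; exact CLI varies by version).

```python
def dot(a, b):
    # a, b: homogeneous lists of floats -> inferable static types
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

if __name__ == '__main__':
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Cython accepts arbitrary Python and only speeds up what you annotate; Shed Skin rejects anything it can't fully type, which is where both the restrictions and the speed come from.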


>tree-based approach to organize and summarize text data, capturing both high-level and low-level details.

https://twitter.com/parthsarthi03/status/1753199233241674040

nlm-ingestor processes documents, organizing content and improving readability: it handles sections, paragraphs, links, tables, lists, and page continuations, removes redundancies and watermarks, and applies OCR, with additional support for HTML and other formats through Apache Tika:

https://github.com/nlmatics/nlm-ingestor
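The tree-building loop behind this kind of tree-based summarization can be sketched as follows. This is my own rough rendering of the idea, not the linked implementation; `embed`, `cluster`, and `summarize` are hypothetical stand-ins for an embedding model, a clustering step, and an LLM summarizer.

```python
def build_tree(chunks, embed, cluster, summarize, max_levels=3):
    levels = [chunks]                  # level 0: raw text chunks
    while len(levels[-1]) > 1 and len(levels) <= max_levels:
        groups = cluster([embed(c) for c in levels[-1]], levels[-1])
        # each higher level holds summaries of the clusters below it
        levels.append([summarize(group) for group in groups])
    return levels  # retrieval can hit any level: low- or high-level detail

# toy stand-ins: cluster adjacent pairs, "summarize" by joining
tree = build_tree(
    ["a", "b", "c", "d"],
    embed=lambda c: c,
    cluster=lambda embs, cs: [cs[i:i + 2] for i in range(0, len(cs), 2)],
    summarize=lambda g: "+".join(g),
)
print(tree)  # [['a', 'b', 'c', 'd'], ['a+b', 'c+d'], ['a+b+c+d']]
```

The point of the structure is that a query can match a leaf (low-level detail) or an upper-level summary (high-level context), instead of only flat chunks.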


I don't understand. Why build up text chunks from different, non-contiguous sections?


At the page level, not everything is laid out linearly. The main text is often set in columns, the flow can be interrupted by captioned figures, additional text can be placed in inserts, etc.

You'd normally need a human eye to figure that out, and this is the task nlm-ingestor tackles.

As for the content, semantic contiguity is not always guaranteed. A typical example is conversation, where people engage in narrative/argumentative competitions. Topics get nested as the conversation advances, along the lines of "Hey, this reminds me of ...", building up a stack that gets popped once subtopics have been exhausted: "To get back to the topic of ...".
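That nesting really is a literal stack (illustration only, with made-up topics):

```python
topics = []
topics.append("graph drawing")      # main topic
topics.append("layout algorithms")  # "hey, this reminds me of ..."
topics.append("a war story")        # nested digression
topics.pop()                        # subtopic exhausted
current = topics[-1]                # "to get back to ...": layout algorithms
print(current)
```

A chunker that assumes contiguity will glue the digression to whatever surrounds it, even though its semantic parent sits several paragraphs earlier.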

This is explored at length by Kerbrat-Orecchioni in:

https://www.cambridge.org/core/journals/language-in-society/...

And an explanation is offered by Dessalles in:

https://telecom-paris.hal.science/hal-03814068/document


If those non-contiguous sections share similar semantic/other meaning, it can make sense from a search perspective to group them?


it starts to look like a graph problem



Super interesting paper on an alternative way to render graphs. Thanks for posting!


Wow, this is really amazing, thank you


>Graph drawing tools

It's hard

Graphviz-like generic graph-drawing library. More options, more control.

https://eclipse.dev/elk/

Experiments by the same team responsible for the development of ELK, at Kiel University

https://github.com/kieler/KLighD

KIELER project wiki

https://rtsys.informatik.uni-kiel.de/confluence/display/KIEL...

Constraint-based graph drawing libraries

https://www.adaptagrams.org/

JS implementation

https://ialab.it.monash.edu/webcola/

Some cool stuff:

HOLA: Human-like Orthogonal Network Layout

https://ialab.it.monash.edu/~dwyer/papers/hola2015.pdf

Confluent Graphs demos: makes edges more readable.

https://www.aviz.fr/~bbach/confluentgraphs/

Stress-Minimizing Orthogonal Layout of Data Flow Diagrams with Ports

https://arxiv.org/pdf/1408.4626.pdf

Improved Optimal and Approximate Power Graph Compression for Clearer Visualisation of Dense Graphs

https://arxiv.org/pdf/1311.6996v1.pdf


Tech Ingredients is amazing



Speed gains in development are not just a matter of how fast one coder can complete a task compared to another, but of how a choice of architecture lets those gains be collected again by anyone working on the codebase in the future. If you make a certain task completable 10 times faster and that task will be repeated n times (for instance, implementing an API route on a server), you end up with a cost of n versus 10n. Of course this requires code investments, and those decisions should be made by assessing whether they are worth it with respect to n (whether YAGNI applies). The true "10x" improvements are to be found in the compilers and frameworks you use; the question is whether your 0.1x colleagues would be comfortable working with tools whose implementation overwhelms them but still allows them to get a 10x speedup.
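The n vs. 10n argument above is just break-even arithmetic; the numbers below are made up for illustration:

```python
def total_cost(n_tasks, hours_per_task, upfront=0.0):
    # upfront investment (tooling/architecture) plus per-repetition cost
    return upfront + n_tasks * hours_per_task

baseline = total_cost(50, 10)             # 50 routes at 10h each: 500h
invested = total_cost(50, 1, upfront=80)  # 80h of tooling, then 1h each: 130h
print(baseline, invested)
```

The investment only pays if n is large enough to amortize the upfront cost, which is the YAGNI assessment the comment describes.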


Again, my understanding is that the 10x study was done on reviewed devs writing necessary and best-practice code in a cooperative setting with the same tooling, so this is still irrelevant to discussing the topic. Of course no one who wants to encourage velocity thinks you shouldn't invest in tooling that will automate long tasks for you. We should learn and cheat. My irritation is with suggestions that there's nothing serious to learn, not that I think there's nothing to cheat on, which is an unconnected topic anyway.


https://explorer.globe.engineer/?q=robot+kinematics is pretty neat. Would love links to actual content

