>Consider the following scenario: A reviewer evaluates many unqualified applicants for a university program successively, and the next applicant to be reviewed is an average (borderline admit-reject) applicant. Because the evaluator is influenced, or anchored, by recently made decisions, this borderline applicant might be admitted to the program. On the other hand, when an evaluator is anchored by having reviewed many qualified applicants, the same or a similarly borderline applicant might be rejected (Figure 1). In this scenario, individual fairness, stating that individuals with similar characteristics should be treated similarly [8], is impaired, and wrong or inconsistent decisions can have a consequential impact.
Disabling the GIL can unlock true multi-core parallelism for multi-threaded programs, but it requires code to be restructured for safe concurrency, which, it seems, isn't that difficult (a sketch follows the quotes below):
> When we found out about the “nogil” fork of Python it took a single person less than half a working day to adjust the codebase to use this fork and the results were astonishing. Now we can focus on data acquisition system development rather than fine-tuning data exchange algorithms.
>We frequently battle issues with the Python GIL at DeepMind. In many of our applications, we would like to run on the order of 50-100 threads per process. However, we often see that even with fewer than 10 threads the GIL becomes the bottleneck. To work around this problem, we sometimes use subprocesses, but in many cases the inter-process communication becomes too big of an overhead. To deal with the GIL, we usually end up translating large parts of our Python codebase into C++. This is undesirable because it makes the code less accessible to researchers.
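As a minimal sketch of what that restructuring looks like (assuming a free-threaded build such as the nogil fork or a CPython 3.13 `--disable-gil` build; the workload and all names are invented for illustration):

```python
import threading

# Toy CPU-bound workload. Under a standard GIL build these threads
# interleave on one core; under a free-threaded ("nogil") build they
# can actually run in parallel on separate cores.
N = 2_000_000
NUM_THREADS = 4

total = 0
total_lock = threading.Lock()  # shared state needs an explicit lock either way

def count_primes(start: int, stop: int) -> None:
    global total
    local = sum(
        1
        for n in range(start, stop)
        if n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    )
    with total_lock:  # the GIL never made `total += local` atomic anyway
        total += local

threads = [
    threading.Thread(
        target=count_primes,
        args=(i * N // NUM_THREADS, (i + 1) * N // NUM_THREADS),
    )
    for i in range(NUM_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"primes below {N}: {total}")
```

The point is the locking discipline: shared-state invariants that the GIL used to paper over have to be made explicit before a free-threaded build is safe to adopt.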
nlm-ingestor processes documents, organizing content and improving readability: it handles sections, paragraphs, links, tables, lists, and page continuations; removes redundancies and watermarks; applies OCR; and supports HTML and other formats through Apache Tika:
At the level of the page, not everything is laid out linearly: the main text is often set in columns, the flow can be interrupted by captioned figures, additional text can be placed in inserts, and so on.
You need a human eye to figure that out, and this is the task nlm-ingestor tackles (a usage sketch follows).
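A minimal usage sketch, assuming a locally running nlm-ingestor server and its llmsherpa client (port, endpoint, and method names follow the projects' READMEs as I recall them; `paper.pdf` is a hypothetical input):

```python
from llmsherpa.readers import LayoutPDFReader

# nlm-ingestor listens on port 5010 when run via its Docker image.
api_url = "http://localhost:5010/api/parseDocument?renderFormat=all"
reader = LayoutPDFReader(api_url)

doc = reader.read_pdf("paper.pdf")  # hypothetical input file

# The result is a tree of sections, paragraphs, and tables rather than a
# flat text dump, which is what makes columns, captions, and inserts tractable.
for section in doc.sections():
    print(section.title)
for chunk in doc.chunks():
    print(chunk.to_context_text())
```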
As for the content, semantic contiguity is not always guaranteed. A typical example is conversation, where people engage in narrative/argumentative competitions. Topics get nested as the conversation advances, along the lines of "Hey, this reminds me of ...", building up a stack that can be popped once subtopics have been exhausted: "To get back to the topic of ...".
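Purely as an illustration of that discipline (everything here is made up), nested topics behave like a stack:

```python
# Toy model: conversation topics as a stack of digressions.
topics: list[str] = ["document layout"]

def digress(topic: str) -> None:
    """'Hey, this reminds me of ...' pushes a subtopic."""
    topics.append(topic)

def back_to_previous() -> str:
    """'To get back to the topic of ...' pops the exhausted subtopic."""
    topics.pop()
    return topics[-1]

digress("conversations")
digress("French linguistics")
print(back_to_previous())  # -> "conversations"
```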
This is explored at length by Kerbrat-Orecchioni in:
Speed-up gains in development are not just a matter of how fast one coder can complete a task compared to another, but of how a choice of architecture lets those gains be collected again in the future by anyone working on the codebase. If you make a certain task completable 10 times faster, and that task will be repeated n times in the future (for instance, implementing an API route on a server), you end up with a total cost of n versus 10n. Of course, this requires up-front code investments, and those decisions should be made by assessing whether they are worth it with respect to n (whether YAGNI applies). The true "10x" improvements are to be found in the compilers and frameworks you use; the question, then, is whether your 0.1x colleagues would be comfortable working with tools whose implementation overwhelms them but still lets them get a 10x speedup.
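To make the break-even arithmetic concrete (a toy calculation; the helper and the numbers are invented):

```python
# One-off investment vs. repeating the slow task n times.
def worth_investing(task_hours: float, n: int, investment_hours: float,
                    speedup: float = 10.0) -> bool:
    """True when the hours saved over n repetitions exceed the investment."""
    saved = n * task_hours * (1 - 1 / speedup)  # 0.9 * n * task_hours at 10x
    return saved > investment_hours

# e.g. a 2h task repeated 30 times, made 10x faster for a 40h investment:
print(worth_investing(task_hours=2.0, n=30, investment_hours=40.0))  # True (54h saved)
```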
Again, my understanding is that the 10x study was done on vetted devs writing necessary, best-practice code in a cooperative setting with the same tooling, so it is still irrelevant to the topic under discussion. Of course, no one who wants to encourage velocity thinks you shouldn't invest in tooling that automates long tasks for you. We should learn and cheat. My irritation is with suggestions that there's nothing serious to learn, not a belief that there's nothing to cheat on, which is a separate topic anyway.
https://www.jcr-admin.org/preprints/chernev-preprint.pdf
>Semantic Anchoring in Sequential Evaluations of Vices and Virtues
>How do people evaluate sequentially presented items? Prior research suggests that sequential evaluations are subject to anchoring biases, such that the values of subsequently evaluated alternatives are assimilated toward the initially considered option. The present research argues, however, that sequential valuations often lead to contrast rather than assimilation effects, whereby values of the subsequently estimated alternatives are distanced from the initially evaluated option. These contrast effects are attributed to semantic anchoring, which stems from evaluating conceptually related options classified into opposing categories (e.g., vices and virtues).