We actually used Streamlit in the past. Our gripe with it was how the backend event loop is managed. Basically, Streamlit re-runs your code at every user interaction to check what's changed (unless you cache specific variables, which is hard to do well). When your app works with significant data or a significant model, or has multiple pages or users, this approach fails and the app starts freezing constantly. We wanted a product that keeps the easy learning curve of Streamlit while retaining production-ready capabilities: we use callbacks for user interactions to avoid unnecessary computation, and the front end and back end run on separate threads. We also run in Jupyter notebooks, if that helps.
The script re-run (and the band-aid of caching via decorators) is exactly what I don't like about Streamlit.
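To make that concrete, here is a minimal sketch of the rerun-plus-cache pattern in question (the decorator name assumes a recent Streamlit version, and the model file is hypothetical): the whole script re-executes on every widget interaction, so anything expensive has to be wrapped in a cache decorator to survive the rerun.

```python
import streamlit as st

@st.cache_resource  # without this, the model would be reloaded on every rerun
def load_model():
    import joblib
    return joblib.load("model.pkl")  # hypothetical pre-trained model file

model = load_model()  # the script runs top to bottom on each interaction
text = st.text_input("Enter a post to classify")
if text:
    st.write(model.predict([text]))
```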
I’d love to see an example of how you use Taipy to build an LLM chat app, analogous to this SL example:
Another interesting one in this space:
Reflex (formerly known as PyneCone).
They have a ready-to-use LLM chat app, which makes it more likely that I'll check it out.
I would love to read a comparison explaining the relative advantages of each framework from an experienced practitioner who has actually built apps with each. Add in plotly dash, bokeh/panel, and voila too.
Off the top of my head, bokeh and panel are more oriented towards high performance for large datasets, but have less overall adoption.
Voila is oriented towards making existing jupyter notebooks into interactive dashboards.
I'm always curious about the runtime model for these interactive frameworks. Building interactivity into a Jupyter notebook is fairly straightforward, but it's a very different execution model from the traditional HTTP model. Jupyter notebook widgets need a separate backing kernel for each new user, whereas in the traditional HTTP server model all request state is normally rebuilt on each request from a session cookie plus database state. A complete interpreter per user makes for simpler programming, but it is much more memory- and process-intensive.
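For contrast, here is a rough sketch (my own, not from any of these frameworks) of that traditional HTTP model using Flask: no interpreter lives per user, and each request rebuilds its state from a signed session cookie plus a shared store.

```python
from flask import Flask, session, request

app = Flask(__name__)
app.secret_key = "dev-only"  # toy value; needed so the session cookie can be signed

FAKE_DB = {}  # stand-in for the database that holds per-user state

@app.route("/counter", methods=["POST"])
def counter():
    # Identify the user from the session cookie (toy identity scheme).
    user_id = session.setdefault("user_id", request.remote_addr)
    state = FAKE_DB.setdefault(user_id, {"clicks": 0})
    state["clicks"] += 1
    # No long-lived kernel holds this state; it is reconstructed on every request.
    return {"clicks": state["clicks"]}
```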
Very cool! I signed up and uploaded data for a text classifier. 3000 examples of social media posts on a binary annotation task. Got 91% initially, then looked through the annotations and corrected a few errors that had snuck in. The UI for that is great. That got it to 92%.
Easy-to-use UI, easy data upload, and the training was quick. A great tool for testing new ideas for classifiers. For bigger projects I'd be concerned about the long-term cost of pay-per-invocation pricing.
Is weak labeling via labeling functions (snorkel, skweak) something that's on the roadmap for Nyckel? Also, do you plan to add named entity recognition?
Thank you for the kind words and feedback! You basically went through most of the UI flow that we designed for. You're spot-on about testing new classifiers - answering the question "Can ML even help with my problem?" is much easier with Nyckel, and prototyping and rapid iteration start with that.
Our goal is to be cost-competitive, even for bigger projects. Given how early we are, our pricing structure is still being worked out, especially for high-volume use.
Integrating with labeling solutions is on our roadmap. In the meantime, our API should enable any data/labeling integrations.
Named entity recognition is also on the roadmap. We'd love to hear more about your use case, and we can give you access to the beta when it's ready.
Chiming in on the weak labeling question: As of right now, you can use outside libraries like skweak to create weak labels offline and then PUT those using our API (https://www.nyckel.com/docs#update-annotation). This wouldn't cost anything since we only charge for invokes, but it requires some work.
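To sketch that workflow (the endpoint URL, field names, and auth details below are placeholders and assumptions; the linked docs have the actual schema), the weak labels produced offline can be pushed one by one with a plain HTTP PUT:

```python
import requests

# Produced offline, e.g. by skweak or snorkel labeling functions (hypothetical ids/labels).
weak_labels = {"sample-123": "positive", "sample-456": "negative"}

API_TOKEN = "..."  # auth token obtained as described in the Nyckel docs
UPDATE_ANNOTATION_URL = "https://www.nyckel.com/..."  # the update-annotation endpoint from the docs link above

for sample_id, label in weak_labels.items():
    resp = requests.put(
        UPDATE_ANNOTATION_URL,  # fill in per the docs; the exact path is not shown here
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"sampleId": sample_id, "label": label},  # field names are assumptions
    )
    resp.raise_for_status()
```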
We may look at adding weak labeling as a first-class feature of our site down the road, but we are not yet sure we need to. With the powerful semantic representations offered by the latest deep nets, we find that a smaller number of hand-annotated samples often suffices for the desired accuracy, which makes the whole annotation process simpler and faster. Of course, if you have data and evidence to the contrary, we'd love to take a look.
It's a cool challenge! I tried it at 90 wpm and cleared most words. Some I had to attempt 2 or 3 times to type fast enough. Then I hit my nemesis: I can't type "necessary" fast enough for 90 wpm. Tried it 20 times.
Q Insight Agency | Data Scientist | Remote or Mannheim, Germany
Q Insight Agency is a market research agency. We help our clients in consumer goods and pharma understand their customers. Our background is in qualitative research (interviews, focus groups, workshops) and now we are expanding into data science with a focus on social media. As a new team in an established company, the data science team enjoys stability and resources but also has the freedom to build.
We just launched Cosmention, an AI-powered social media monitoring tool specialized for cosmetics. It analyzes millions of social media posts from all platforms and detects mentions of brands, products, ingredients and other entities. Our stack: R, Python, Shiny, AWS, Snowflake, Docker.
The UI looks great! As others have said, I like how compact it is. It doesn't get in the way as much as MS Teams and Discord do. I also like that it is lightweight. It's important to me that the app stays performant while sharing the screen. MS Teams is too laggy.
Have you looked at Tuple? Noor seems quite similar. Could you please explain a bit more about the differences between Noor and Tuple?
Finally, as others have said: lack of Windows compatibility is a dealbreaker for me for now. A performant Windows app would be fantastic. That's also something that Tuple doesn't have.
I'm in the market for a tool like this. At the moment I'm using Prodigy but am interested in other options. Features that I (or rather my employer) would be willing to pay for:
1. Team functionality with multiple user accounts.
2. An easy-to-use workflow for double annotation where each text is annotated by exactly two annotators. The software should make sure that a text is never shown to more than 2 annotators and never shown to the same annotator twice (a toy sketch of this assignment rule follows the list).
3. Make it easy to review the two versions and resolve conflicts.
4. A smarter alternative to review would be a warning system that identifies annotations that may have errors (because a model trained on the other data predicts a different result) and automatically flags them for review by another annotator.
5. Stats on the annotators: speed, accuracy, and statistics on how frequently they assign different labels, to detect potential misunderstandings of the annotation schema.
6. A GUI with an overview of all annotation datasets, with stats like % finished annotating (with stages for double annotation and review), the types of annotation done, and label frequencies to detect imbalances.
7. Functions to mass-edit the annotations, like renaming or removing an entity type.
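As referenced in point 2, here is a toy sketch (not taken from any existing tool) of that assignment rule: every text goes to exactly two distinct annotators, and no annotator ever sees the same text twice.

```python
import itertools
from collections import defaultdict

def assign_double(texts, annotators):
    """Yield (text, annotator) pairs so every text gets exactly two annotators."""
    assert len(annotators) >= 2
    seen = defaultdict(set)             # text -> annotators who already have it
    pool = itertools.cycle(annotators)  # naive round-robin to balance workload
    for text in texts:
        while len(seen[text]) < 2:
            annotator = next(pool)
            if annotator not in seen[text]:  # never the same annotator twice
                seen[text].add(annotator)
                yield text, annotator

for text, who in assign_double(["doc1", "doc2", "doc3"], ["alice", "bob", "carol"]):
    print(who, "annotates", text)
```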
Another thing I'd be interested in is some integration with a third party annotation provider. There are companies that offer annotation as a service and it's also available on Google Cloud and AWS. Having that integrated into an annotation tool would make it very easy to get large amounts of well annotated training material.
But finally, and much more importantly: The workflow for annotators has to be perfected first, so they can work as efficiently and consistently as possible. Getting this right is more important to me than any of the other features I listed.
Mind if I ask what sort of team features you make use of with Prodigy? Are there any aspects you feel are lacking? Initial thoughts are that it'd be helpful for teams to be able to set group annotation goals, share docs / annotations / configs, view ongoing sessions, assign annotators to sessions, and view stats on each annotator (as per point 5).
> The software should make sure that a text is never shown to more than 2 annotators and never shown to the same annotator twice
For this I plan to let teams set the threshold for the number of documents that should overlap and the number of annotators a text should be shown to. In some situations it could be useful for there to be some % of overlap for all annotators to help determine the inter-annotator agreement across the entire team.
> The workflow for annotators has to be perfected first
Totally agree. My biggest concern is building out the above on top of an inefficient workflow. That's one of the primary driving forces behind the current re-write of the tool.
Love the smart flagging, mass-edit, and integrated provider ideas!
I use these team features in Prodigy: I start annotation sessions with different session_id values and with the feed_overlap flag. I run Prodigy from an EC2 instance that annotators connect to.
The Prodigy team is working on a new version called Prodigy Scale with more team features. I'm looking forward to that release! For now it feels like a hack to use Prodigy in a team.
Inter-annotator agreement is key! You could consider making that highly visible in your tool. It's something that every team should measure and strive to maximize.
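For two annotators labeling the same items, one simple way to put a number on that agreement is Cohen's kappa; a minimal sketch with scikit-learn and toy labels:

```python
from sklearn.metrics import cohen_kappa_score

# Labels from two annotators on the same five items (toy data).
annotator_a = ["POS", "NEG", "POS", "NEG", "POS"]
annotator_b = ["POS", "NEG", "NEG", "NEG", "POS"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, ~0 = chance level
```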
For developers who use spaCy in production (like me), I imagine it would be very hard for your tool to come out on top of Prodigy. But there could be an opportunity with price-sensitive hobby users or devs who use a different NLP library.