masto.es is one of the many independent Mastodon servers you can use to participate in the fediverse.
Welcome to masto.es, the largest general-interest Mastodon server for Spanish speakers. Registration is currently invite-only.


#datascience

39 posts · 36 participants · 0 posts today

I got some unfortunate news earlier this month: #UniversityOfArizona has decided to defund the group I work for, @cct-datascience.bsky.social, amidst their continuing budget crisis.

datascience.cct.arizona.edu/ne

It super sucks and means I'll likely have to drop down to 50% in May and I'll be looking for new, better supported, #rseng or #datascience positions, ideally still working with biologists in #rstats in some capacity. Remote or in #Tucson.

Let me know if you know of anything!

Data Science Team · Data Science Team seeking new projects after losing funding

I have previously mentioned software standards in passing.

The top-level standard is ISO/IEC Std 12207, Information Technology—Software Life Cycle Processes, which is the international standard that defines a life-cycle framework for developing and managing (ALL) software projects.

This standard was adopted in the United States as IEEE/EIA Std 12207, Information Technology—Software Life Cycle Processes.

Obviously, there are more

Tags: #ai #python #datascience #tech #linux #opensource

No robust solution in sight. #LLM progress is stagnant?

“According to #OpenAI’s internal benchmarks, their newer models– o3 and o4 mini– hallucinate more often than older reasoning models like o1, o1-mini, and o3-mini, as well as traditional models such as GPT-4”

#AI #tech #technology #datascience

theleftshift.com/openai-admits

The Left Shift · OpenAI Admits Newer Models Hallucinate Even More · In a technical report, the company said “more research is needed” to explain why hallucinations increase as reasoning capabilities scale

"This paper advances the critical analysis of machine learning by placing it in direct relation with actuarial science as a way to further draw out their shared epistemic politics. The social studies of machine learning—along with work focused on other broad forms of algorithmic assessment, prediction, and scoring—tends to emphasize features of these systems that are decidedly actuarial in nature, and even deeply actuarial in origin. Yet, those technologies are almost never framed as actuarial and then fleshed out in that context or with that connection. Through discussions of the production of ground truth and politics of risk governance, I zero in on the bedrock relations of power-value-knowledge that are fundamental to, and constructed by, these technosciences and their regimes of authority and veracity in society. Analyzing both machine learning and actuarial science in the same frame gives us a unique vantage for understanding and grounding these technologies of governance. I conclude this theoretical analysis by arguing that contrary to their careful public performances of mechanical objectivity these technosciences are postmodern in their practices and politics."

journals.sagepub.com/doi/10.11

Can AI Be Trained on Data Generated by Other AI? Exploring the Potential and Pitfalls of Synthetic Training Data
AI-generated training data is revolutionizing AI model training! Synthetic data simulates real-world scenarios, offering a more efficient approach. Companies like Anthropic are already using it. Learn more about this exciting new frontier! #SyntheticData #AIGeneration #AItraining #DataScience #MachineLearning #FutureofAI
tech-champion.com/data-science...

Understand RAG at Easter? 🐣 Why not use the time to learn something new — and build your own local PDF chatbot?

Learn how chunking, embeddings and vector search work in practice - with LangChain, FAISS, Ollama and Mistral running entirely on your machine (no API key required).

Perfect for beginners - here's the full guide & GitHub repo 👇

:blobcoffee: step-by-step guide: bit.ly/3EfOHB9
:blobcoffee: GitHub Repo: bit.ly/3EtqYgK
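
For readers who want a feel for the three steps named above before opening the guide, here is a minimal sketch in plain Python. It is not the guide's code and does not use LangChain, FAISS, Ollama, or Mistral: the word-count "embedding" and the brute-force cosine search are deliberately simple stand-ins for a real embedding model and a vector index, and every function name (`chunk_text`, `embed`, `cosine`, `search`) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, chunks, k=1):
    """Brute-force vector search: return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Made-up sample text, standing in for the contents of a PDF.
document = (
    "Mastodon is a federated social network. "
    "FAISS is a library for fast vector similarity search. "
    "Ollama runs language models locally on your machine."
)
chunks = chunk_text(document)
top = search("which library does fast vector similarity search", chunks, k=1)
print(top[0])
```

In a real pipeline the retrieved chunk would be pasted into the prompt sent to the language model; the overlap between chunks is there so that a sentence cut at a chunk boundary still appears whole in at least one neighbour.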

Any #DataScience people here?

I have a huge #BeaconDB dataset here, recorded with #NeoStumbler.

I would like to play around with it a bit, display densities of radio devices on a map.

I know #QGis, is it reasonably easy to import a CSV file there, assign the coordinates to some columns etc?

Alternatively, I know a bit of #R, but #RStudio is #Electron now, so that could become a hassle XD

Btw, there is a #Fedora #COPR for R-Cran packages, how is the situation on #NixOS?
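
As a rough illustration of the density idea in the question above, the sketch below bins CSV rows into a latitude/longitude grid and counts rows per cell, using only the Python standard library. The column names (`latitude`, `longitude`, `mac`), the sample rows, and the `grid_density` helper are all assumptions made for the example; a real export may use different column names and would need them adjusted.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical sample data; the real column names and values may differ.
SAMPLE_CSV = """latitude,longitude,mac
52.5201,13.4049,aa:bb
52.5204,13.4051,cc:dd
52.5309,13.4102,ee:ff
"""


def grid_density(csv_file, lat_col="latitude", lon_col="longitude", cell_deg=0.01):
    """Count rows per grid cell, where each cell is `cell_deg` degrees on a side."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        lat = float(row[lat_col])
        lon = float(row[lon_col])
        # Integer cell indices; floor keeps negative coordinates in the right cell.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


density = grid_density(io.StringIO(SAMPLE_CSV))
for cell, n in density.most_common():
    print(cell, n)
```

The resulting per-cell counts could be written back to a CSV and styled in a GIS tool, or plotted directly; a cell size in degrees is a simplification, since cells get narrower in metres away from the equator.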