Full-Stack & Ops
- 5. GitHub Actions
- 1. PyPI (public)
- AIOps - “AIOps platforms utilize big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight. AIOps platforms enable the concurrent use of multiple data sources, data collection methods, analytical (real-time and deep) technologies, and presentation technologies.”
- Containerize your data-science environment using Docker Compose - Docker Compose is a tool for describing a collection of containers that interact over their own network in a very straightforward way.
- Builds Docker images out of Git repositories
- Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
- Runs on Kubernetes (k8s)
- Seldon Core vs. Seldon Deploy (what are the differences?)
- Apache ActiveMQ™ is the most popular open source, multi-protocol, Java-based messaging server
- Kafka in a nutshell - But even these solutions came up short in some cases. For example, RabbitMQ stores messages in DRAM until the DRAM is completely consumed, at which point messages are written to disk, severely impacting performance.
Also, the routing logic of AMQP can be fairly complicated compared with Apache Kafka, where each consumer simply decides which messages to read.
Beyond routing simplicity, developers and DevOps staff often prefer Apache Kafka for its high throughput, scalability, performance, and durability, although developers still swear by all three systems for various reasons.
- Apache Kafka - Kafka is a pub-sub messaging system. It has traditionally used ZooKeeper to detect crashes, implement topic discovery, and maintain production and consumption state for topics; newer releases can instead run in KRaft mode without ZooKeeper.
- Sentry (for Python): “Your code is telling you more than what your logs let on. Sentry’s full-stack monitoring gives you full visibility into your code, so you can catch issues before they become downtime.”
- 4. Note: Redis is essentially a managed, in-memory key-value dictionary; its strength lies in cases where you have a lot of data that needs to be queried and managed and you don’t want to hard-code it.
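Airflow, listed above, describes a workflow as a DAG (directed acyclic graph) of tasks and runs each task only after its upstream tasks have finished. The sketch below illustrates only that scheduling idea using the standard library's graphlib; it is not Airflow's API, and the task names (extract, transform, validate, load) and the shared results dict are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


# Hypothetical tasks for an illustrative extract -> transform/validate -> load pipeline.
def extract(results):
    results["extract"] = [3, 1, 2]


def transform(results):
    results["transform"] = sorted(results["extract"])


def validate(results):
    results["validate"] = len(results["extract"]) > 0


def load(results):
    results["load"] = {"rows": results["transform"], "ok": results["validate"]}


TASKS = {"extract": extract, "transform": transform, "validate": validate, "load": load}

# Each task maps to the set of upstream tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def run_dag(tasks, dependencies):
    """Run every task once, only after all of its upstream tasks have run."""
    results, order = {}, []
    # static_order() yields nodes in dependency order and raises CycleError on cycles.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name](results)
        order.append(name)
    return order, results


if __name__ == "__main__":
    order, results = run_dag(TASKS, DEPENDENCIES)
    print(order)
    print(results["load"])
```

A real scheduler adds what this toy omits: timing (schedules), retries, parallel workers, and persisted task state.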
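The Kafka note above says that each consumer simply decides which messages to read. A minimal sketch of that pull model, assuming a single in-memory partition: the log only appends and serves reads, and each consumer keeps its own offset, so reading deletes nothing and two consumers can sit at different positions. TopicLog and Consumer are hypothetical toy classes, not the Kafka client API.

```python
class TopicLog:
    """Toy append-only log standing in for one Kafka topic partition (hypothetical)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return its offset (its position in the log)."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages starting at offset; nothing is removed."""
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Toy consumer that owns its position in the log (hypothetical)."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_messages=10):
        """Pull the next batch and advance this consumer's own offset."""
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Re-read or skip ahead simply by moving the offset."""
        self.offset = offset


if __name__ == "__main__":
    log = TopicLog()
    for event in ["a", "b", "c", "d"]:
        log.append(event)

    fast, slow = Consumer(log), Consumer(log)
    print(fast.poll())   # ['a', 'b', 'c', 'd']
    print(slow.poll(2))  # ['a', 'b']
    slow.seek(0)
    print(slow.poll(1))  # ['a']
```

Contrast this with a classic queue, where the broker routes a message to a consumer and removes it once acknowledged.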
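Dependency injection - in line with SOLID (a class should do one thing, and it should depend on abstractions rather than concrete implementations), a class lets other code create the third-party or collaborator objects it needs instead of constructing them internally; the dependencies are either passed to __init__ or injected at runtime.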
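A minimal sketch of the dependency-injection idea above. ReportService, Storage and InMemoryStorage are hypothetical names; the point is that ReportService never constructs its own storage, so it can be handed an in-memory fake in tests or a real client in production, either through __init__ or by swapping it at runtime.

```python
from typing import Protocol


class Storage(Protocol):
    """The abstraction ReportService depends on (hypothetical)."""

    def save(self, key: str, value: str) -> None: ...


class InMemoryStorage:
    """A stand-in dependency; a real app might inject a database or Redis client."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self.data[key] = value


class ReportService:
    """Does one thing (builds reports) and receives its storage from outside."""

    def __init__(self, storage: Storage) -> None:
        # Constructor ("init") injection: the caller decides which Storage to use.
        self.storage = storage

    def set_storage(self, storage: Storage) -> None:
        # Setter injection: swap the dependency at runtime.
        self.storage = storage

    def publish(self, name: str, rows: list[int]) -> str:
        report = f"{name}: total={sum(rows)}"
        self.storage.save(name, report)
        return report


if __name__ == "__main__":
    storage = InMemoryStorage()
    service = ReportService(storage)
    service.publish("daily", [1, 2, 3])
    print(storage.data)  # {'daily': 'daily: total=6'}
```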
Plotly for JupyterLab: “jupyter labextension install @jupyterlab/plotly-extension”
- 2. Seldon
- 5. Dapr is a portable, serverless, event-driven runtime that makes it easy for developers to build resilient, stateless and stateful microservices that run on the cloud and edge and embraces the diversity of languages and developer frameworks.
Dapr codifies the best practices for building microservice applications into open, independent, building blocks that enable you to build portable applications with the language and framework of your choice. Each building block is independent and you can use one, some, or all of them in your application.
- 2. Cnvrg.io:
- 1. Manage - Easily navigate machine learning with dashboards, reproducible data science, dataset organization, experiment tracking and visualization, a model repository and more.
- 2. Build - Run and track experiments in hyperspeed with the freedom to use any compute environment, framework, programming language or tool, with no configuration required.
- 3. Automate - Build more models and automate your machine learning from research to production using reusable components and a drag-and-drop interface.
- 3. Comet.ml - Comet lets you track code, experiments, and results on ML projects. It’s fast, simple, and free for open source projects.
- 4. Floyd - cloud-hosted notebooks, similar to Colab / Kaggle, etc.; GPUs cost $4/h.
- 6. Missing Link - RIP
- 7. Spark
- 8. Databricks
- 2. Intro to Databricks on Spark: it has some basic sklearn-like tools and other custom operations, such as an aggregator that assembles feature columns into a single vector used as a model's input.
- 1. Spark SQL
- 2. MLflow
- 3. Streaming
- 4. SystemML: DML using Keras models.
- 1. from spark_sklearn import GridSearchCV
- 11. How can we leverage our existing experience with modeling libraries like scikit-learn? We'll explore three approaches that make use of existing libraries but still benefit from the parallelism provided by Spark.
These approaches are:
- Grid Search
- Cross Validation
- Sampling (random, chronological subsets of data across clusters)
- 1. Ref: It's worth pausing here to note that the architecture of this approach is different from that used by MLlib in Spark. Using spark-sklearn, we're simply distributing the cross-validation run of each model (with a specific combination of hyperparameters) across the Spark executors. Spark MLlib, on the other hand, distributes the internals of the actual learning algorithms across the cluster.
- 2. The main advantage of spark-sklearn is that it enables leveraging the very rich set of machine learning algorithms in scikit-learn. These algorithms do not run natively on a cluster (although they can be parallelized on a single machine), and by adding Spark, we can unlock a lot more horsepower than could ordinarily be used.
- 3. Using spark-sklearn is a straightforward way to throw more CPU at any machine learning problem you might have. We used the package to reduce the time spent searching and to reduce the error of our estimator.
- 1. NGINX is open source software for web serving, reverse proxying, caching, load balancing, media streaming, and more. It started out as a web server designed for maximum performance and stability. In addition to its HTTP server capabilities, NGINX can also function as a proxy server for email (IMAP, POP3, and SMTP) and as a reverse proxy and load balancer for HTTP, TCP, and UDP servers.
- 3. "One advantage of using NGINX as an API gateway is that it can perform that role while simultaneously acting as a reverse proxy, load balancer, and web server for existing HTTP traffic. If NGINX is already part of your application delivery stack then it is generally unnecessary to deploy a separate API gateway"
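The spark-sklearn notes above describe distributing each model's cross-validation run (one hyperparameter combination) instead of the learning algorithm's internals. The sketch below shows that decomposition with the standard library only, assuming a toy 1-D ridge regression and a hypothetical dataset where y = 2x. A local thread pool stands in for the Spark executors, so it illustrates the task structure, not real cluster speed-ups (Python threads are also limited by the GIL for CPU-bound work). None of these names are spark-sklearn's API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy dataset: y = 2x exactly.
X = [float(i) for i in range(1, 21)]
Y = [2.0 * x for x in X]


def fit_ridge(xs, ys, alpha):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def cross_val_mse(params, k=5):
    """Score ONE hyperparameter combination with k-fold CV; this is the unit of parallel work."""
    alpha, n, errors = params["alpha"], len(X), []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved folds
        train = [(X[i], Y[i]) for i in range(n) if i not in test_idx]
        test = [(X[i], Y[i]) for i in sorted(test_idx)]
        w = fit_ridge([x for x, _ in train], [y for _, y in train], alpha)
        errors.append(sum((y - w * x) ** 2 for x, y in test) / len(test))
    return sum(errors) / k


def grid_search(param_grid, max_workers=4):
    """Expand the grid, score each candidate independently, return the best one."""
    keys = sorted(param_grid)
    candidates = [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]
    # Candidates do not depend on each other, so they can be scored concurrently;
    # spark-sklearn sends such tasks to Spark executors, a thread pool stands in here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(cross_val_mse, candidates))
    best_score, best_params = min(zip(scores, candidates), key=lambda pair: pair[0])
    return best_params, best_score


if __name__ == "__main__":
    print(grid_search({"alpha": [0.0, 1.0, 10.0, 100.0]}))
```

On this data the search should select alpha = 0.0, because any regularization shrinks the fitted slope below the true value of 2.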