diff --git "a/README.md" "b/README.md" --- "a/README.md" +++ "b/README.md" @@ -7,474 +7,729 @@ tags: - sentence-similarity - feature-extraction - generated_from_trainer -- dataset_size:36 +- dataset_size:3284 - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss base_model: Snowflake/snowflake-arctic-embed-m-v1.5 widget: -- source_sentence: How do you configure the necessary RBAC resources in Kubernetes - to enable Spark access for managing driver executor pods, and what are the subsequent - steps needed to register the stack component using ZenML? +- source_sentence: How do I register and activate a stack with a new orchestrator + using ZenML? sentences: - - 'Google Cloud Image Builder + - "hestrator registered and part of our active stack:zenml orchestrator register\ + \ \\\n --flavor=airflow \\\n --local=True # set this\ + \ to `False` if using a remote Airflow deployment\n\n# Register and activate a\ + \ stack with the new orchestrator\nzenml stack register -o \ + \ ... --set\n\nDue to dependency conflicts, we need to install the Python packages\ + \ to start a local Airflow server in a separate Python environment.\n\n# Create\ + \ a fresh virtual environment in which we install the Airflow server dependencies\n\ + python -m venv airflow_server_environment\nsource airflow_server_environment/bin/activate\n\ + \n# Install the Airflow server dependencies\npip install \"apache-airflow==2.4.0\"\ + \ \"apache-airflow-providers-docker<3.8.0\" \"pydantic~=2.7.1\"\n\nBefore starting\ + \ the local Airflow server, we can set a few environment variables to configure\ + \ it:\n\nAIRFLOW_HOME: This variable defines the location where the Airflow server\ + \ stores its database and configuration files. The default value is ~/airflow.\n\ + \nAIRFLOW__CORE__DAGS_FOLDER: This variable defines the location where the Airflow\ + \ server looks for DAG files. The default value is /dags.\n\nAIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL:\ + \ This variable controls how often the Airflow scheduler checks for new or updated\ + \ DAGs. By default, the scheduler will check for new DAGs every 30 seconds. This\ + \ variable can be used to increase or decrease the frequency of the checks, depending\ + \ on the specific needs of your pipeline.\n\nWhen running this on MacOS, you might\ + \ need to set the no_proxy environment variable to prevent crashes due to a bug\ + \ in Airflow (see this page for more information):\n\nexport no_proxy=*\n\nWe\ + \ can now start the local Airflow server by running the following command:\n\n\ + # Switch to the Python environment that has Airflow installed before running this\ + \ command\nairflow standalone" + - "ta stores you want to migrate, then upgrade ZenML.Decide the ZenML deployment\ + \ model that you want to follow for your projects. See the ZenML deployment documentation\ + \ for available deployment scenarios. 
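A quick way to tie together the local Airflow server setup covered earlier is to export the configuration from Python before launching the standalone server. This is a minimal sketch: the variable values are illustrative, and it assumes the separate Airflow virtual environment described above is already active.

```python
import os
import subprocess

# Airflow configuration described above; the values are illustrative defaults.
os.environ["AIRFLOW_HOME"] = os.path.expanduser("~/airflow")
os.environ["AIRFLOW__CORE__DAGS_FOLDER"] = os.path.expanduser("~/airflow/dags")
os.environ["AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL"] = "10"  # scan for new DAGs every 10 seconds
os.environ["no_proxy"] = "*"  # MacOS workaround mentioned above

# Equivalent to running `airflow standalone` in the prepared environment.
subprocess.run(["airflow", "standalone"], check=True)
```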
If you decide on using a local or remote\ + \ ZenML server to manage your pipelines, make sure that you first connect your\ + \ client to it by running zenml connect.\n\nUse the zenml pipeline runs migrate\ + \ CLI command to migrate your old pipeline runs:\n\nIf you want to migrate from\ + \ a local SQLite metadata store, you only need to pass the path to the metadata\ + \ store to the command, e.g.:\n\nzenml pipeline runs migrate PATH/TO/LOCAL/STORE/metadata.db\n\ + \nIf you would like to migrate any other store, you will need to set --database_type=mysql\ + \ and provide the MySQL host, username, and password in addition to the database,\ + \ e.g.:\n\nzenml pipeline runs migrate DATABASE_NAME \\\n --database_type=mysql\ + \ \\\n --mysql_host=URL/TO/MYSQL \\\n --mysql_username=MYSQL_USERNAME \\\n \ + \ --mysql_password=MYSQL_PASSWORD\n\n\U0001F4BE The New Way (CLI Command Cheat\ + \ Sheet)\n\nDeploy the server\n\nzenml deploy --aws (maybe don’t do this :) since\ + \ it spins up infrastructure on AWS…)\n\nSpin up a local ZenML Server\n\nzenml\ + \ up\n\nConnect to a pre-existing server\n\nzenml connect (pass in URL / etc,\ + \ or zenml connect --config + yaml file)\n\nList your deployed server details\n\ + \nzenml status\n\nThe ZenML Dashboard is now available\n\nThe new ZenML Dashboard\ + \ is now bundled into the ZenML Python package and can be launched directly from\ + \ Python. The source code lives in the ZenML Dashboard repository.\n\nTo launch\ + \ it locally, simply run zenml up on your machine and follow the instructions:\n\ + \n$ zenml up\nDeploying a local ZenML server with name 'local'.\nConnecting ZenML\ + \ to the 'local' local ZenML server (http://127.0.0.1:8237).\nUpdated the global\ + \ store configuration.\nConnected ZenML to the 'local' local ZenML server (http://127.0.0.1:8237).\n\ + The local ZenML dashboard is available at 'http://127.0.0.1:8237'. You can\nconnect\ + \ to it using the 'default' username and an empty password." + - '🐍Configure Python environments + + + Navigating multiple development environments. + + + PreviousHyperAI Service ConnectorNextHandling dependencies - Building container images with Google Cloud Build + Last updated 21 days ago' +- source_sentence: How do you build a simple machine learning pipeline using ZenML + decorators in the code? + sentences: + - 'Develop a custom data validator - The Google Cloud image builder is an image builder flavor provided by the ZenML - gcp integration that uses Google Cloud Build to build container images. + How to develop a custom data validator - When to use it + Before diving into the specifics of this component type, it is beneficial to familiarize + yourself with our general guide to writing custom component flavors in ZenML. + This guide provides an essential understanding of ZenML''s component flavor concepts. - You should use the Google Cloud image builder if: + Base abstraction in progress! - you''re unable to install or use Docker on your client machine. + We are actively working on the base abstraction for the Data Validators, which + will be available soon. As a result, their extension is not recommended at the + moment. When you are selecting a data validator for your stack, you can use one + of the existing flavors. - you''re already using GCP. + If you need to implement your own Data Validator flavor, you can still do so, + but keep in mind that you may have to refactor it when the base abstraction is + updated. 
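The extension steps listed below boil down to three small classes. A rough sketch might look like the following; the import path and the overridden method name are assumptions, since the base abstraction is still in progress, so check the SDK docs for the current interface before relying on it.

```python
# Sketch only: the import path and abstract method names are assumptions and may
# change while the Data Validator base abstraction is still being finalized.
from zenml.data_validators import (
    BaseDataValidator,
    BaseDataValidatorConfig,
    BaseDataValidatorFlavor,
)


class MyDataValidatorConfig(BaseDataValidatorConfig):
    """Configuration for the hypothetical validation backend."""

    strict_mode: bool = False


class MyDataValidator(BaseDataValidator):
    """Wraps a hypothetical validation library behind ZenML's interface."""

    def data_profiling(self, dataset, **kwargs):  # assumed abstract method name
        ...  # call the underlying library/service here


class MyDataValidatorFlavor(BaseDataValidatorFlavor):
    @property
    def name(self) -> str:
        return "my_validator"

    @property
    def config_class(self):
        return MyDataValidatorConfig

    @property
    def implementation_class(self):
        return MyDataValidator
```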
- your stack is mainly composed of other Google Cloud components such as the GCS - Artifact Store or the Vertex Orchestrator. + ZenML comes equipped with Data Validator implementations that integrate a variety + of data logging and validation libraries, frameworks and platforms. However, if + you need to use a different library or service as a backend for your ZenML Data + Validator, you can extend ZenML to provide your own custom Data Validator implementation. - How to deploy it + Build your own custom data validator - Would you like to skip ahead and deploy a full ZenML cloud stack already, including - the Google Cloud image builder? Check out the in-browser stack deployment wizard, - the stack registration wizard, or the ZenML GCP Terraform module for a shortcut - on how to deploy & register this stack component. + If you want to implement your own custom Data Validator, you can follow the following + steps: - In order to use the ZenML Google Cloud image builder you need to enable Google - Cloud Build relevant APIs on the Google Cloud project. + Create a class which inherits from the BaseDataValidator class and override one + or more of the abstract methods, depending on the capabilities of the underlying + library/service that you want to integrate. - How to use it + If you need any configuration, you can create a class which inherits from the + BaseDataValidatorConfig class. - To use the Google Cloud image builder, we need: + Bring both of these classes together by inheriting from the BaseDataValidatorFlavor. - The ZenML gcp integration installed. If you haven''t done so, run: + (Optional) You should also provide some standard steps that others can easily + insert into their pipelines for instant access to data validation features. - zenml integration install gcp + Once you are done with the implementation, you can register it through the CLI. + Please ensure you point to the flavor class via dot notation:' + - " This is us if you want to put faces to the names!However, in order to improve\ + \ ZenML and understand how it is being used, we need to use analytics to have\ + \ an overview of how it is used 'in the wild'. This not only helps us find bugs\ + \ but also helps us prioritize features and commands that might be useful in future\ + \ releases. If we did not have this information, all we really get is pip download\ + \ statistics and chatting with people directly, which while being valuable, is\ + \ not enough to seriously better the tool as a whole.\n\nHow does ZenML collect\ + \ these statistics?\n\nWe use Segment as the data aggregation library for all\ + \ our analytics. However, before any events get sent to Segment, they first go\ + \ through a central ZenML analytics server. This added layer allows us to put\ + \ various countermeasures to incidents such as getting spammed with events and\ + \ enables us to have a more optimized tracking process.\n\nThe client code is\ + \ entirely visible and can be seen in the analytics module of our main repository.\n\ + \nIf I share my email, will you spam me?\n\nNo, we won't. Our sole purpose of\ + \ contacting you will be to ask for feedback (e.g. in the shape of a user interview).\ + \ These interviews help the core team understand usage better and prioritize feature\ + \ requests. 
If you have any concerns about data privacy and the usage of personal\ + \ information, please contact us, and we will try to alleviate any concerns as\ + \ soon as possible.\n\nVersion mismatch (downgrading)\n\nIf you've recently downgraded\ + \ your ZenML version to an earlier release or installed a newer version on a different\ + \ environment on the same machine, you might encounter an error message when running\ + \ ZenML that says:\n\n`The ZenML global configuration version (%s) is higher than\ + \ the version of ZenML \ncurrently being used (%s).`\n\nWe generally recommend\ + \ using the latest ZenML version. However, there might be cases where you need\ + \ to match the global configuration version with the version of ZenML installed\ + \ in the current environment. To do this, run the following command:\n\nzenml\ + \ downgrade" + - "⛓️Build a pipeline\n\nBuilding pipelines is as simple as adding the `@step` and\ + \ `@pipeline` decorators to your code.\n\n@step # Just add this decorator\ndef\ + \ load_data() -> dict:\n training_data = [[1, 2], [3, 4], [5, 6]]\n labels\ + \ = [0, 1, 0]\n return {'features': training_data, 'labels': labels}\n\n@step\n\ + def train_model(data: dict) -> None:\n total_features = sum(map(sum, data['features']))\n\ + \ total_labels = sum(data['labels'])\n\n# Train some model here\n\nprint(f\"\ + Trained model using {len(data['features'])} data points. \"\n f\"Feature\ + \ sum is {total_features}, label sum is {total_labels}\")\n\n@pipeline # This\ + \ function combines steps together \ndef simple_ml_pipeline():\n dataset =\ + \ load_data()\n train_model(dataset)\n\nYou can now run this pipeline by simply\ + \ calling the function:\n\nsimple_ml_pipeline()\n\nWhen this pipeline is executed,\ + \ the run of the pipeline gets logged to the ZenML dashboard where you can now\ + \ go to look at its DAG and all the associated metadata. To access the dashboard\ + \ you need to have a ZenML server either running locally or remotely. See our\ + \ documentation on this here.\n\nCheck below for more advanced ways to build and\ + \ interact with your pipeline.\n\nConfigure pipeline/step parameters\n\nName and\ + \ annotate step outputs\n\nControl caching behavior\n\nRun pipeline from a pipeline\n\ + \nControl the execution order of steps\n\nCustomize the step invocation ids\n\n\ + Name your pipeline runs\n\nUse failure/success hooks\n\nHyperparameter tuning\n\ + \nAttach metadata to steps\n\nFetch metadata within steps\n\nFetch metadata during\ + \ pipeline composition\n\nEnable or disable logs storing\n\nSpecial Metadata Types\n\ + \nAccess secrets in a step\n\nPreviousBest practicesNextUse pipeline/step parameters\n\ + \nLast updated 1 month ago" +- source_sentence: How can I integrate Large Language Models (LLMs) into my MLOps + workflows using ZenML? + sentences: + - '🦜LLMOps guide - A GCP Artifact Store where the build context will be uploaded, so Google Cloud - Build can access it. + Leverage the power of LLMs in your MLOps workflows with ZenML. - A GCP container registry where the built image will be pushed. + Welcome to the ZenML LLMOps Guide, where we dive into the exciting world of Large + Language Models (LLMs) and how to integrate them seamlessly into your MLOps pipelines + using ZenML. This guide is designed for ML practitioners and MLOps engineers looking + to harness the potential of LLMs while maintaining the robustness and scalability + of their workflows. 
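Before the full guide outline, here is a minimal sketch of the retrieval step at the heart of the RAG workflows referenced below, using this finetuned embedding model. The document chunks are illustrative, and `model.similarity` assumes a recent sentence-transformers release.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m-v1.5")

# Illustrative ZenML documentation chunks; in the guide these come from an ingestion pipeline.
docs = [
    "Building pipelines is as simple as adding the @step and @pipeline decorators to your code.",
    "zenml orchestrator register <name> --flavor=airflow registers an Airflow orchestrator.",
    "Secrets can be referenced in stack components using the {{<secret>.<key>}} syntax.",
]
doc_embeddings = model.encode(docs)

query_embedding = model.encode(["How do I build a simple ML pipeline with ZenML?"])
scores = model.similarity(query_embedding, doc_embeddings)  # cosine similarities, shape (1, len(docs))
print(docs[int(scores.argmax())])
```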
- Optionally, the GCP project ID in which you want to run the build and a service - account with the needed permissions to run the build. If not provided, then the - project ID and credentials will be inferred from the environment. + In this guide, we''ll explore various aspects of working with LLMs in ZenML, including: - Optionally, you can change: + RAG with ZenML - the Docker image used by Google Cloud Build to execute the steps to build and - push the Docker image. By default, the builder image will be ''gcr.io/cloud-builders/docker''. + RAG in 85 lines of code - The network to which the container used to build the ZenML pipeline Docker image - will be attached. More information: Cloud build network. + Understanding Retrieval-Augmented Generation (RAG) - The build timeout for the build, and for the blocking operation waiting for the - build to finish. More information: Build Timeout.' - - "_run.steps[step_name]\n whylogs_step.visualize()if __name__ == \"__main__\"\ - :\n visualize_statistics(\"data_loader\")\n visualize_statistics(\"train_data_profiler\"\ - , \"test_data_profiler\")\n\nPreviousEvidentlyNextDevelop a custom data validator\n\ - \nLast updated 1 month ago" - - "ngs/python/Dockerfile -u 0 build\n\nConfiguring RBACAdditionally, you may need\ - \ to create the several resources in Kubernetes in order to give Spark access\ - \ to edit/manage your driver executor pods.\n\nTo do so, create a file called\ - \ rbac.yaml with the following content:\n\napiVersion: v1\nkind: Namespace\nmetadata:\n\ - \ name: spark-namespace\n---\napiVersion: v1\nkind: ServiceAccount\nmetadata:\n\ - \ name: spark-service-account\n namespace: spark-namespace\n---\napiVersion:\ - \ rbac.authorization.k8s.io/v1\nkind: ClusterRoleBinding\nmetadata:\n name: spark-role\n\ - \ namespace: spark-namespace\nsubjects:\n - kind: ServiceAccount\n name:\ - \ spark-service-account\n namespace: spark-namespace\nroleRef:\n kind: ClusterRole\n\ - \ name: edit\n apiGroup: rbac.authorization.k8s.io\n---\n\nAnd then execute\ - \ the following command to create the resources:\n\naws eks --region=$REGION update-kubeconfig\ - \ --name=$EKS_CLUSTER_NAME\n\nkubectl create -f rbac.yaml\n\nLastly, note down\ - \ the namespace and the name of the service account since you will need them when\ - \ registering the stack component in the next step.\n\nHow to use it\n\nTo use\ - \ the KubernetesSparkStepOperator, you need:\n\nthe ZenML spark integration. If\ - \ you haven't installed it already, run\n\nzenml integration install spark\n\n\ - Docker installed and running.\n\nA remote artifact store as part of your stack.\n\ - \nA remote container registry as part of your stack.\n\nA Kubernetes cluster deployed.\n\ - \nWe can then register the step operator and use it in our active stack:\n\nzenml\ - \ step-operator register spark_step_operator \\\n\t--flavor=spark-kubernetes \\\ - \n\t--master=k8s://$EKS_API_SERVER_ENDPOINT \\\n\t--namespace=\ - \ \\\n\t--service_account=\n\n# Register the\ - \ stack\nzenml stack register spark_stack \\\n -o default \\\n -s spark_step_operator\ - \ \\\n -a spark_artifact_store \\\n -c spark_container_registry \\\n \ - \ -i local_builder \\\n --set" -- source_sentence: What is the function of a ZenML BaseService registry in the context - of model deployment? - sentences: - - "\U0001F5C4️Handle Data/Artifacts\n\nStep outputs in ZenML are stored in the artifact\ - \ store. This enables caching, lineage and auditability. 
Using type annotations\ - \ helps with transparency, passing data between steps, and serializing/des\n\n\ - For best results, use type annotations for your outputs. This is good coding practice\ - \ for transparency, helps ZenML handle passing data between steps, and also enables\ - \ ZenML to serialize and deserialize (referred to as 'materialize' in ZenML) the\ - \ data.\n\n@step\ndef load_data(parameter: int) -> Dict[str, Any]:\n\n# do something\ - \ with the parameter here\n\ntraining_data = [[1, 2], [3, 4], [5, 6]]\n labels\ - \ = [0, 1, 0]\n return {'features': training_data, 'labels': labels}\n\n@step\n\ - def train_model(data: Dict[str, Any]) -> None:\n total_features = sum(map(sum,\ - \ data['features']))\n total_labels = sum(data['labels'])\n \n # Train\ - \ some model here\n \n print(f\"Trained model using {len(data['features'])}\ - \ data points. \"\n f\"Feature sum is {total_features}, label sum is\ - \ {total_labels}\")\n\n@pipeline \ndef simple_ml_pipeline(parameter: int):\n\ - \ dataset = load_data(parameter=parameter) # Get the output \n train_model(dataset)\ - \ # Pipe the previous step output into the downstream step\n\nIn this code, we\ - \ define two steps: load_data and train_model. The load_data step takes an integer\ - \ parameter and returns a dictionary containing training data and labels. The\ - \ train_model step receives the dictionary from load_data, extracts the features\ - \ and labels, and trains a model (not shown here).\n\nFinally, we define a pipeline\ - \ simple_ml_pipeline that chains the load_data and train_model steps together.\ - \ The output from load_data is passed as input to train_model, demonstrating how\ - \ data flows between steps in a ZenML pipeline.\n\nPreviousDisable colorful loggingNextHow\ - \ ZenML stores data\n\nLast updated 4 months ago" - - 'πŸ§™Installation + Data ingestion and preprocessing - Installing ZenML and getting started. + Embeddings generation - ZenML is a Python package that can be installed directly via pip: + Storing embeddings in a vector database - pip install zenml + Basic RAG inference pipeline - Note that ZenML currently supports Python 3.8, 3.9, 3.10, and 3.11. Please make - sure that you are using a supported Python version. + Evaluation and metrics - Install with the dashboard + Evaluation in 65 lines of code - ZenML comes bundled with a web dashboard that lives inside a sister repository. - In order to get access to the dashboard locally, you need to launch the ZenML - Server and Dashboard locally. For this, you need to install the optional dependencies - for the ZenML Server: + Retrieval evaluation - pip install "zenml[server]" + Generation evaluation - We highly encourage you to install ZenML in a virtual environment. At ZenML, We - like to use virtualenvwrapper or pyenv-virtualenv to manage our Python virtual - environments. + Evaluation in practice - Installing onto MacOS with Apple Silicon (M1, M2) + Reranking for better retrieval - A change in how forking works on Macs running on Apple Silicon means that you - should set the following environment variable which will ensure that your connections - to the server remain unbroken: + Understanding reranking - export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES + Implementing reranking in ZenML - You can read more about this here. This environment variable is needed if you - are working with a local server on your Mac, but if you''re just using ZenML as - a client / CLI and connecting to a deployed server then you don''t need to set - it. 
+ Evaluating reranking performance - Nightly builds + Improve retrieval by finetuning embeddings - ZenML also publishes nightly builds under the zenml-nightly package name. These - are built from the latest develop branch (to which work ready for release is published) - and are not guaranteed to be stable. To install the nightly build, run: + Synthetic data generation - pip install zenml-nightly + Finetuning embeddings with Sentence Transformers - Verifying installations + Evaluating finetuned embeddings - Once the installation is completed, you can check whether the installation was - successful either through Bash: + Finetuning LLMs with ZenML - zenml version + To follow along with the examples and tutorials in this guide, ensure you have + a Python environment set up with ZenML installed. Familiarity with the concepts + covered in the Starter Guide and Production Guide is recommended. - or through Python: + We''ll showcase a specific application over the course of this LLM guide, showing + how you can work from a simple RAG pipeline to a more complex setup that involves + finetuning embeddings, reranking retrieved documents, and even finetuning the + LLM itself. We''ll do this all for a use case relevant to ZenML: a question answering + system that can provide answers to common questions about ZenML. This will help + you understand how to apply the concepts covered in this guide to your own projects.' + - ' data with tags - import zenml + Get arbitrary artifacts in a stepHandle custom data types - print(zenml.__version__) + Load artifacts into memory - If you would like to learn more about the current release, please visit our PyPi - package page. + Datasets in ZenML - Running with Docker' - - "e details of the deployment process from the user.It needs to act as a ZenML\ - \ BaseService registry, where every BaseService instance is used as an internal\ - \ representation of a remote model server (see the find_model_server abstract\ - \ method). To achieve this, it must be able to re-create the configuration of\ - \ a BaseService from information that is persisted externally, alongside, or even\ - \ as part of the remote model server configuration itself. For example, for model\ - \ servers that are implemented as Kubernetes resources, the BaseService instances\ - \ can be serialized and saved as Kubernetes resource annotations. This allows\ - \ the model deployer to keep track of all externally running model servers and\ - \ to re-create their corresponding BaseService instance representations at any\ - \ given time. 
The model deployer also defines methods that implement basic life-cycle\ - \ management on remote model servers outside the coverage of a pipeline (see stop_model_server\ - \ , start_model_server and delete_model_server).\n\nPutting all these considerations\ - \ together, we end up with the following interface:\n\nfrom abc import ABC, abstractmethod\n\ - from typing import Dict, List, Optional, Type\nfrom uuid import UUID\n\nfrom zenml.enums\ - \ import StackComponentType\nfrom zenml.services import BaseService, ServiceConfig\n\ - from zenml.stack import StackComponent, StackComponentConfig, Flavor\n\nDEFAULT_DEPLOYMENT_START_STOP_TIMEOUT\ - \ = 300\n\nclass BaseModelDeployerConfig(StackComponentConfig):\n \"\"\"Base\ - \ class for all ZenML model deployer configurations.\"\"\"\n\nclass BaseModelDeployer(StackComponent,\ - \ ABC):\n \"\"\"Base class for all ZenML model deployers.\"\"\"\n\n@abstractmethod\n\ - \ def perform_deploy_model(\n self,\n id: UUID,\n config:\ - \ ServiceConfig,\n timeout: int = DEFAULT_DEPLOYMENT_START_STOP_TIMEOUT,\n\ - \ ) -> BaseService:\n \"\"\"Abstract method to deploy a model.\"\"\"" -- source_sentence: How can I implement the abstract method to deploy a model using - ZenML? - sentences: - - "> \\\n --build_timeout=# Register and set a stack\ - \ with the new image builder\nzenml stack register -i \ - \ ... --set\n\nCaveats\n\nAs described in this Google Cloud Build documentation\ - \ page, Google Cloud Build uses containers to execute the build steps which are\ - \ automatically attached to a network called cloudbuild that provides some Application\ - \ Default Credentials (ADC), that allow the container to be authenticated and\ - \ therefore use other GCP services.\n\nBy default, the GCP Image Builder is executing\ - \ the build command of the ZenML Pipeline Docker image with the option --network=cloudbuild,\ - \ so the ADC provided by the cloudbuild network can also be used in the build.\ - \ This is useful if you want to install a private dependency from a GCP Artifact\ - \ Registry, but you will also need to use a custom base parent image with the\ - \ keyrings.google-artifactregistry-auth installed, so pip can connect and authenticate\ - \ in the private artifact registry to download the dependency.\n\nFROM zenmldocker/zenml:latest\n\ - \nRUN pip install keyrings.google-artifactregistry-auth\n\nThe above Dockerfile\ - \ uses zenmldocker/zenml:latest as a base image, but is recommended to change\ - \ the tag to specify the ZenML version and Python version like 0.33.0-py3.10.\n\ - \nPreviousKaniko Image BuilderNextDevelop a Custom Image Builder\n\nLast updated\ - \ 21 days ago" - - ":\n \"\"\"Abstract method to deploy a model.\"\"\"@staticmethod\n @abstractmethod\n\ - \ def get_model_server_info(\n service: BaseService,\n ) -> Dict[str,\ - \ Optional[str]]:\n \"\"\"Give implementation-specific way to extract relevant\ - \ model server\n properties for the user.\"\"\"\n\n@abstractmethod\n \ - \ def perform_stop_model(\n self,\n service: BaseService,\n \ - \ timeout: int = DEFAULT_DEPLOYMENT_START_STOP_TIMEOUT,\n force: bool\ - \ = False,\n ) -> BaseService:\n \"\"\"Abstract method to stop a model\ - \ server.\"\"\"\n\n@abstractmethod\n def perform_start_model(\n self,\n\ - \ service: BaseService,\n timeout: int = DEFAULT_DEPLOYMENT_START_STOP_TIMEOUT,\n\ - \ ) -> BaseService:\n \"\"\"Abstract method to start a model server.\"\ - \"\"\n\n@abstractmethod\n def perform_delete_model(\n self,\n \ - \ service: BaseService,\n timeout: int = 
DEFAULT_DEPLOYMENT_START_STOP_TIMEOUT,\n\ - \ force: bool = False,\n ) -> None:\n \"\"\"Abstract method to\ - \ delete a model server.\"\"\"\n\nclass BaseModelDeployerFlavor(Flavor):\n \ - \ \"\"\"Base class for model deployer flavors.\"\"\"\n\n@property\n @abstractmethod\n\ - \ def name(self):\n \"\"\"Returns the name of the flavor.\"\"\"\n\n\ - @property\n def type(self) -> StackComponentType:\n \"\"\"Returns the\ - \ flavor type.\n\nReturns:\n The flavor type.\n \"\"\"\n \ - \ return StackComponentType.MODEL_DEPLOYER\n\n@property\n def config_class(self)\ - \ -> Type[BaseModelDeployerConfig]:\n \"\"\"Returns `BaseModelDeployerConfig`\ - \ config class.\n\nReturns:\n The config class.\n \"\"\"\ - \n return BaseModelDeployerConfig\n\n@property\n @abstractmethod\n \ - \ def implementation_class(self) -> Type[BaseModelDeployer]:\n \"\"\"\ - The class that implements the model deployer.\"\"\"\n\nThis is a slimmed-down\ - \ version of the base implementation which aims to highlight the abstraction layer.\ - \ In order to see the full implementation and get the complete docstrings, please\ - \ check the SDK docs .\n\nBuilding your own model deployers" - - "se you decide to switch to another Data Validator.All you have to do is call\ - \ the whylogs Data Validator methods when you need to interact with whylogs to\ - \ generate data profiles. You may optionally enable whylabs logging to automatically\ - \ upload the returned whylogs profile to WhyLabs, e.g.:\n\nimport pandas as pd\n\ - from whylogs.core import DatasetProfileView\nfrom zenml.integrations.whylogs.data_validators.whylogs_data_validator\ - \ import (\n WhylogsDataValidator,\n)\nfrom zenml.integrations.whylogs.flavors.whylogs_data_validator_flavor\ - \ import (\n WhylogsDataValidatorSettings,\n)\nfrom zenml import step\n\nwhylogs_settings\ - \ = WhylogsDataValidatorSettings(\n enable_whylabs=True, dataset_id=\"\"\ - \n)\n\n@step(\n settings={\n \"data_validator\": whylogs_settings\n\ - \ }\n)\ndef data_profiler(\n dataset: pd.DataFrame,\n) -> DatasetProfileView:\n\ - \ \"\"\"Custom data profiler step with whylogs\n\nArgs:\n dataset: a\ - \ Pandas DataFrame\n\nReturns:\n Whylogs profile generated for the data\n\ - \ \"\"\"\n\n# validation pre-processing (e.g. dataset preparation) can take\ - \ place here\n\ndata_validator = WhylogsDataValidator.get_active_data_validator()\n\ - \ profile = data_validator.data_profiling(\n dataset,\n )\n #\ - \ optionally upload the profile to WhyLabs, if WhyLabs credentials are configured\n\ - \ data_validator.upload_profile_view(profile)\n\n# validation post-processing\ - \ (e.g. interpret results, take actions) can happen here\n\nreturn profile\n\n\ - Have a look at the complete list of methods and parameters available in the WhylogsDataValidator\ - \ API in the SDK docs.\n\nCall whylogs directly\n\nYou can use the whylogs library\ - \ directly in your custom pipeline steps, and only leverage ZenML's capability\ - \ of serializing, versioning and storing the DatasetProfileView objects in its\ - \ Artifact Store. You may optionally enable whylabs logging to automatically upload\ - \ the returned whylogs profile to WhyLabs, e.g.:" -- source_sentence: How can I register and configure a GCP Service Connector for accessing - GCP Cloud Build services in ZenML? 
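Sticking with the whylogs excerpt above, calling the library directly inside a step is roughly this simple. The DataFrame columns are illustrative, and the whylogs v1 calls should be double-checked against the whylogs documentation.

```python
import pandas as pd
import whylogs as why
from whylogs.core import DatasetProfileView
from zenml import step


@step
def profile_data(dataset: pd.DataFrame) -> DatasetProfileView:
    # Call whylogs directly; ZenML handles versioning and storing the returned view.
    return why.log(dataset).profile().view()
```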
+ Manage big data + + + Skipping materialization + + + Passing artifacts between pipelines + + + Register Existing Data as a ZenML Artifact + + + πŸ“ŠVisualizing artifacts + + + Default visualizations + + + Creating custom visualizations + + + Displaying visualizations in the dashboard + + + Disabling visualizations + + + πŸͺ†Use the Model Control Plane + + + Registering a Model + + + Deleting a Model + + + Associate a pipeline with a Model + + + Connecting artifacts via a Model + + + Controlling Model versions + + + Load a Model in code + + + Promote a Model + + + Linking model binaries/data to a Model + + + Load artifacts from Model + + + πŸ“ˆTrack metrics and metadata + + + Attach metadata to a model + + + Attach metadata to an artifact + + + Attach metadata to steps + + + Group metadata + + + Special Metadata Types + + + Fetch metadata within steps + + + Fetch metadata during pipeline composition + + + πŸ‘¨β€πŸŽ€Popular integrations + + + Run on AWS + + + Run on GCP + + + Run on Azure + + + Kubeflow + + + Kubernetes + + + MLflow + + + Skypilot + + + πŸ”ŒConnect services (AWS, GCP, Azure, K8s etc) + + + Service Connectors guide + + + Security best practices + + + Docker Service Connector + + + Kubernetes Service Connector + + + AWS Service Connector + + + GCP Service Connector + + + Azure Service Connector + + + HyperAI Service Connector + + + 🐍Configure Python environments + + + Handling dependencies + + + Configure the server environment + + + πŸ”ŒConnect to a server + + + Connect in with your User (interactive) + + + Connect with a Service Account + + + πŸ”Interact with secrets + + + 🐞Debug and solve issues + + + 🀝Contribute to ZenML + + + Implement a custom integration + + + Stack Components + + + πŸ“œOverview + + + πŸ”‹Orchestrators + + + Local Orchestrator + + + Local Docker Orchestrator + + + Kubeflow Orchestrator + + + Kubernetes Orchestrator + + + Google Cloud VertexAI Orchestrator + + + AWS Sagemaker Orchestrator + + + AzureML Orchestrator + + + Databricks Orchestrator + + + Tekton Orchestrator + + + Airflow Orchestrator + + + Skypilot VM Orchestrator + + + HyperAI Orchestrator + + + Lightning AI Orchestrator + + + Develop a custom orchestrator + + + πŸͺArtifact Stores + + + Local Artifact Store + + + Amazon Simple Cloud Storage (S3) + + + Google Cloud Storage (GCS)' + - 'Troubleshoot the deployed server + + + Troubleshooting tips for your ZenML deployment + + + In this document, we will go over some common issues that you might face when + deploying ZenML and how to solve them. + + + Viewing logs + + + Analyzing logs is a great way to debug issues. Depending on whether you have a + Kubernetes (using Helm or zenml deploy) or a Docker deployment, you can view the + logs in different ways. + + + If you are using Kubernetes, you can view the logs of the ZenML server using the + following method: + + + Check all pods that are running your ZenML deployment. + + + kubectl -n get pods + + + If you see that the pods aren''t running, you can use the command below to get + the logs for all pods at once. + + + kubectl -n logs -l app.kubernetes.io/name=zenml + + + Note that the error can either be from the zenml-db-init container that connects + to the MySQL database or from the zenml container that runs the server code. If + the get pods command shows that the pod is failing in the Init state then use + zenml-db-init as the container name, otherwise use zenml. 
+ + + kubectl -n logs -l app.kubernetes.io/name=zenml -c + + + You can also use the --tail flag to limit the number of lines to show or the --follow + flag to follow the logs in real-time. + + + If you are using Docker, you can view the logs of the ZenML server using the following + method: + + + If you used the zenml up --docker CLI command to deploy the Docker ZenML server, + you can check the logs with the command: + + + zenml logs -f + + + If you used the docker run command to manually deploy the Docker ZenML server, + you can check the logs with the command: + + + docker logs zenml -f + + + If you used the docker compose command to manually deploy the Docker ZenML server, + you can check the logs with the command: + + + docker compose -p zenml logs -f + + + Fixing database connection problems' +- source_sentence: How can you disable artifact visualization in ZenML? sentences: - - 'System Architectures + - 'Secret management - Different variations of the ZenML architecture depending on your needs. + Configuring the secrets store. - PreviousZenML ProNextZenML SaaS + PreviousCustom secret storesNextZenML Pro Last updated 21 days ago' - - "quired for your GCP Image Builder by running e.g.:zenml service-connector list-resources\ - \ --resource-type gcp-generic\n\nExample Command Output\n\nThe following 'gcp-generic'\ - \ resources can be accessed by service connectors that you have configured:\n\ - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓\n\ - ┃ CONNECTOR ID β”‚ CONNECTOR NAME β”‚ CONNECTOR TYPE β”‚ RESOURCE\ - \ TYPE β”‚ RESOURCE NAMES ┃\n┠──────────────────────────────────────┼────────────────┼────────────────┼────────────────┼────────────────┨\n\ - ┃ bfdb657d-d808-47e7-9974-9ba6e4919d83 β”‚ gcp-generic β”‚ \U0001F535 gcp \ - \ β”‚ \U0001F535 gcp-generic β”‚ zenml-core ┃\n┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛\n\ - \nAfter having set up or decided on a GCP Service Connector to use to authenticate\ - \ to GCP, you can register the GCP Image Builder as follows:\n\nzenml image-builder\ - \ register \\\n --flavor=gcp \\\n --cloud_builder_image=\ - \ \\\n --network= \\\n --build_timeout=\n\ - \n# Connect the GCP Image Builder to GCP via a GCP Service Connector\nzenml image-builder\ - \ connect -i\n\nA non-interactive version that connects the\ - \ GCP Image Builder to a target GCP Service Connector:\n\nzenml image-builder\ - \ connect --connector \n\nExample Command Output" - - ' your GCP Image Builder to the GCP cloud platform.To set up the GCP Image Builder - to authenticate to GCP and access the GCP Cloud Build services, it is recommended - to leverage the many features provided by the GCP Service Connector such as auto-configuration, - best security practices regarding long-lived credentials and reusing the same - credentials across multiple stack components. + - ' visit our PyPi package page. - If you don''t already have a GCP Service Connector configured in your ZenML deployment, - you can register one using the interactive CLI command. You also have the option - to configure a GCP Service Connector that can be used to access more than just - the GCP Cloud Build service: + Running with Dockerzenml is also available as a Docker image hosted publicly on + DockerHub. 
Use the following command to get started in a bash environment with + zenml available: - zenml service-connector register --type gcp -i + docker run -it zenmldocker/zenml /bin/bash - A non-interactive CLI example that leverages the Google Cloud CLI configuration - on your local machine to auto-configure a GCP Service Connector for the GCP Cloud - Build service: + If you would like to run the ZenML server with Docker: - zenml service-connector register --type gcp --resource-type gcp-generic - --resource-name --auto-configure + docker run -it -d -p 8080:8080 zenmldocker/zenml-server - Example Command Output + Deploying the server - $ zenml service-connector register gcp-generic --type gcp --resource-type gcp-generic - --auto-configure + Though ZenML can run entirely as a pip package on a local system, complete with + the dashboard. You can do this easily: - Successfully registered service connector `gcp-generic` with access to the following - resources: - ┏━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓ + pip install "zenml[server]" + + zenml up # opens the dashboard locally - ┃ RESOURCE TYPE β”‚ RESOURCE NAMES ┃ - ┠────────────────┼────────────────┨ + However, advanced ZenML features are dependent on a centrally-deployed ZenML server + accessible to other MLOps stack components. You can read more about it here. - ┃ πŸ”΅ gcp-generic β”‚ zenml-core ┃ - ┗━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛ + For the deployment of ZenML, you have the option to either self-host it or register + for a free ZenML Pro account. - Note: Please remember to grant the entity associated with your GCP credentials - permissions to access the Cloud Build API and to run Cloud Builder jobs (e.g. - the Cloud Build Editor IAM role). The GCP Service Connector supports many different - authentication methods with different levels of security and convenience. You - should pick the one that best fits your use case. + PreviousIntroductionNextCore concepts - If you already have one or more GCP Service Connectors configured in your ZenML - deployment, you can check which of them can be used to access generic GCP resources - like the GCP Image Builder required for your GCP Image Builder by running e.g.:' -- source_sentence: How can ZenML be used to finetune LLMs for specific tasks or to - improve their performance and cost? + Last updated 20 days ago' + - "Disabling visualizations\n\nDisabling visualizations.\n\nIf you would like to\ + \ disable artifact visualization altogether, you can set enable_artifact_visualization\ + \ at either pipeline or step level:\n\n@step(enable_artifact_visualization=False)\n\ + def my_step():\n ...\n\n@pipeline(enable_artifact_visualization=False)\ndef\ + \ my_pipeline():\n ...\n\nPreviousDisplaying visualizations in the dashboardNextUse\ + \ the Model Control Plane\n\nLast updated 21 days ago" +- source_sentence: How can I programmatically manage secrets using the ZenML Client + API, and what are some of the methods available for tasks like fetching, updating, + and deleting secrets? sentences: - - " build to finish. More information: Build Timeout.We can register the image builder\ - \ and use it in our active stack:\n\nzenml image-builder register \ - \ \\\n --flavor=gcp \\\n --cloud_builder_image= \\\n\ - \ --network= \\\n --build_timeout=\n\ - \n# Register and activate a stack with the new image builder\nzenml stack register\ - \ -i ... 
--set\n\nYou also need to set up authentication\ - \ required to access the Cloud Build GCP services.\n\nAuthentication Methods\n\ - \nIntegrating and using a GCP Image Builder in your pipelines is not possible\ - \ without employing some form of authentication. If you're looking for a quick\ - \ way to get started locally, you can use the Local Authentication method. However,\ - \ the recommended way to authenticate to the GCP cloud platform is through a GCP\ - \ Service Connector. This is particularly useful if you are configuring ZenML\ - \ stacks that combine the GCP Image Builder with other remote stack components\ - \ also running in GCP.\n\nThis method uses the implicit GCP authentication available\ - \ in the environment where the ZenML code is running. On your local machine, this\ - \ is the quickest way to configure a GCP Image Builder. You don't need to supply\ - \ credentials explicitly when you register the GCP Image Builder, as it leverages\ - \ the local credentials and configuration that the Google Cloud CLI stores on\ - \ your local machine. However, you will need to install and set up the Google\ - \ Cloud CLI on your machine as a prerequisite, as covered in the Google Cloud\ - \ documentation , before you register the GCP Image Builder.\n\nStacks using the\ - \ GCP Image Builder set up with local authentication are not portable across environments.\ - \ To make ZenML pipelines fully portable, it is recommended to use a GCP Service\ - \ Connector to authenticate your GCP Image Builder to the GCP cloud platform." - - 'Finetuning LLMs with ZenML - - - Finetune LLMs for specific tasks or to improve performance and cost. - - - PreviousEvaluating finetuned embeddingsNextSet up a project repository - - - Last updated 6 months ago' - - "Spark\n\nExecuting individual steps on Spark\n\nThe spark integration brings\ - \ two different step operators:\n\nStep Operator: The SparkStepOperator serves\ - \ as the base class for all the Spark-related step operators.\n\nStep Operator:\ - \ The KubernetesSparkStepOperator is responsible for launching ZenML steps as\ - \ Spark applications with Kubernetes as a cluster manager.\n\nStep Operators:\ - \ SparkStepOperator\n\nA summarized version of the implementation can be summarized\ - \ in two parts. First, the configuration:\n\nfrom typing import Optional, Dict,\ - \ Any\nfrom zenml.step_operators import BaseStepOperatorConfig\n\nclass SparkStepOperatorConfig(BaseStepOperatorConfig):\n\ - \ \"\"\"Spark step operator config.\n\nAttributes:\n master: is the\ - \ master URL for the cluster. You might see different\n schemes for\ - \ different cluster managers which are supported by Spark\n like Mesos,\ - \ YARN, or Kubernetes. 
Within the context of this PR,\n the implementation\ - \ supports Kubernetes as a cluster manager.\n deploy_mode: can either be\ - \ 'cluster' (default) or 'client' and it\n decides where the driver\ - \ node of the application will run.\n submit_kwargs: is the JSON string\ - \ of a dict, which will be used\n to define additional params if required\ - \ (Spark has quite a\n lot of different parameters, so including them,\ - \ all in the step\n operator was not implemented).\n \"\"\"\n\n\ - master: str\n deploy_mode: str = \"cluster\"\n submit_kwargs: Optional[Dict[str,\ - \ Any]] = None\n\nand then the implementation:\n\nfrom typing import List\nfrom\ - \ pyspark.conf import SparkConf\n\nfrom zenml.step_operators import BaseStepOperator\n\ - \nclass SparkStepOperator(BaseStepOperator):\n \"\"\"Base class for all Spark-related\ - \ step operators.\"\"\"\n\ndef _resource_configuration(\n self,\n \ - \ spark_config: SparkConf,\n resource_configuration: \"ResourceSettings\"\ - ,\n ) -> None:\n \"\"\"Configures Spark to handle the resource configuration.\"\ - \"\"" + - "tack:\n\nzenml stack register-secrets []The ZenML client API offers\ + \ a programmatic interface to create, e.g.:\n\nfrom zenml.client import Client\n\ + \nclient = Client()\nclient.create_secret(\n name=\"my_secret\",\n values={\n\ + \ \"username\": \"admin\",\n \"password\": \"abc123\"\n }\n)\n\ + \nOther Client methods used for secrets management include get_secret to fetch\ + \ a secret by name or id, update_secret to update an existing secret, list_secrets\ + \ to query the secrets store using a variety of filtering and sorting criteria,\ + \ and delete_secret to delete a secret. The full Client API reference is available\ + \ here.\n\nSet scope for secrets\n\nZenML secrets can be scoped to a user. This\ + \ allows you to create secrets that are only accessible to one user.\n\nBy default,\ + \ all created secrets are scoped to the active user. To create a secret and scope\ + \ it to your active user instead, you can pass the --scope argument to the CLI\ + \ command:\n\nzenml secret create \\\n --scope user \\\n --=\ + \ \\\n --=\n\nScopes also act as individual namespaces. When\ + \ you are referencing a secret by name in your pipelines and stacks, ZenML will\ + \ look for a secret with that name scoped to the active user.\n\nAccessing registered\ + \ secrets\n\nReference secrets in stack component attributes and settings\n\n\ + Some of the components in your stack require you to configure them with sensitive\ + \ information like passwords or tokens, so they can connect to the underlying\ + \ infrastructure. Secret references allow you to configure these components in\ + \ a secure way by not specifying the value directly but instead referencing a\ + \ secret by providing the secret name and key. Referencing a secret for the value\ + \ of any string attribute of your stack components, simply specify the attribute\ + \ using the following syntax: {{.}}\n\nFor example:\n\ + \n# Register a secret called `mlflow_secret` with key-value pairs for the\n# username\ + \ and password to authenticate with the MLflow tracking server" + - 'y to the active stack + + zenml stack update -c Additionally, we''ll need to log in to the container + registry so Docker can pull and push images. This will require your DockerHub + account name and either your password or preferably a personal access token. 
+ + + docker login + + + For more information and a full list of configurable attributes of the dockerhub + container registry, check out the SDK Docs . + + + PreviousDefault Container RegistryNextAmazon Elastic Container Registry (ECR) + + + Last updated 4 months ago' + - 'nect the stack component to the Service Connector:$ zenml step-operator register + --flavor kubernetes + + Running with active stack: ''default'' (repository) + + Successfully registered step operator ``. + + + $ zenml service-connector list-resources --resource-type kubernetes-cluster -e + + The following ''kubernetes-cluster'' resources can be accessed by service connectors + that you have configured: + + ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━���━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━┓ + + ┃ CONNECTOR ID β”‚ CONNECTOR NAME β”‚ CONNECTOR TYPE + β”‚ RESOURCE TYPE β”‚ RESOURCE NAMES ┃ + + ┠──────────────────────────────────────┼───────────────────────┼────────────────┼───────────────────────┼─────────────────────┨ + + ┃ e33c9fac-5daa-48b2-87bb-0187d3782cde β”‚ aws-iam-multi-eu β”‚ πŸ”Ά aws β”‚ + πŸŒ€ kubernetes-cluster β”‚ kubeflowmultitenant ┃ + + ┃ β”‚ β”‚ β”‚ β”‚ + zenbox ┃ + + ┠──────────────────────────────────────┼───────────────────────┼────────────────┼───────────────────────┼─────────────────────┨ + + ┃ ed528d5a-d6cb-4fc4-bc52-c3d2d01643e5 β”‚ aws-iam-multi-us β”‚ πŸ”Ά aws β”‚ + πŸŒ€ kubernetes-cluster β”‚ zenhacks-cluster ┃ + + ┠──────────────────────────────────────┼───────────────────────┼────────────────┼───────────────────────┼─────────────────────┨ + + ┃ 1c54b32a-4889-4417-abbd-42d3ace3d03a β”‚ gcp-sa-multi β”‚ πŸ”΅ gcp β”‚ + πŸŒ€ kubernetes-cluster β”‚ zenml-test-cluster ┃ + + ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━┛' pipeline_tag: sentence-similarity library_name: sentence-transformers metrics: @@ -504,49 +759,49 @@ model-index: type: dim_384 metrics: - type: cosine_accuracy@1 - value: 1.0 + value: 0.1917808219178082 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 1.0 + value: 0.5095890410958904 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 1.0 + value: 0.6986301369863014 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 1.0 + value: 0.810958904109589 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 1.0 + value: 0.1917808219178082 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.3333333333333333 + value: 0.16986301369863013 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.2 + value: 0.13972602739726026 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.1 + value: 0.08109589041095888 name: Cosine Precision@10 - type: cosine_recall@1 - value: 1.0 + value: 0.1917808219178082 name: Cosine Recall@1 - type: cosine_recall@3 - value: 1.0 + value: 0.5095890410958904 name: Cosine Recall@3 - type: cosine_recall@5 - value: 1.0 + value: 0.6986301369863014 name: Cosine Recall@5 - type: cosine_recall@10 - value: 1.0 + value: 0.810958904109589 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 1.0 + value: 0.490826354124735 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 1.0 + value: 0.38868232224396587 name: Cosine Mrr@10 - type: cosine_map@100 - value: 1.0 + value: 0.3947220516402755 name: Cosine Map@100 - task: type: information-retrieval @@ -556,49 +811,49 @@ model-index: type: dim_256 metrics: - type: cosine_accuracy@1 - value: 1.0 + value: 0.19726027397260273 name: Cosine Accuracy@1 - 
type: cosine_accuracy@3 - value: 1.0 + value: 0.5150684931506849 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 1.0 + value: 0.6931506849315069 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 1.0 + value: 0.8191780821917808 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 1.0 + value: 0.19726027397260273 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.3333333333333333 + value: 0.17168949771689496 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.2 + value: 0.13863013698630136 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.1 + value: 0.08191780821917807 name: Cosine Precision@10 - type: cosine_recall@1 - value: 1.0 + value: 0.19726027397260273 name: Cosine Recall@1 - type: cosine_recall@3 - value: 1.0 + value: 0.5150684931506849 name: Cosine Recall@3 - type: cosine_recall@5 - value: 1.0 + value: 0.6931506849315069 name: Cosine Recall@5 - type: cosine_recall@10 - value: 1.0 + value: 0.8191780821917808 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 1.0 + value: 0.4973578695114294 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 1.0 + value: 0.3948053924766253 name: Cosine Mrr@10 - type: cosine_map@100 - value: 1.0 + value: 0.39989299235069015 name: Cosine Map@100 - task: type: information-retrieval @@ -608,49 +863,49 @@ model-index: type: dim_128 metrics: - type: cosine_accuracy@1 - value: 1.0 + value: 0.19452054794520549 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 1.0 + value: 0.5013698630136987 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 1.0 + value: 0.673972602739726 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 1.0 + value: 0.7835616438356164 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 1.0 + value: 0.19452054794520549 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.3333333333333333 + value: 0.16712328767123283 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.2 + value: 0.1347945205479452 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.1 + value: 0.07835616438356163 name: Cosine Precision@10 - type: cosine_recall@1 - value: 1.0 + value: 0.19452054794520549 name: Cosine Recall@1 - type: cosine_recall@3 - value: 1.0 + value: 0.5013698630136987 name: Cosine Recall@3 - type: cosine_recall@5 - value: 1.0 + value: 0.673972602739726 name: Cosine Recall@5 - type: cosine_recall@10 - value: 1.0 + value: 0.7835616438356164 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 1.0 + value: 0.47814279525126957 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 1.0 + value: 0.38079147640791483 name: Cosine Mrr@10 - type: cosine_map@100 - value: 1.0 + value: 0.38821398163789955 name: Cosine Map@100 - task: type: information-retrieval @@ -660,49 +915,49 @@ model-index: type: dim_64 metrics: - type: cosine_accuracy@1 - value: 1.0 + value: 0.1780821917808219 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 1.0 + value: 0.4602739726027397 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 1.0 + value: 0.6547945205479452 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 1.0 + value: 0.7753424657534247 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 1.0 + value: 0.1780821917808219 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.3333333333333333 + value: 0.15342465753424658 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.2 + value: 0.13095890410958905 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.1 + value: 0.07753424657534246 
name: Cosine Precision@10 - type: cosine_recall@1 - value: 1.0 + value: 0.1780821917808219 name: Cosine Recall@1 - type: cosine_recall@3 - value: 1.0 + value: 0.4602739726027397 name: Cosine Recall@3 - type: cosine_recall@5 - value: 1.0 + value: 0.6547945205479452 name: Cosine Recall@5 - type: cosine_recall@10 - value: 1.0 + value: 0.7753424657534247 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 1.0 + value: 0.4625139710379368 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 1.0 + value: 0.36313002826701446 name: Cosine Mrr@10 - type: cosine_map@100 - value: 1.0 + value: 0.37046958498969434 name: Cosine Map@100 --- @@ -757,9 +1012,9 @@ from sentence_transformers import SentenceTransformer model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m-v1.5") # Run inference sentences = [ - 'How can ZenML be used to finetune LLMs for specific tasks or to improve their performance and cost?', - 'Finetuning LLMs with ZenML\n\nFinetune LLMs for specific tasks or to improve performance and cost.\n\nPreviousEvaluating finetuned embeddingsNextSet up a project repository\n\nLast updated 6 months ago', - 'Spark\n\nExecuting individual steps on Spark\n\nThe spark integration brings two different step operators:\n\nStep Operator: The SparkStepOperator serves as the base class for all the Spark-related step operators.\n\nStep Operator: The KubernetesSparkStepOperator is responsible for launching ZenML steps as Spark applications with Kubernetes as a cluster manager.\n\nStep Operators: SparkStepOperator\n\nA summarized version of the implementation can be summarized in two parts. First, the configuration:\n\nfrom typing import Optional, Dict, Any\nfrom zenml.step_operators import BaseStepOperatorConfig\n\nclass SparkStepOperatorConfig(BaseStepOperatorConfig):\n """Spark step operator config.\n\nAttributes:\n master: is the master URL for the cluster. You might see different\n schemes for different cluster managers which are supported by Spark\n like Mesos, YARN, or Kubernetes. 
Within the context of this PR,\n the implementation supports Kubernetes as a cluster manager.\n deploy_mode: can either be \'cluster\' (default) or \'client\' and it\n decides where the driver node of the application will run.\n submit_kwargs: is the JSON string of a dict, which will be used\n to define additional params if required (Spark has quite a\n lot of different parameters, so including them, all in the step\n operator was not implemented).\n """\n\nmaster: str\n deploy_mode: str = "cluster"\n submit_kwargs: Optional[Dict[str, Any]] = None\n\nand then the implementation:\n\nfrom typing import List\nfrom pyspark.conf import SparkConf\n\nfrom zenml.step_operators import BaseStepOperator\n\nclass SparkStepOperator(BaseStepOperator):\n """Base class for all Spark-related step operators."""\n\ndef _resource_configuration(\n self,\n spark_config: SparkConf,\n resource_configuration: "ResourceSettings",\n ) -> None:\n """Configures Spark to handle the resource configuration."""', + 'How can I programmatically manage secrets using the ZenML Client API, and what are some of the methods available for tasks like fetching, updating, and deleting secrets?', + 'tack:\n\nzenml stack register-secrets []The ZenML client API offers a programmatic interface to create, e.g.:\n\nfrom zenml.client import Client\n\nclient = Client()\nclient.create_secret(\n name="my_secret",\n values={\n "username": "admin",\n "password": "abc123"\n }\n)\n\nOther Client methods used for secrets management include get_secret to fetch a secret by name or id, update_secret to update an existing secret, list_secrets to query the secrets store using a variety of filtering and sorting criteria, and delete_secret to delete a secret. The full Client API reference is available here.\n\nSet scope for secrets\n\nZenML secrets can be scoped to a user. This allows you to create secrets that are only accessible to one user.\n\nBy default, all created secrets are scoped to the active user. To create a secret and scope it to your active user instead, you can pass the --scope argument to the CLI command:\n\nzenml secret create \\\n --scope user \\\n --= \\\n --=\n\nScopes also act as individual namespaces. When you are referencing a secret by name in your pipelines and stacks, ZenML will look for a secret with that name scoped to the active user.\n\nAccessing registered secrets\n\nReference secrets in stack component attributes and settings\n\nSome of the components in your stack require you to configure them with sensitive information like passwords or tokens, so they can connect to the underlying infrastructure. Secret references allow you to configure these components in a secure way by not specifying the value directly but instead referencing a secret by providing the secret name and key. 
Referencing a secret for the value of any string attribute of your stack components, simply specify the attribute using the following syntax: {{.}}\n\nFor example:\n\n# Register a secret called `mlflow_secret` with key-value pairs for the\n# username and password to authenticate with the MLflow tracking server', + "nect the stack component to the Service Connector:$ zenml step-operator register --flavor kubernetes\nRunning with active stack: 'default' (repository)\nSuccessfully registered step operator ``.\n\n$ zenml service-connector list-resources --resource-type kubernetes-cluster -e\nThe following 'kubernetes-cluster' resources can be accessed by service connectors that you have configured:\n┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━┓\n┃ CONNECTOR ID β”‚ CONNECTOR NAME β”‚ CONNECTOR TYPE β”‚ RESOURCE TYPE β”‚ RESOURCE NAMES ┃\n┠──────────────────────────────────────┼───────────────────────┼────────────────┼───────────────────────┼─────────────────────┨\n┃ e33c9fac-5daa-48b2-87bb-0187d3782cde β”‚ aws-iam-multi-eu β”‚ πŸ”Ά aws β”‚ πŸŒ€ kubernetes-cluster β”‚ kubeflowmultitenant ┃\n┃ β”‚ β”‚ β”‚ β”‚ zenbox ┃\n┠──────────────────────────────────────┼───────────────────────┼────────────────┼───────────────────────┼─────────────────────┨\n┃ ed528d5a-d6cb-4fc4-bc52-c3d2d01643e5 β”‚ aws-iam-multi-us β”‚ πŸ”Ά aws β”‚ πŸŒ€ kubernetes-cluster β”‚ zenhacks-cluster ┃\n┠──────────────────────────────────────┼───────────────────────┼────────────────┼───────────────────────┼─────────────────────┨\n┃ 1c54b32a-4889-4417-abbd-42d3ace3d03a β”‚ gcp-sa-multi β”‚ πŸ”΅ gcp β”‚ πŸŒ€ kubernetes-cluster β”‚ zenml-test-cluster ┃\n┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━┛", ] embeddings = model.encode(sentences) print(embeddings.shape) @@ -803,89 +1058,89 @@ You can finetune this model on your own dataset. 
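The inference snippet above only prints the embedding shape. To actually rank the candidate documentation passages against the query, a similarity step can be added. A minimal sketch, assuming sentence-transformers v3+ where `SentenceTransformer.similarity` is available (older releases would use `sentence_transformers.util.cos_sim` instead):

```python
# `model`, `sentences`, and `embeddings` come from the inference snippet above.
# The first entry is the query; the remaining entries are candidate passages.
scores = model.similarity(embeddings[:1], embeddings[1:])
print(scores)  # one row of similarity scores -- higher score = more relevant passage
```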
* Dataset: `dim_384` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) -| Metric | Value | -|:--------------------|:--------| -| cosine_accuracy@1 | 1.0 | -| cosine_accuracy@3 | 1.0 | -| cosine_accuracy@5 | 1.0 | -| cosine_accuracy@10 | 1.0 | -| cosine_precision@1 | 1.0 | -| cosine_precision@3 | 0.3333 | -| cosine_precision@5 | 0.2 | -| cosine_precision@10 | 0.1 | -| cosine_recall@1 | 1.0 | -| cosine_recall@3 | 1.0 | -| cosine_recall@5 | 1.0 | -| cosine_recall@10 | 1.0 | -| cosine_ndcg@10 | 1.0 | -| cosine_mrr@10 | 1.0 | -| **cosine_map@100** | **1.0** | +| Metric | Value | +|:--------------------|:-----------| +| cosine_accuracy@1 | 0.1918 | +| cosine_accuracy@3 | 0.5096 | +| cosine_accuracy@5 | 0.6986 | +| cosine_accuracy@10 | 0.811 | +| cosine_precision@1 | 0.1918 | +| cosine_precision@3 | 0.1699 | +| cosine_precision@5 | 0.1397 | +| cosine_precision@10 | 0.0811 | +| cosine_recall@1 | 0.1918 | +| cosine_recall@3 | 0.5096 | +| cosine_recall@5 | 0.6986 | +| cosine_recall@10 | 0.811 | +| cosine_ndcg@10 | 0.4908 | +| cosine_mrr@10 | 0.3887 | +| **cosine_map@100** | **0.3947** | #### Information Retrieval * Dataset: `dim_256` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) -| Metric | Value | -|:--------------------|:--------| -| cosine_accuracy@1 | 1.0 | -| cosine_accuracy@3 | 1.0 | -| cosine_accuracy@5 | 1.0 | -| cosine_accuracy@10 | 1.0 | -| cosine_precision@1 | 1.0 | -| cosine_precision@3 | 0.3333 | -| cosine_precision@5 | 0.2 | -| cosine_precision@10 | 0.1 | -| cosine_recall@1 | 1.0 | -| cosine_recall@3 | 1.0 | -| cosine_recall@5 | 1.0 | -| cosine_recall@10 | 1.0 | -| cosine_ndcg@10 | 1.0 | -| cosine_mrr@10 | 1.0 | -| **cosine_map@100** | **1.0** | +| Metric | Value | +|:--------------------|:-----------| +| cosine_accuracy@1 | 0.1973 | +| cosine_accuracy@3 | 0.5151 | +| cosine_accuracy@5 | 0.6932 | +| cosine_accuracy@10 | 0.8192 | +| cosine_precision@1 | 0.1973 | +| cosine_precision@3 | 0.1717 | +| cosine_precision@5 | 0.1386 | +| cosine_precision@10 | 0.0819 | +| cosine_recall@1 | 0.1973 | +| cosine_recall@3 | 0.5151 | +| cosine_recall@5 | 0.6932 | +| cosine_recall@10 | 0.8192 | +| cosine_ndcg@10 | 0.4974 | +| cosine_mrr@10 | 0.3948 | +| **cosine_map@100** | **0.3999** | #### Information Retrieval * Dataset: `dim_128` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) -| Metric | Value | -|:--------------------|:--------| -| cosine_accuracy@1 | 1.0 | -| cosine_accuracy@3 | 1.0 | -| cosine_accuracy@5 | 1.0 | -| cosine_accuracy@10 | 1.0 | -| cosine_precision@1 | 1.0 | -| cosine_precision@3 | 0.3333 | -| cosine_precision@5 | 0.2 | -| cosine_precision@10 | 0.1 | -| cosine_recall@1 | 1.0 | -| cosine_recall@3 | 1.0 | -| cosine_recall@5 | 1.0 | -| cosine_recall@10 | 1.0 | -| cosine_ndcg@10 | 1.0 | -| cosine_mrr@10 | 1.0 | -| **cosine_map@100** | **1.0** | +| Metric | Value | +|:--------------------|:-----------| +| cosine_accuracy@1 | 0.1945 | +| cosine_accuracy@3 | 0.5014 | +| cosine_accuracy@5 | 0.674 | +| cosine_accuracy@10 | 0.7836 | +| cosine_precision@1 | 0.1945 | +| cosine_precision@3 | 0.1671 | +| cosine_precision@5 | 0.1348 | +| cosine_precision@10 | 0.0784 | +| 
cosine_recall@1 | 0.1945 | +| cosine_recall@3 | 0.5014 | +| cosine_recall@5 | 0.674 | +| cosine_recall@10 | 0.7836 | +| cosine_ndcg@10 | 0.4781 | +| cosine_mrr@10 | 0.3808 | +| **cosine_map@100** | **0.3882** | #### Information Retrieval * Dataset: `dim_64` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) -| Metric | Value | -|:--------------------|:--------| -| cosine_accuracy@1 | 1.0 | -| cosine_accuracy@3 | 1.0 | -| cosine_accuracy@5 | 1.0 | -| cosine_accuracy@10 | 1.0 | -| cosine_precision@1 | 1.0 | -| cosine_precision@3 | 0.3333 | -| cosine_precision@5 | 0.2 | -| cosine_precision@10 | 0.1 | -| cosine_recall@1 | 1.0 | -| cosine_recall@3 | 1.0 | -| cosine_recall@5 | 1.0 | -| cosine_recall@10 | 1.0 | -| cosine_ndcg@10 | 1.0 | -| cosine_mrr@10 | 1.0 | -| **cosine_map@100** | **1.0** | +| Metric | Value | +|:--------------------|:-----------| +| cosine_accuracy@1 | 0.1781 | +| cosine_accuracy@3 | 0.4603 | +| cosine_accuracy@5 | 0.6548 | +| cosine_accuracy@10 | 0.7753 | +| cosine_precision@1 | 0.1781 | +| cosine_precision@3 | 0.1534 | +| cosine_precision@5 | 0.131 | +| cosine_precision@10 | 0.0775 | +| cosine_recall@1 | 0.1781 | +| cosine_recall@3 | 0.4603 | +| cosine_recall@5 | 0.6548 | +| cosine_recall@10 | 0.7753 | +| cosine_ndcg@10 | 0.4625 | +| cosine_mrr@10 | 0.3631 | +| **cosine_map@100** | **0.3705** |
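The per-dimension tables above are produced by running `InformationRetrievalEvaluator` with embeddings truncated to each Matryoshka dimension. A minimal sketch of how such an evaluation could be reproduced, assuming sentence-transformers v3+ (where evaluators accept a `truncate_dim` argument); the `queries`, `corpus`, and `relevant_docs` mappings below are hypothetical placeholders for your own evaluation split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m-v1.5")

# Hypothetical evaluation data: ids mapped to texts, plus relevance judgments.
queries = {"q1": "How can I manage secrets with the ZenML Client API?"}
corpus = {
    "d1": "The ZenML client API offers a programmatic interface to create, fetch, update and delete secrets.",
    "d2": "An unrelated documentation passage.",
}
relevant_docs = {"q1": {"d1"}}

# One evaluator per Matryoshka dimension; embeddings are truncated before scoring.
for dim in (384, 256, 128, 64):
    evaluator = InformationRetrievalEvaluator(
        queries=queries,
        corpus=corpus,
        relevant_docs=relevant_docs,
        name=f"dim_{dim}",
        truncate_dim=dim,
    )
    # Returns a mapping of metric names (accuracy@k, ndcg@10, map@100, ...) to values.
    results = evaluator(model)
    print(dim, results)
```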