You can view the usage status of all AI EasyMaker resources in the dashboard.
The number of resources in use is displayed for each resource type.
Create and manage Jupyter notebook with essential packages installed for machine learning development.
Create a Jupyter notebook.
Image: Select OS image to be installed on the notebook instance.
Notebook Information
Storage
Data storage is mounted at the /root/easymaker directory path. Data on this storage is retained even when the notebook is restarted. To connect NHN Cloud NAS, enter the path in the format nas://{NAS ID}:/{path}.
Note
Notebooks can take several minutes to create. Creating the initial resources (notebooks, trainings, experiments, endpoints) takes a few additional minutes to configure the service environment.
Caution
Only NHN Cloud NAS created on the same project as AI EasyMaker is available to use.
A list of notebooks is displayed. Select a notebook in the list to check details and make changes to it.
Status: Status of the notebook is displayed. Please refer to the table below for the main status.
| Status | Description |
|---|---|
| CREATE REQUESTED | Notebook creation is requested. |
| CREATE IN PROGRESS | Notebook instance is in the process of creation. |
| ACTIVE (HEALTHY) | Notebook application is in normal operation. |
| ACTIVE (UNHEALTHY) | Notebook application is not operating properly. If this condition persists after restarting the notebook, please contact Customer Support. |
| STOP IN PROGRESS | Notebook stop in progress. |
| STOPPED | Notebook stopped. |
| START IN PROGRESS | Notebook start in progress. |
| REBOOT IN PROGRESS | Notebook reboot in progress. |
| DELETE IN PROGRESS | Notebook delete in progress. |
| CREATE FAILED | Failed to create notebook. If creation keeps failing, please contact Customer Support. |
| STOP FAILED | Failed to stop notebook. Please try to stop again. |
| START FAILED | Failed to start notebook. Please try to start again. |
| REBOOT FAILED | Failed to reboot notebook. Please try to reboot again. |
| DELETE FAILED | Failed to delete notebook. Please try to delete again. |
Action > Open Jupyter Notebook: Click Open Jupyter Notebook button to open the notebook in a new browser window. The notebook is only accessible to users who are logged in to the console.
Monitoring: On the Monitoring tab of the detail screen that appears when you select the notebook, you can see a list of monitored instances and a chart of basic metrics.
AI EasyMaker notebook instance provides native Conda virtual environment with various libraries and kernels required for machine learning.
The default Conda virtual environment is initialized and loaded when the notebook is stopped and started, but virtual environments and external libraries that the user installs in arbitrary paths are not automatically initialized and are not retained across a stop and start.
To resolve this, create a virtual environment in the directory path /root/easymaker/custom-conda-envs and install external libraries in that virtual environment.
AI EasyMaker notebook instances initialize and load virtual environments created in the /root/easymaker/custom-conda-envs directory path when the notebook is stopped and started.
Please refer to the following guide to configure your virtual environment.
Go to /root/easymaker/custom-conda-envs path.
cd /root/easymaker/custom-conda-envs
To create a virtual environment called easymaker_env with Python 3.8, run the conda create command as follows.
conda create --prefix ./easymaker_env python=3.8
Created virtual environment can be checked with conda env list command.
(base) root@nb-xxxxxx-0:~# conda env list
# conda environments:
#
/opt/intel/oneapi/intelpython/latest
/opt/intel/oneapi/intelpython/latest/envs/2022.2.1
base * /opt/miniconda3
easymaker_env /root/easymaker/custom-conda-envs/easymaker_env
You can register scripts in the path /root/easymaker/cont-init.d that should run automatically when the notebook is stopped and started.
The scripts are executed in ascending alphanumeric order.
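As an illustration only, a startup script might be registered as follows. The script name and installed package are hypothetical, and the demonstration uses a temporary directory; on an actual notebook instance, you would write the script into /root/easymaker/cont-init.d instead.

```shell
# Demonstration in a temporary directory; on a notebook instance,
# replace $demo_dir with /root/easymaker/cont-init.d.
demo_dir=$(mktemp -d)

# Hypothetical script: reinstalls a pip package on every notebook start.
# The numeric prefix controls execution order (ascending alphanumeric).
cat > "$demo_dir/10-install-packages.sh" <<'EOF'
#!/bin/bash
pip install --quiet requests
EOF
chmod +x "$demo_dir/10-install-packages.sh"

ls "$demo_dir"
```

The `10-` prefix is a common convention for ordering scripts that run alphabetically; lower numbers run first.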
When the notebook starts, the scripts in /root/easymaker/cont-init.d are executed.
- Scripts must start with #!.
- The exit code of each script is saved to /root/easymaker/cont-init.d/{SCRIPT}.exitcode.
- The output of each script is saved to /root/easymaker/cont-init.d/{SCRIPT}.output and /root/easymaker/cont-init.output.
Stop the running notebook or start the stopped notebook.
Note
It may take several minutes to start and stop notebooks.
Caution
When the notebook is stopped and started, virtual environments and external libraries created by the user may be initialized. To retain them, configure your virtual environment by referring to User Virtual Execution Environment Configuration.
Change the instance type of the created notebook. The instance type can only be changed to another instance type with the same core type as the existing instance.
Note
It may take several minutes to change the instance type.
If a problem occurs while using the notebook, or if the status is ACTIVE but you can't access the notebook, you can reboot the notebook.
Caution
When the notebook is rebooted, virtual environments and external libraries created by the user may be initialized. To retain them, configure your virtual environment by referring to User Virtual Execution Environment Configuration.
Delete the created notebook.
Note
When a notebook is deleted, its boot storage and data storage are deleted. Connected NHN Cloud NAS is not deleted and must be deleted separately in NHN Cloud NAS.
Experiments are managed by grouping related trainings into experiments.
Note
Creating an experiment can take several minutes. Creating the initial resources (notebooks, trainings, experiments, endpoints) takes a few additional minutes to configure the service environment.
A list of experiments is displayed. Select an experiment to view and modify detailed information.
Status: Experiment status appears. Please refer to the table below for main status.
| Status | Description |
|---|---|
| CREATE REQUESTED | Creating an experiment is requested. |
| CREATE IN PROGRESS | An experiment is being created. |
| CREATE FAILED | Failed to create an experiment. Please try again. |
| ACTIVE | The experiment is successfully created. |
Operation
Delete an experiment.
Note
You cannot delete an experiment if a pipeline schedule associated with the experiment exists, or if there are training, hyperparameter tuning, or pipeline execution in production. Delete the resources associated with the experiment first, then delete the experiment. For associated resources, you can check the list by clicking the [Training] tab in the detail screen at the bottom that is displayed when you click the experiment you want to delete.
Provides a training environment where you can train machine learning algorithms and check the results of training.
Set up the training environment by selecting the instance and OS image for training, then enter the algorithm information and input/output data paths to proceed with training.
Algorithm Information: Enter information about the algorithm to train.
Own Algorithm: Uses an algorithm written by the user.
Algorithm path
Entry point
Image : Choose an image for your instance that matches the environment in which you need to run your training.
Training Resource Information
Caution
A list of trainings is displayed. If you select a training from the list, you can check detailed information and change it.
Status : Shows the status of training. Please refer to the table below for the main status.
| Status | Description |
|---|---|
| CREATE REQUESTED | You have requested to create a training. |
| CREATE IN PROGRESS | This is a state in which resources necessary for training are being created. |
| RUNNING | Training is in progress. |
| STOPPED | Training is stopped at the user's request. |
| COMPLETE | Training has been completed normally. |
| STOP IN PROGRESS | Training is stopping. |
| FAIL TRAIN | Training has failed. Detailed failure information can be checked through the Log & Crash Search log when log management is enabled. |
| CREATE FAILED | The training creation failed. If creation continues to fail, please contact Customer Support. |
| FAIL TRAIN IN PROGRESS, COMPLETE IN PROGRESS | The resources used for training are being cleaned up. |
Operation
Hyperparameters : You can check the hyperparameter values set for training on the hyperparameter tab of the detailed screen displayed when selecting training.
Monitoring: When you select a training, you can see a list of monitored instances and basic metrics charts in the Monitoring tab of the detail screen that appears.
Create a new training with the same settings as an existing training.
Create a model with training in the completed state.
Deletes a training.
Note
Training cannot be deleted if a model created by the training to be deleted exists. Please delete the model first and then the training.
Hyperparameter tuning is the process of optimizing hyperparameter values to maximize a model's predictive accuracy. If you don't use this feature, you'll have to manually tune the hyperparameters to find the optimal values while running many training jobs yourself.
How to configure a hyperparameter tuning job.
Caution
A list of hyperparameter tunings is displayed. Select a hyperparameter tuning from the list to view details and change information.
Status : Shows the status of hyperparameter tuning. Please refer to the table below for the main status.
| Status | Description |
|---|---|
| CREATE REQUESTED | Requested to create hyperparameter tuning. |
| CREATE IN PROGRESS | Resources required for hyperparameter tuning are being created. |
| RUNNING | Hyperparameter tuning is in progress. |
| STOPPED | Hyperparameter tuning is stopped at the user's request. |
| COMPLETE | Hyperparameter tuning has been successfully completed. |
| STOP IN PROGRESS | Hyperparameter tuning is stopping. |
| FAIL HYPERPARAMETER TUNING | Hyperparameter tuning has failed. Detailed failure information can be checked through the Log & Crash Search log when log management is enabled. |
| CREATE FAILED | Hyperparameter tuning creation failed. If creation continues to fail, please contact Customer Support. |
| FAIL HYPERPARAMETER TUNING IN PROGRESS, COMPLETE IN PROGRESS, STOP IN PROGRESS | Resources used for hyperparameter tuning are being cleaned up. |
Status Details: The bracketed content in the COMPLETE status is the status details. See the table below for key details.
| Details | Description |
|---|---|
| GoalReached | Details when training for hyperparameter tuning is complete by reaching the target value. |
| MaxTrialsReached | Details when hyperparameter tuning has reached the maximum number of training runs and is complete. |
| SuggestionEndReached | Details when the exploration algorithm in Hyperparameter Tuning has explored all hyperparameters. |
Operation
Monitoring: When you select hyperparameter tuning, you can check the list of monitored instances and basic indicator charts in the Monitoring tab of the detailed screen that appears.
Displays a list of trainings auto-generated by hyperparameter tuning. Select a training from the list to check detailed information.
Status : Shows the status of the training automatically generated by hyperparameter tuning. Please refer to the table below for the main status.
| Status | Description |
|---|---|
| CREATED | Training has been created. |
| RUNNING | Training is in progress. |
| SUCCEEDED | Training has been completed normally. |
| KILLED | Training is stopped by the system. |
| FAILED | Training has failed. Detailed failure information can be checked through the Log & Crash Search log when log management is enabled. |
| METRICS_UNAVAILABLE | This is a state where target metrics cannot be collected. |
| EARLY_STOPPED | Performance (goal metric) is not getting better while training is in progress, so it is in an early-stopped state. |
Create a new hyperparameter tuning with the same settings as the existing hyperparameter tuning.
Create a model with the best training of hyperparameter tuning in the completed state.
Delete a hyperparameter tuning.
Note
Hyperparameter tuning cannot be deleted if the model created by the hyperparameter tuning you want to delete exists. Please delete the model first, then the hyperparameter tuning.
By creating a training template in advance, you can import the values entered into the template when creating training or hyperparameter tuning.
For information on what you can set in your training template, see Creating a training.
Displays a list of training templates. Select a training template from the list to view details and change information.
Create a new training template with the same settings as an existing training template.
Delete the training template.
Manage models from AI EasyMaker training outcomes or external models as artifacts.
- For NHN Cloud Object Storage, enter the path in the format obs://{Object Storage API endpoint}/{containerName}/{path}.
- For NHN Cloud NAS, enter the path in the format nas://{NAS ID}:/{path}.
Note
The values entered as model parameters are used when serving the model. Parameters can be used as arguments and environment variables: Arguments are used as the parameter name as entered, and environment variables are used with the parameter name converted to screaming snake notation.
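For illustration, the conversion from a parameter name to its environment-variable form (screaming snake case) can be sketched as follows; the parameter name max_new_tokens is a hypothetical example, not one defined by the service:

```shell
# Hypothetical parameter name entered when creating the model.
param="max_new_tokens"

# As an argument, the name is used exactly as entered: --max_new_tokens
# As an environment variable, it is converted to screaming snake case.
env_name=$(echo "$param" | tr '[:lower:]' '[:upper:]')
echo "$env_name"
```

So a parameter entered as `max_new_tokens` would be read inside the serving container as the environment variable `MAX_NEW_TOKENS`.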
Note
When creating a HuggingFace model, you can create the model by entering the ID of the HuggingFace model as a parameter. The ID of the HuggingFace model can be found in the URL of the HuggingFace model page. For more information, see Appendix > 11. Framework-specific serving notes.
Caution
Only NHN Cloud NAS created on the same project as AI EasyMaker is available to use.
Caution
If the model artifacts stored in storage are not retained, endpoint creation for that model fails.
Caution
The file type for HuggingFace models is limited to safetensors. Safetensors is a safe and efficient machine learning model storage format developed by HuggingFace. Other file types are not supported.
Caution
The model artifact path you enter must contain the model file and the config.pbtxt file in a structure that allows the model to be run with Triton.
See the example below:
Example
model_name/
├── config.pbtxt # Model configuration file
└── 1/ # Version 1 directory
└── model.savedmodel/ # TensorFlow SavedModel directory
├── saved_model.pb # Metagraph and checkpoint data
└── variables/ # Model weight directory
├── variables.data-00000-of-00001
└── variables.index
The model list is displayed. Select a model in the list to check detailed information and make changes to it.
Status: Model's status is displayed. For major statuses, see the following table.
| Status | Description |
|---|---|
| CREATE REQUESTED | Model creation is requested. |
| CREATE IN PROGRESS | Resource required for the model is being created. |
| DELETE IN PROGRESS | Model is being deleted. |
| ACTIVE | Model is created successfully. |
| CREATE FAILED | Failed to create a model. If creation fails repeatedly, contact Customer Support. |
| DELETE FAILED | Failed to delete a model. Please try again. |
Training Name: For models created from training, the name of the source training is displayed.
Create an endpoint that can serve the selected model.
Create batch inferences with the selected model and view the inference results as statistics.
Delete a model.
Note
You cannot delete a model if an endpoint created from it exists. Delete the endpoint created from the model first, then delete the model.
Measure the performance of models, and compare performance across different models.
Batch inferences are automatically created during the model evaluation process.
Caution
A list of model evaluations is displayed. Select a model evaluation in the list to view details and make changes to the information.
Status: Displays the status of the model evaluation. See the table below for the main statuses.
| Status | Description |
|---|---|
| CREATE REQUESTED | Model evaluation creation is requested. |
| CREATE IN PROGRESS | Model evaluation is being created. |
| CREATE FAILED | Model evaluation creation has failed. Please try again. |
| RUNNING | Model evaluation is in progress. |
| COMPLETE IN PROGRESS, FAIL MODEL EVALUATION IN PROGRESS | Resources used for model evaluation are being cleaned up. |
| COMPLETE | Model evaluation completed successfully. |
| STOP IN PROGRESS | Model evaluation is stopping. |
| STOPPED | Model evaluation has been stopped at the user's request. |
| FAIL MODEL EVALUATION | Model evaluation has failed. If log management is enabled, you can check the detailed failure information in the Log & Crash Search logs. |
| DELETE IN PROGRESS | Model evaluation is being deleted. |
Operation
Compare evaluation metrics across models.
Delete a model evaluation.
Create and manage endpoints that can serve the model.
For example, if the API resource path is /inference, you can request the inference API with POST https://{endpoint-domain}/inference.
Note
The AI EasyMaker service provides endpoints based on the open inference protocol (OIP) specification. For the endpoint API specification, see Appendix > 10. Endpoint API specification. To use a separate endpoint, refer to the resources created in the API Gateway service and create a new resource to use it. For more information about the OIP specification, see OIP specification.
Note
Endpoint creation can take several minutes. Creating the initial resources takes a few additional minutes to configure the service environment.
Note
When you create a new endpoint, a new API Gateway service is created. Adding a new stage to an existing endpoint creates a new stage in the API Gateway service. If you exceed the default quota in the API Gateway Service Resource Provision Policy, you might not be able to create endpoints in AI EasyMaker. In this case, adjust the API Gateway service resource quota.
Endpoints list is displayed. Select an endpoint in the list to check details and make changes to the information.
Status: Status of endpoint. Please refer to the table below for main status.
| Status | Description |
|---|---|
| CREATE REQUESTED | Endpoint creation is requested. |
| CREATE IN PROGRESS | Endpoint creation is in progress. |
| UPDATE IN PROGRESS | Some of endpoint stages have tasks in progress. You can check the status of task for each stage in the endpoint stage list. |
| DELETE IN PROGRESS | Endpoint deletion is in progress. |
| ACTIVE | Endpoint is in normal operation. |
| CREATE FAILED | Endpoint creation has failed. You must delete and recreate the endpoint. If the creation fails repeatedly, please contact Customer Support. |
| UPDATE FAILED | Some of endpoint stages are not serviced properly. You must delete and recreate the stages with issues. |
API Gateway Status: Displays API Gateway status information for default stage of endpoint. Please refer to the table below for main status.
| Status | Description |
|---|---|
| CREATE IN PROGRESS | API Gateway Resource creation in progress. |
| STAGE DEPLOYING | API Gateway default stage deploying in progress. |
| ACTIVE | API Gateway default stage is successfully deployed and activated. |
| NOT FOUND: STAGE | The default stage for the endpoint is not found. Please check whether the stage exists in the API Gateway console. A deleted API Gateway stage cannot be recovered; the endpoint has to be deleted and recreated. |
| NOT FOUND: STAGE DEPLOY RESULT | The deployment status of the endpoint default stage is not found. Please check if the default stage is deployed in API Gateway console. |
| STAGE DEPLOY FAIL | API Gateway default stage has failed to deploy. Refer to [Note] Recovery method for an API Gateway stage in 'Deployment Failure' status to recover from the failed deployment. |
Add new stage to existing endpoint. You can create and test the new stage without affecting default stage.
The list of stages created under the endpoint is displayed. Select a stage in the list to check detailed information.
Status: Displays status of endpoint stage. Please refer to the table below for main status.
| Status | Description |
|---|---|
| CREATE REQUESTED | Endpoint stage creation requested. |
| CREATE IN PROGRESS | Endpoint stage creation is in progress. |
| DEPLOY IN PROGRESS | Model deployment to the endpoint stage is in progress. |
| DELETE IN PROGRESS | Endpoint stage deletion is in progress. |
| ACTIVE | Endpoint stage is in normal operation. |
| CREATE FAILED | Endpoint stage creation has failed. Please try again. |
| DEPLOY FAILED | Deployment to the endpoint stage has failed. Please try again. |
API Gateway Status: Displays stage status of API Gateway from where endpoint stage is deployed.
Caution
When creating an endpoint or an endpoint stage, AI EasyMaker creates API Gateway services and stages for the endpoint. Please note the following precautions when changing API Gateway services and stages created by AI EasyMaker directly from API Gateway service console.
Note
If stage settings of the AI EasyMaker endpoint are not deployed to the API Gateway stage due to a temporary issue, the deployment status is displayed as failed. In this case, you can deploy the API Gateway stage manually by selecting the stage from the Stage list, then clicking View API Gateway Settings > 'Deploy Stage' in the bottom detail screen. If this does not recover the deployment status, please contact Customer Support.
Add a new resource to an existing endpoint stage.
Model: Select the model you want to deploy to your endpoints. If you have not created a model, please create one first.
Resource quota (%): Enter the percentage of resources to allocate to the model. A fixed percentage of the instance's resources is allocated.
Number of Pods: Enter the number of pods for the stage resource.
Description: Enter a description for the stage resource.
Pod Auto Scaler: The feature to automatically adjust the number of Pods based on the request volume of your model. The autoscaler is set on a per-model basis.
A list of resources created under the endpoint stage is displayed.
Status : Shows the status of stage resource. Please refer to the table below for the main status.
| Status | Description |
|---|---|
| CREATE REQUESTED | Creating stage resource requested. |
| CREATE IN PROGRESS | Stage resource is being created. |
| DELETE IN PROGRESS | Stage resource is being deleted. |
| ACTIVE | Stage resource is deployed normally. |
| CREATE FAILED | Creating stage resource failed. Please try again. |
Model Name: The name of the model deployed to the stage.
// Inference API example: Request
curl --location --request POST '{API Gateway Resource Path}' \
--header 'Content-Type: application/json' \
--data-raw '{
"instances": [
[6.8, 2.8, 4.8, 1.4],
[6.0, 3.4, 4.5, 1.6]
]
}'
// Inference API Example: Response
{
"predictions" : [
[
0.337502569,
0.332836747,
0.329660654
],
[
0.337530434,
0.332806051,
0.329663515
]
]
}
Change the default stage of the endpoint to another stage. To change the model of an endpoint without service interruption, AI EasyMaker recommends deploying the model using stage capabilities.
Caution
Deleting an endpoint stage in AI EasyMaker also deletes the stage in the API Gateway service from which the endpoint's stage is deployed. If there is an API running on the API Gateway stage to be deleted, please note that API calls will fail.
Delete an endpoint.
Caution
Deleting an endpoint in AI EasyMaker also deletes the API Gateway service from which the endpoint's stages were deployed. If there is an API running on the API Gateway service to be deleted, please note that API calls will fail.
Provides an environment to make batch inferences from an AI EasyMaker model and view inference results in statistics.
Set up the environment in which batch inference will be performed by selecting an instance and OS image, and enter the paths to the input/output data to be inferred to proceed with batch inference.
Note
Caution
Caution
Batch inference using GPU instances allocates GPUs based on the number of pods.
If the number of GPUs is not evenly divisible by the number of pods, some GPUs may be left unallocated.
Unallocated GPUs are not used by batch inference, so set the number of pods appropriately to use GPU instances efficiently.
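As a worked example of the allocation described above (the counts below are hypothetical, and GPUs are assumed to be divided evenly across pods): with 8 GPUs on the instance and 3 pods, integer division gives each pod 2 GPUs and leaves 2 GPUs unallocated:

```shell
# Hypothetical counts: 8 GPUs on the instance, 3 pods requested.
gpus=8
pods=3

gpus_per_pod=$((gpus / pods))                # integer division
unallocated=$((gpus - gpus_per_pod * pods))  # remainder left idle

echo "GPUs per pod: $gpus_per_pod, unallocated: $unallocated"
```

Choosing 2 or 4 pods in this example would leave no GPU unallocated.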
Displays a list of batch inferences. Select a batch inference from the list to check the details and change the information.
Status : Displays the status of batch inference. Please refer to the table below for the main status.
| Status | Description |
|---|---|
| CREATE REQUESTED | You have requested to create a batch inference. |
| CREATE IN PROGRESS | Resources necessary for batch inference are being created. |
| RUNNING | Batch inference is in progress. |
| STOPPED | Batch inference is stopped at the user's request. |
| COMPLETE | Batch inference has been completed successfully. |
| STOP IN PROGRESS | Batch inference is stopping. |
| FAIL BATCH INFERENCE | This is a failed state during batch inference. Detailed failure information can be checked through the Log & Crash Search log when log management is enabled. |
| CREATE FAILED | The batch inference creation failed. If creation continues to fail, please contact Customer Support. |
| FAIL BATCH INFERENCE IN PROGRESS, COMPLETE IN PROGRESS | The resources used for batch inference are being cleaned up. |
Operation
Create a new batch inference with the same settings as an existing batch inference.
Delete a batch inference.
User-personalized container images can be used to drive notebooks, training, and hyperparameter tuning. Only private images derived from the notebook/deep learning images provided by AI EasyMaker can be used when creating resources in AI EasyMaker. See the table below for the base images in AI EasyMaker.
| Image Name | CoreType | Framework | Framework version | Python version | Image address |
|---|---|---|---|---|---|
| Ubuntu 22.04 CPU Python Notebook | CPU | Python | 3.10.12 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/python-notebook:3.10.12-cpu-py310-ubuntu2204 |
| Ubuntu 22.04 GPU Python Notebook | GPU | Python | 3.10.12 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/python-notebook:3.10.12-gpu-py310-ubuntu2204 |
| Ubuntu 22.04 CPU PyTorch Notebook | CPU | PyTorch | 2.0.1 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/pytorch-notebook:2.0.1-cpu-py310-ubuntu2204 |
| Ubuntu 22.04 GPU PyTorch Notebook | GPU | PyTorch | 2.0.1 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/pytorch-notebook:2.0.1-gpu-py310-ubuntu2204 |
| Ubuntu 22.04 CPU TensorFlow Notebook | CPU | TensorFlow | 2.12.0 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/tensorflow-notebook:2.12.0-cpu-py310-ubuntu2204 |
| Ubuntu 22.04 GPU TensorFlow Notebook | GPU | TensorFlow | 2.12.0 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/tensorflow-notebook:2.12.0-gpu-py310-ubuntu2204 |
| Image Name | CoreType | Framework | Framework version | Python version | Image address |
|---|---|---|---|---|---|
| Ubuntu 22.04 CPU PyTorch Training | CPU | PyTorch | 2.0.1 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/pytorch-train:2.0.1-cpu-py310-ubuntu2204 |
| Ubuntu 22.04 GPU PyTorch Training | GPU | PyTorch | 2.0.1 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/pytorch-train:2.0.1-gpu-py310-ubuntu2204 |
| Ubuntu 22.04 CPU TensorFlow Training | CPU | TensorFlow | 2.12.0 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/tensorflow-train:2.12.0-cpu-py310-ubuntu2204 |
| Ubuntu 22.04 GPU TensorFlow Training | GPU | TensorFlow | 2.12.0 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/tensorflow-train:2.12.0-gpu-py310-ubuntu2204 |
Note
Only NHN Container Registry (NCR) can be integrated as a container registry service where private images are stored. (As of December 2023)
Caution
Only private images derived from base images provided by AI EasyMaker can be used.
The following explains how to create a container image from an AI EasyMaker base image using Docker, and how to use the private image for notebooks in AI EasyMaker.
Create a DockerFile of private image.
FROM fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/python-notebook:3.10.12-cpu-py310-ubuntu2204 AS easymaker-notebook
RUN conda create -n example python=3.10
# conda activate does not persist across RUN instructions; use conda run instead
RUN conda run -n example pip install torch torchvision
Build a private image and push it to the container registry. Build the image with the Dockerfile and save (push) it to the NCR registry.
docker build -t {image name}:{tag} .
docker tag {image name}:{tag} {NCR registry address}/{image name}:{tag}
docker push {NCR registry address}/{image name}:{tag}
(Example)
docker build -t custom-training:v1 .
docker tag custom-training:v1 example-kr1-registry.container.nhncloud.com/registry/custom-training:v1
docker push example-kr1-registry.container.nhncloud.com/registry/custom-training:v1
In AI EasyMaker, create a private image from the image you saved (pushed) to the NCR.
Create a notebook with the private image you created.
Note
Private images can be used for notebooks, training, and hyperparameter tuning to create resources.
Note
Only the NCR service can be used as a container registry service. (As of December 2023)
Enter the following values for the NCR service account ID and password:
- ID: User Access Key of the NHN Cloud user account
- Password: User Secret Key of the NHN Cloud user account
In order for AI EasyMaker to pull an image from the user's registry where private images are stored and run the container, it must be logged in to that registry. If you save your login information as a registry account, you can reuse it for images linked to that registry account. To manage registry accounts, go to the Image menu in the AI EasyMaker console and select the Registry Account tab.
Create a new registry account.
Note
When you change your registry account, you sign in to the registry service with the changed username and password when using images associated with that account. If you enter an incorrect registry username and password, the login during a private image pull fails and the resource creation fails.
Caution
If there are resources being created with a private image associated with the registry account, or if there are trainings and hyperparameter tunings in progress, you cannot modify the account.
Select the registry account you want to delete from the list, and click Delete Registry Account.
Note
You cannot delete a registry account associated with an image. To delete, delete the associated image first and then delete the registry account.
ML Pipeline is a feature for managing and executing portable and scalable machine learning workflows. You can use the Kubeflow Pipelines (KFP) Python SDK to write components and pipelines, compile pipelines into intermediate representation YAML, and run them in AI EasyMaker. Most pipelines are designed to produce one or more ML artifacts, such as datasets, models, and evaluation metrics.
Note
A pipeline is a definition of a workflow that combines one or more components to form a directed acyclic graph (DAG).
- Each component runs a single container during execution, which can generate ML artifacts.
- Components can take inputs and produce outputs. There are two I/O types, parameters and artifacts:
    - Parameters are useful for passing small amounts of data between components.
    - Artifact types are for ML artifact outputs such as datasets, models, and metrics, and provide a convenient mechanism for saving to object storage.
Note
The feature to view console output generated while executing a pipeline is not provided. To check the logs of pipeline code, use the [SDK's Log Send feature](./sdk-guide/#feature.lncs.log.send) to send the logs to Log & Crash Search.
Note
Kubeflow Pipelines (KFP) official documentation:
- KFP User Guide
- KFP SDK Reference
Upload a pipeline.
Note
Uploading a pipeline can take a few minutes. The initial resource creation requires an additional few minutes of time to configure the service environment.
A list of pipelines is displayed. Select a pipeline in the list to view details and make changes to the information.
Status: The status of the pipeline is displayed. See the table below for key statuses.
| Status | Description |
|---|---|
| CREATE REQUESTED | Pipeline creation has been requested. |
| CREATE IN PROGRESS | Pipeline creation is in progress. |
| CREATE FAILED | Pipeline creation failed. Try again. |
| ACTIVE | The pipeline was created successfully. |
A pipeline graph is displayed. Select a node in the graph to see more information.
A graph is a pictorial representation of a pipeline. Each node in the graph represents a step in the pipeline, with arrows indicating the parent/child relationship between the pipeline components represented by each step.
Delete the pipeline.
Note
You cannot delete a pipeline if a schedule created with the pipeline you want to delete exists. Delete the pipeline schedule first, then delete the pipeline.
You can run and manage your uploaded pipelines in AI EasyMaker.
Run the pipeline.
nas://{NAS ID}:/{path}.
Note
Creating a pipeline run can take a few minutes. The initial resource creation requires an additional few minutes of time to configure the service environment.
Caution
Only NHN Cloud NAS created in the same project as AI EasyMaker is available.
A list of pipeline runs is displayed. Select a pipeline run in the list to view details and make changes to the information.
Status: The status of the pipeline execution is displayed. See the table below for key statuses.
| Status | Description |
|---|---|
| CREATE REQUESTED | Pipeline execution creation is requested. |
| CREATE IN PROGRESS | Pipeline run creation is in progress. |
| CREATE FAILED | Pipeline execution creation failed. Try again. |
| RUNNING | Pipeline execution is in progress. |
| COMPLETE IN PROGRESS | The resources used to run the pipeline are being cleaned up. |
| COMPLETE | The pipeline execution has completed successfully. |
| STOP IN PROGRESS | The pipeline run is stopping at the user's request. |
| STOPPED | The pipeline execution has been stopped at the user's request. |
| FAIL PIPELINE RUN IN PROGRESS | The pipeline run has failed, and the resources used to run it are being cleaned up. |
| FAIL PIPELINE RUN | The pipeline execution has failed. Detailed failure information can be found in the Log & Crash Search log when log management is enabled. |
Operation
A graph of the pipeline run is displayed. Select a node in the graph to see more information.
The graph is a pictorial representation of the pipeline execution. This graph shows the steps that have already been executed and the steps that are currently executing during pipeline execution, with arrows indicating the parent/child relationship between the pipeline components represented by each step. Each node in the graph represents a step in the pipeline.
With node-specific details, you can download the generated artifacts.
Caution
Artifacts older than 120 days are automatically deleted.
Stop running pipelines in progress.
Note
Stopping pipeline execution can take a few minutes.
Create a new pipeline run with the same settings as an existing pipeline run.
Delete a pipeline run.
You can create and manage a recurring run to periodically run the uploaded pipeline repeatedly in AI EasyMaker.
Create a recurring run to run the pipeline in periodic iterations.
For information beyond the items below that you can set in creating a pipeline schedule, see Create Recurring Run.
Note
Creating a recurring run can take a few minutes. The initial resource creation requires an additional few minutes of time to configure the service environment.
Note
The Cron expression uses six space-separated fields to represent the time. For more information, see the Cron Expression Format documentation.
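As a quick illustration of the six-field format (seconds, minutes, hours, day of month, month, day of week), a minimal sketch that only checks the field count, not each field's value range:

```python
def is_six_field_cron(expr: str) -> bool:
    """Return True if the expression has exactly six space-separated fields."""
    return len(expr.split()) == 6

# "At 03:00:00 every day" uses six fields, including seconds.
print(is_six_field_cron("0 0 3 * * *"))  # True
# A conventional five-field cron expression is rejected.
print(is_six_field_cron("0 3 * * *"))    # False
```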
A list of pipeline schedules is displayed. Select a pipeline recurring run in the list to view details and make changes to the information.
Status: The status of the pipeline recurring run is displayed. See the table below for key statuses.
| Status | Description |
|---|---|
| CREATE REQUESTED | Pipeline recurring run creation has been requested. |
| CREATE FAILED | Pipeline recurring run creation failed. Try again. |
| ENABLED | The pipeline recurring run has started normally. |
| ENABLED(EXPIRED) | The pipeline recurring run started successfully but has passed the end time you set. |
| DISABLED | The pipeline recurring run has been stopped at the user's request. |
Manage Execution: When you select a pipeline recurring run in the list, you can view the list of runs generated by the pipeline recurring run on the Manage Run tab of the detail screen that appears.
Stop a started pipeline recurring run or start a stopped pipeline recurring run.
Create a new pipeline recurring run with the same settings as an existing pipeline recurring run.
Delete a pipeline recurring run.
Note
You cannot delete a pipeline schedule while a run generated by it is in progress. Delete the pipeline schedule after the pipeline run is complete.
Retrieval-Augmented Generation (RAG) is a technology that vectorizes and stores users' documents, retrieves content related to the question, and improves the accuracy of Large Language Model (LLM) responses. AI EasyMaker allows you to integrate vector store, embedding model, and LLM to create and manage RAG systems.
Create a new RAG.
Note
There may be limitations on the format, size, and number of files available for ingestion. For more information, see Collect Sync.
Caution
For how to create an instance, refer to PostgreSQL Instance User Guide.
Caution
Only NHN Cloud NAS created in the same project as AI EasyMaker can be used.
View and manage the list of generated RAGs. Select a RAG from the list to view detailed information.
| Status | Description |
|---|---|
| CREATE REQUESTED | RAG creation has been requested. |
| CREATE IN PROGRESS | RAG creation is in progress. |
| ACTIVE | RAG is operating normally. |
| UPDATE IN PROGRESS | RAG ingestion is in progress. |
| DELETE IN PROGRESS | RAG deletion is in progress. |
| CREATE FAILED | RAG creation has failed. Delete the RAG and create it again. If creation fails repeatedly, contact Customer Support. |
| UPDATE FAILED | RAG ingestion has failed. Try Synchronize ingestions again. If update fails repeatedly, contact Customer Support. |
| DELETE FAILED | RAG deletion has failed. Try deletion again. If deletion fails repeatedly, contact Customer Support. |
| Status | Description |
|---|---|
| DEPLOYING | API Gateway Basic Stage is deploying. |
| COMPLETE | API Gateway Basic Stage has been successfully deployed and is enabled. |
| FAILURE | API Gateway Basic Stage deployment has failed. |
| Item | Limitation |
|---|---|
| Total file size | 100GB |
| Maximum no. of files | 1,000,000 |
| Category | Supported format | Maximum file size |
|---|---|---|
| Text document | .txt, .text, .md | 3MB |
| Document | .doc, .docx, .pdf | 50MB |
| Spreadsheet | .csv, .xls, .xlsx | 3MB |
| Presentation | .ppt, .pptx | 50MB |
Enter `model` and `messages` in the request body, similar to the OpenAI Chat Completion API. For `model`, enter the RAG name.

```shell
curl -X POST https://{API endpoint address}/rag/v1/query \
-H "Content-Type: application/json" \
-d '{
    "model": "{RAG name}",
    "messages": [
        {
            "role": "user",
            "content": "{query_text}"
        }
    ]
}'
```
```shell
#!/bin/bash
set -euo pipefail

DEFAULT_URL="https://{API endpoint address}/rag/v1/query"
DEFAULT_MODEL="{RAG name}"
DEFAULT_PROMPT="Describe AI EasyMaker service."

usage() {
    cat <<'EOF'
Usage:
    <file name> -k <API_KEY> [-u URL] [-m MODEL] [-p PROMPT]
Options:
    -k  API key (sent in the x-nhn-apikey header)
    -u  URL to call
    -m  Model name
    -p  User prompt
    -h  Help
Description:
    - Calls the API with stream=true using an OpenAI-compatible specification,
      and sequentially writes only the choices[].delta.content of each chunk
      delivered via streaming to standard output.
Required tools:
    - curl, jq
EOF
}

API_KEY=""
URL="$DEFAULT_URL"
MODEL="$DEFAULT_MODEL"
PROMPT="$DEFAULT_PROMPT"

while getopts ":k:u:m:p:h" opt; do
    case "$opt" in
        k) API_KEY="$OPTARG" ;;
        u) URL="$OPTARG" ;;
        m) MODEL="$OPTARG" ;;
        p) PROMPT="$OPTARG" ;;
        h) usage; exit 0 ;;
        \?) echo "Unknown option: -$OPTARG" >&2; usage; exit 2 ;;
        :) echo "Option -$OPTARG requires a value." >&2; usage; exit 2 ;;
    esac
done

if ! command -v curl >/dev/null 2>&1; then
    echo "Error: curl is required." >&2
    exit 1
fi
if ! command -v jq >/dev/null 2>&1; then
    echo "Error: jq is required." >&2
    exit 1
fi

# Create the JSON payload (OpenAI Chat Completions compatible)
payload="$(jq -n \
    --arg model "$MODEL" \
    --arg prompt "$PROMPT" \
    '{
        model: $model,
        messages: [ { role: "user", content: $prompt } ],
        stream: true
    }'
)"

headers=( -H "Content-Type: application/json" )
if [[ -n "$API_KEY" ]]; then
    headers+=( -H "x-nhn-apikey: $API_KEY" )
fi

echo "Request URL: $URL" >&2
echo "Model: $MODEL" >&2
echo "---------------- Start stream ----------------" >&2

# Streaming processing: extract only delta.content from each "data: {json}" line
curl -sS -N -X POST "$URL" "${headers[@]}" --data-raw "$payload" \
| while IFS= read -r line; do
    [[ -z "$line" ]] && continue
    if [[ "$line" == "data: [DONE]"* ]]; then
        break
    fi
    if [[ "$line" == data:* ]]; then
        json="${line#data: }"
        # There may be multiple choices, so print them all.
        # delta.content may be absent, so treat it as empty.
        while IFS= read -r piece; do
            printf "%s" "$piece"
        done < <(printf '%s\n' "$json" | jq -r '.choices[]?.delta?.content // empty')
    fi
done

echo
echo "---------------- End stream ----------------" >&2
```
Some AI EasyMaker features use the user's NHN Cloud Object Storage as input/output storage. For these features to work properly, you must grant the AI EasyMaker system account read or write access to the relevant NHN Cloud Object Storage container.
Granting the AI EasyMaker system account read/write permissions on your NHN Cloud Object Storage container means that the account can read or write all files in that container, in accordance with the granted permissions.
Keep this in mind, and grant access in your Object Storage access policy only to the required accounts with the required permissions.
The user is responsible for all consequences of granting Object Storage access to accounts other than the AI EasyMaker system account during access policy setup; AI EasyMaker assumes no responsibility for this.
Note
According to features, AI EasyMaker accesses, reads or writes to Object Storage as follows.
| Feature | Access Right | Access target |
|---|---|---|
| Training, hyperparameter tuning | Read | Algorithm path entered by user, training input data path |
| Training, hyperparameter tuning | Write | User-entered training output data, checkpoint path |
| Model | Read | Model artifact path entered by user |
| Model evaluation | Read | User-supplied input data path |
| Model evaluation | Write | User-supplied output data path |
| Batch inference | Read | User-supplied input data path |
| Batch inference | Write | User-supplied output data path |
| RAG | Read | User-supplied ingestion data path |
To add read/write permissions to AI EasyMaker system account in Object Storage, refer to the following:
Logs and events generated by the AI EasyMaker service can be stored in the NHN Cloud Log & Crash Search service. To store logs in the Log & Crash Search service, you have to enable Log & Crash service and separate usage fee will be charged.
AI EasyMaker service sends logs to Log & Crash Search service in the following defined fields:
Common Log Field
| Name | Description | Valid range |
|---|---|---|
| easymakerAppKey | AI EasyMaker Appkey(AppKey) | - |
| category | Log category | easymaker.training, easymaker.inference |
| logLevel | Log level | INFO, WARNING, ERROR |
| body | Log contents | - |
| logType | Service name provided by log | NHNCloud-AIEasyMaker |
| time | Log Occurrence Time (UTC Time) | - |
Training Log Field
| Name | Description |
|---|---|
| trainingId | AI EasyMaker training ID |
Hyperparameter Tuning Log Field
| Name | Description |
|---|---|
| hyperparameterTuningId | AI EasyMaker hyperparameter tuning ID |
Endpoint Log Field
| Name | Description |
|---|---|
| endpointId | AI EasyMaker Endpoint ID |
| endpointStageId | Endpoint stage ID |
| inferenceId | Unique ID of the inference request |
| action | Action classification (Endpoint.Model) |
| modelName | Model name to be inferred |
Batch Inference Log Field
| Name | Description |
|---|---|
| batchInferenceId | AI EasyMaker batch inference ID |
As shown in the example below, you can use hyperparameter values entered during training creation.

```python
import argparse
import os

# Hyperparameter values are also exposed as environment variables
model_version = os.environ.get("EM_HP_MODEL_VERSION")

def parse_hyperparameters():
    parser = argparse.ArgumentParser()
    # Parse the entered hyperparameters
    parser.add_argument("--epochs", type=int, default=500)
    parser.add_argument("--batch_size", type=int, default=32)
    ...
    return parser.parse_known_args()
```
Key Environment Variables
| Environment Variable Name | Description |
|---|---|
| EM_SOURCE_DIR | Absolute path to the folder where the algorithm script entered at the time of training creation is downloaded |
| EM_ENTRY_POINT | Algorithm entry point name entered at training creation |
| EM_DATASET_${Data set name} | Absolute path to the folder where each data set entered at the time of training creation is downloaded |
| EM_DATASETS | Full data set list (JSON format) |
| EM_MODEL_DIR | Model storage path |
| EM_CHECKPOINT_INPUT_DIR | Input checkpoint storage path |
| EM_CHECKPOINT_DIR | Output checkpoint storage path |
| EM_HP_${Hyperparameter key, converted to upper case} | Hyperparameter value corresponding to the hyperparameter key |
| EM_HPS | Full hyperparameter list (JSON format) |
| EM_TENSORBOARD_LOG_DIR | TensorBoard log path for checking training results |
| EM_REGION | Current Region Information |
| EM_APPKEY | Appkey of AI EasyMaker service currently in use |
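The JSON-formatted variables such as `EM_HPS` can be parsed with Python's standard `json` module. A minimal sketch; the sample values below are illustrative, since in actual training the variables are set by AI EasyMaker:

```python
import json
import os

# Illustrative fallback value; during training EM_HPS is already set.
os.environ.setdefault("EM_HPS", '{"epochs": "500", "batch_size": "32"}')

hyperparameters = json.loads(os.environ["EM_HPS"])
epochs = int(hyperparameters["epochs"])
print(epochs)  # 500
```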
Example code for utilizing environment variables

```python
import os
import tensorflow

dataset_dir = os.environ.get("EM_DATASET_TRAIN")
train_data = read_data(dataset_dir, "train.csv")

model = ...  # Implement the model using the input data
model.load_weights(os.environ.get('EM_CHECKPOINT_INPUT_DIR', None))

callbacks = [
    tensorflow.keras.callbacks.ModelCheckpoint(filepath=f'{os.environ.get("EM_CHECKPOINT_DIR")}/cp-{{epoch:04d}}.ckpt', save_freq='epoch', period=50),
    tensorflow.keras.callbacks.TensorBoard(log_dir=f'{os.environ.get("EM_TENSORBOARD_LOG_DIR")}'),
]
model.fit(..., callbacks=callbacks)

model_dir = os.environ.get("EM_MODEL_DIR")
model.save(model_dir)
```
Write TensorBoard logs to the designated path (`EM_TENSORBOARD_LOG_DIR`) when writing the training script.

```python
import os
import tensorflow as tf

# Specify the TensorBoard log path
tb_log = tf.keras.callbacks.TensorBoard(log_dir=os.environ.get("EM_TENSORBOARD_LOG_DIR"))

model = ...  # model implementation
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=100, batch_size=20, callbacks=[tb_log])
```
Caution
Metrics older than 120 days will be deleted automatically.
The TF_CONFIG environment variable required for distributed training is set automatically. For more information, refer to the TensorFlow guide document.

Backend settings are required for distributed training. If distributed training runs on CPU, set the backend to gloo; if it runs on GPU, set it to nccl. For more information, refer to the PyTorch guide document.

The AI EasyMaker service periodically upgrades the cluster version to provide stable service and new features. When a new cluster version is deployed, you need to move notebooks and endpoints running on the old cluster version to the new cluster. The following explains how to move each resource to the new cluster.
On the Notebook list screen, notebooks that need to be moved to the new cluster display a Restart button to the left of their name. Hovering the mouse pointer over the Restart button displays restart instructions and an expiration date.
Restarts take about 25 minutes for the first run, and about 10 minutes for subsequent runs. Failed restarts are automatically reported to the administrator.
On the endpoints list screen, endpoints that need to be moved to the new cluster will have a ! Notice to the left of the name. If you hover over the ! Notice, it displays a version upgrade announcement and an expiration date. Before the expiration, you must follow these instructions to move stages running on the old version cluster to the new version cluster.
Caution
Deleting a stage will shut down the endpoint, preventing API calls. Ensure that the stage is not in service before deleting it.
The default stage is the stage on which the actual service operates. To move the cluster version of the default stage without disrupting the service, use the following guide to move it.
exit code : -9 (pid: {pid})
When you create batch inferences and endpoints, AI EasyMaker allocates resources on the selected instance type, less the default usage. The amount of resources you need depends on the demand on and complexity of your model, so set the number of pods and the resource quota carefully, along with an appropriate instance type.
Batch inference divides the instance's actually available resources by the number of pods and allocates the result to each pod. For endpoints, the quota you enter cannot exceed the instance's actually available resources, so check resource usage beforehand. Note that both batch inference and endpoints can fail to create if the allocated resources are less than the minimum the inference requires.
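As a rough sketch of that division (all numbers below are hypothetical, not service defaults):

```python
def per_pod_resources(total_cpu, total_mem_gib, default_cpu, default_mem_gib, pods):
    """Divide the instance resources, minus the default usage, evenly across pods."""
    usable_cpu = total_cpu - default_cpu
    usable_mem = total_mem_gib - default_mem_gib
    return usable_cpu / pods, usable_mem / pods

# Hypothetical 8-vCPU / 32 GiB instance with 1 vCPU / 2 GiB default usage, 4 pods
cpu, mem = per_pod_resources(8, 32, 1, 2, 4)
print(cpu, mem)  # 1.75 7.5
```

If 1.75 vCPU / 7.5 GiB per pod is below what the model's inference needs, creation can fail; reduce the pod count or pick a larger instance type.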
The AI EasyMaker service provides endpoints based on the open inference protocol (OIP) specification. For more information about the OIP specification, see OIP Specification.
| Name | Method | API path |
|---|---|---|
| Model List | GET | /{model_name}/v1/models |
| Model Ready | GET | /{model_name}/v1/models/{model_name} |
| Inference | POST | /{model_name}/v1/models/{model_name}/predict |
| Description | POST | /{model_name}/v1/models/{model_name}/explain |
| Server Information | GET | /{model_name}/v2 |
| Server Live | GET | /{model_name}/v2/health/live |
| Server Ready | GET | /{model_name}/v2/health/ready |
| Model Information | GET | /{model_name}/v2/models/{model_name}[/versions/{model_version}] |
| Model Ready | GET | /{model_name}/v2/models/{model_name}[/versions/{model_version}]/ready |
| Inference | POST | /{model_name}/v2/models/{model_name}[/versions/{model_version}]/infer |
| OpenAI generative model inference | POST | /{model_name}/openai/v1/completions |
| OpenAI generative model inference | POST | /{model_name}/openai/v1/chat/completions |
Note
OpenAI generative model inference is used when using a generative model, such as OpenAI's GPT-4o. The inputs required for inference must be entered according to OpenAI's API specification. For more information, see the OpenAI API documentation. For models that support the Completion and Chat Completion APIs provided by AI EasyMaker, see Model endpoint compatibility.
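The request paths in the OIP table above can be composed programmatically. A small illustrative sketch; the model name and version are placeholders:

```python
from typing import Optional

def oip_v2_infer_path(model_name: str, model_version: Optional[str] = None) -> str:
    """Build the OIP v2 inference path, with the optional /versions/{model_version} segment."""
    version = f"/versions/{model_version}" if model_version else ""
    return f"/{model_name}/v2/models/{model_name}{version}/infer"

print(oip_v2_infer_path("mymodel"))       # /mymodel/v2/models/mymodel/infer
print(oip_v2_infer_path("mymodel", "2"))  # /mymodel/v2/models/mymodel/versions/2/infer
```

Prepend the endpoint's API Gateway address to the returned path when sending the actual request.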
The TensorFlow model serving provided by AI EasyMaker uses the SavedModel (.pb) recommended by TensorFlow. To use checkpoints, save the checkpoint variables directory saved as a SavedModel along with the model directory, which will be used to serve the model. Reference: https://www.tensorflow.org/guide/saved_model
AI EasyMaker serves PyTorch models (.mar) with TorchServe. We recommend using MAR files created with model-archiver. Weight files can also be served, but certain files are required along with them. See the table below and the model-archiver documentation for the required files and detailed descriptions.
| File name | Necessity | Description |
|---|---|---|
| model.py | Required | The model structure file passed in the model-file parameter. |
| handler.py | Required | The file passed to the handler parameter to handle the inference logic. |
| weight files (.pt, .pth, .bin) | Required | The file that stores the weights and structure of the model. |
| requirements.txt | Optional | Files for installing Python packages needed when serving. |
| extra/ | Optional | The files in the directory are passed in the extra-files parameter. |
Note
There are differences in the request format between using TorchServe directly and using AI EasyMaker serving, so take care when writing the handler.py. Refer to the example below to see what values are passed, and implement the handler accordingly.
```shell
# Example request
curl --location --request POST '{API Gateway resource path}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "instances": [
        [1.0, 2.0],
        [3.0, 4.0]
    ]
}'
```
```python
import torch
from ts.torch_handler.base_handler import BaseHandler

class TestHandler(BaseHandler):
    # ...
    def preprocess(self, data):
        # Example: data = [[1.0, 2.0], [3.0, 4.0]]
        features = []
        for row in data:
            # Example: row = [1.0, 2.0]
            content = row
            features.append(content)
        tensor = torch.tensor(features, dtype=torch.float32).to(self.device)
        return tensor
    # ...
```
AI EasyMaker uses mlserver to serve Scikit-learn models (.joblib).
The model-settings.json, which is required when using mlserver directly, is not required when using AI EasyMaker serving.
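As a hedged illustration of producing such a .joblib artifact (the model, training data, and file name below are made up for the example):

```python
import joblib
from sklearn.linear_model import LogisticRegression

# Toy training data: inputs below 1.5 map to class 0, above to class 1
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = LogisticRegression().fit(X, y)

# Save the model in the .joblib format expected for scikit-learn serving
joblib.dump(model, "model.joblib")

# Reload to verify that the artifact round-trips
restored = joblib.load("model.joblib")
print(restored.predict([[0.0], [3.0]]))
```

Upload the resulting `model.joblib` as the model artifact; no `model-settings.json` is needed.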
The Hugging Face model can be served using the Runtime provided by AI EasyMaker, TensorFlow Serving, or TorchServe.
This is a simple way to serve Hugging Face models. Hugging Face Runtime serving does not support fine-tuning. To serve fine-tuned models, use the TensorFlow/PyTorch serving method.
Note
Currently, the Hugging Face Runtime does not support the full range of Tasks in Hugging Face.
The following tasks are supported: sequence_classification, token_classification, fill_mask, text_generation, and text2text_generation.
To use unsupported Tasks, use the TensorFlow/PyTorch serving method.
Note
To serve a gated model, you must enter the token of an account that is allowed access as a model parameter. If you do not enter a token, or if you enter a token from an account that is not allowed, the model deployment fails.
How to serve a Hugging Face model trained with TensorFlow and PyTorch.
Download the Hugging Face model.
You can download it using the AutoTokenizer, AutoConfig, and AutoModel from the transformers library, as shown in the example code below.
```python
from transformers import AutoTokenizer, AutoConfig, AutoModel

model_id = "<model_id>"
revision = "main"
model_dir = f"./models/{model_id}/{revision}"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model_config = AutoConfig.from_pretrained(model_id, revision=revision)
model = AutoModel.from_config(model_config)

tokenizer.save_pretrained(model_dir)
model.save_pretrained(model_dir)
```
If the model fails to download, import the class appropriate to your model instead of AutoModel and try the download again.
View the Hugging Face model information and generate the files needed to serve it.