ML COMMANDS / QUERIES USING CLI
BangDB has ML natively integrated within it. This means ML is part of the system and performs work both implicitly and explicitly: within BangDB we have support for training, testing, versioning, storing, and deploying models for prediction, either continuously or on demand.
While BangDB implements and supports certain algorithms natively in C/C++, it also allows users to bring their own model, framework, or code to run as part of the system.
Users may leverage frameworks like TensorFlow, PyTorch, etc. as they like or require. Since one of the major problems in ML is dealing with large files (training/test files or the models themselves), BangDB uses BRS to help users with these.
Let's now walk through the commands that the CLI supports for ML.
Train, predict, deploy models using cli

train model model_name
train model from model_name
show models
show models where schema = "myschema"
show status where schema = "myschema" and model = "mymodel"
select treq from bangdb_ml where schema = "myschema" and model = "mymodel"
select treq from bangdb_ml where schema = "myschema"
delete treq from bangdb_ml where schema = "myschema" and model = "mymodel"
update bangdb_ml set status = 25 where schema = "myschema" and model = "mymodel"
drop model mymodel where schema = "myschema"
pred model model_name
Train model
BangDB trains models based on the training instructions (metadata) defined in JSON format. We can write the metadata in a text editor, save it as a file, and use the file directly to train; or we can start a workflow on the CLI, which will eventually create the metadata and train the model. Here is the training metadata format that BangDB uses for training models:
—————–
schema-name : name of the schema. We must associate a schema with a model; this ensures the uniqueness of the model and also allows it to be used on a stream of data
model_name : name of the model
algo_type: Classification (1) | Regression (2) | Lin-regression/Classification (3) | Kmeans (4) | Custom (5) | IE – ontology (6) | IE – NER (7) | IE – Sentiment (8) | IE – KB (9) | DL – resnet (10) | DL – lenet (11) | DL – face detection (12) | DL – shape detection (13) | SL – object detection (14) [ 10, 11, 12, 13, 14 are not supported in this version; take enterprise for these ]
scale: 1 if you wish to scale the data, else 0
tune_param: 1 if you wish to tune the parameters of the algo, else 0
attr_type: 1 for numerical, 2 for string and 3 for hybrid
algo_param : list of algo params, specific to the selected algo_type
OPTIONAL
—————-
custom_format: This is used to further process the data before training, typically when data needs to be rolled up for training.
name: name of the attribute that will be created
fields: It needs 3 pieces of info: the ts (timestamp) field, the quantity which needs to be rolled up, and the entityid (group-by entity)
aggr_type: type of aggregation for “quantity” while rolling up [ 1 for sum … ]
gran: roll-up granularity in seconds
udf: user defined function that could be used before training. It’s a Python file supplied by the user
name: name of the udf
udf_logic : a set of logics that the user may define; select one, denoted by a number, e.g. 1
bucket_name: name of the bucket which has the udf file
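To make the custom_format roll-up concrete, here is a minimal Python sketch of what such a roll-up does, assuming aggr_type 1 (sum) and hypothetical field names ts, qty, and entity; BangDB performs this internally, so this is illustration only:

```python
from collections import defaultdict

def roll_up(events, ts_field, qty_field, entity_field, gran):
    """Group events by (entity, time bucket) and sum the quantity,
    mirroring aggr_type = 1 (sum) with a granularity of `gran` seconds."""
    buckets = defaultdict(float)
    for e in events:
        bucket = e[ts_field] // gran * gran  # align timestamp to bucket start
        buckets[(e[entity_field], bucket)] += e[qty_field]
    return dict(buckets)

events = [
    {"ts": 100, "qty": 2.0, "entity": "u1"},
    {"ts": 130, "qty": 3.0, "entity": "u1"},
    {"ts": 200, "qty": 5.0, "entity": "u1"},
]
# with gran = 120, the events at ts 130 and 200 fall in the same bucket
rolled = roll_up(events, "ts", "qty", "entity", gran=120)
```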
Let’s take a look at a schema for training a classification model using SVM.
training request :

{
  "schema-name": "myschema",
  "model_name": "model1",
  "algo_type": "SVM",
  "attr_type": 1,
  "scale": 1,
  "tune_params": 1,
  "algo_param": {
    "termination_criteria": 0.1,
    "degree": 0,
    "svm_type": 2,
    "kernel_type": 2,
    "gamma": 0.001,
    "shrinking": 0
  },
  "training_details": {
    "file_size_mb": 1,
    "input_format": "SVM",
    "expected_format": "SVM",
    "train_speed": 2,
    "training_source": "svmguide1",
    "training_source_type": 1
  },
  "attr_list": [
    { "name": "a", "position": 0 },
    { "name": "b", "position": 1 },
    { "name": "c", "position": 2 },
    { "name": "d", "position": 3 },
    { "name": "e", "position": 4 }
  ]
}
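Since the metadata is plain JSON, it can also be assembled programmatically and saved to a file for the CLI. A minimal Python sketch, using only the field names from the request above:

```python
import json

# Assemble the training request shown above as a Python dict.
train_req = {
    "schema-name": "myschema",
    "model_name": "model1",
    "algo_type": "SVM",
    "attr_type": 1,        # 1 = numerical attributes
    "scale": 1,            # scale the data before training
    "tune_params": 1,      # tune the algo parameters
    "algo_param": {
        "termination_criteria": 0.1,
        "degree": 0,
        "svm_type": 2,
        "kernel_type": 2,
        "gamma": 0.001,
        "shrinking": 0,
    },
    "training_details": {
        "file_size_mb": 1,
        "input_format": "SVM",
        "expected_format": "SVM",
        "train_speed": 2,            # 2 = fast
        "training_source": "svmguide1",
        "training_source_type": 1,   # 1 = local file
    },
    # attr_list maps attribute names to their positions in the training file
    "attr_list": [{"name": n, "position": i} for i, n in enumerate("abcde")],
}
metadata = json.dumps(train_req)  # write this string to a file for training
```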
Let’s train this model using the workflow first
train model model1
what's the name of the schema for which you wish to train the model?: myschema
do you wish to read earlier saved ml schema for editing/adding? [ yes | no ]:

Since we are creating a new model and don’t have metadata saved on the disk, we select “no” (or press Enter) and move on.
Now it lists all the natively supported algorithms, plus the “Custom (5)” option if we wish to use another framework etc.
We will pick Classification (1)
BangDB supports following algorithm, pls select from these
Classification (1) | Regression (2) | Lin-regression/Classification (3) | Kmeans (4) | Custom (5) | IE - ontology (6) | IE - NER (7) | IE - Sentiment (8) | IE - KB (9) | DL - resnet (10) | DL - lenet (11) | DL - face detection (12) | DL - shape detection (13) | SL - object detection (14)
what's the algo would you like to use (or Enter for default (1)): 1

Based on the algo selection, it asks for certain info on the parameters etc.
svm type [ C_SVC (0) | NU_SVC (1) | ONE_CLASS (2) ] (press enter for default (0)): 2
kernel type [ LINEAR (0) | POLY (1) | RBF (2) | SIGMOID (3) ] (press enter for default (0)): 2
degree (press enter for default (3): enter
gamma (or press enter for default (0.001)):
enable shrinking? [ yes | no ]:
what's the stopping criteria (eps) (or press enter for default (0.001)): 0.1
what's the input (training data) source? [ local file (1) | file on BRS (2) | stream (3) ] (press enter for default (1)): 1
enter the file name for upload (along with full path): trainfiles/svmguide1
what is the input data format for the train data [ LIBSVM (0) | CSV (1) | JSON (3) ] (press Enter for default 1): 0
what's the training speed you wish to select [ Very fast (1) | fast (2) | medium (3) | slow (4) | very slow (5) ] (or Enter for default (1)): 2
what's the attribute type [ NUM (1) | STRING (2) | HYBRID (3) ] (press enter for default (1): 1
do you wish to scale the data? [ yes | no ]: yes
do you wish to tune the params? [ yes | no ]: yes

Finally, we can also do attribute mapping here.
This is useful when the format of the data and the format needed by the algorithm differ, so that the db can do the transformation accordingly before training or testing. It also enables training and prediction on streams, since a subset of the event fields can be used for both.
We need to do the mapping so the model can be used on streams later. This means we need to provide each attribute's name and its position in the training file.

attr name: a
attr position: 0
do you wish to add more attributes? [ yes | no ]: yes
attr name: b
attr position: 1
do you wish to add more attributes? [ yes | no ]: yes
attr name: c
attr position: 2
do you wish to add more attributes? [ yes | no ]: yes
attr name: d
attr position: 3
do you wish to add more attributes? [ yes | no ]: yes
attr name: e
attr position: 4
do you wish to add more attributes? [ yes | no ]:
do you wish to add external udf to do some computations before the training? [ yes | no ]:

Once we enter “yes”, the model starts training.
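What the mapping buys us can be sketched in Python: given the attr_list, a JSON event arriving on a stream can be re-ordered into the positional LIBSVM-style format the algorithm expects. This is an illustration of the idea, not BangDB's internal code:

```python
def event_to_libsvm(event, attr_list):
    """Order the event's fields by their mapped positions and emit a
    LIBSVM-style line of 1-based index:value pairs."""
    ordered = sorted(attr_list, key=lambda a: a["position"])
    return " ".join(f"{a['position'] + 1}:{event[a['name']]}" for a in ordered)

# mapping as entered in the workflow above (features a..d)
attr_list = [{"name": n, "position": i} for i, n in enumerate("abcd")]

# a stream event arrives as JSON, fields in arbitrary order
event = {"b": 58, "a": 26, "d": 125, "c": -0.02}
line = event_to_libsvm(event, attr_list)
```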
To know the status of the training, we should use either of the following:
show models
+-----------------+------------+------+--------------+-------------+-------------------------+-------------------------+
| key             | model name | algo | train status | schema name | train start time        | train end time          |
+-----------------+------------+------+--------------+-------------+-------------------------+-------------------------+
| myschema:model1 | model1     | SVM  | passed       | myschema    | Wed Feb 3 13:44:47 2021 | Wed Feb 3 13:44:59 2021 |
+-----------------+------------+------+--------------+-------------+-------------------------+-------------------------+

The above will show details for all models. To know the status of a specific model:
show status where schema = "myschema" and model = "model1"
{
  "schema-name": "myschema",
  "model_name": "model1",
  "train_req_state": 25
}
Now let’s do test prediction here
Predict for a test event (single data)
pred model model1
what's the name of the schema for which mode was trained?: myschema
do you wish to see the train request? [ yes | no ]: no
model algo type is [ SVM ] it needs [ NUM ] data type with [ LIBSVM ] input data format
what is the input data format for the given pred file [ LIBSVM (0) | CSV (1) | JSON (3) ] (press Enter for default 0): 0
do you wish to provide attribute list? [ yes | no ]: no
do you wish to consider the target (are you also supplying target value?) [ yes | no ]: no
do you wish to pred for file? or single event? [ yes (file) | no (single event) ]: no
enter the test data: 1:2.617300e+01 2:5.886700e+01 3:-1.894697e-01 4:1.251225e+02

pred request = {"input_format":"SVM","expected_format":"SVM","schema-name":"myschema","model_name":"model1","algo_type":"SVM","attr_type":1,"consider_target":0,"data_type":2,"data":"1:2.617300e+01 2:5.886700e+01 3:-1.894697e-01 4:1.251225e+02"}

{"predict_labels":1,"user_pred_accuracy":100,"errorcode":0}
success

We selected the libsvm format for the event “1:2.617300e+01 2:5.886700e+01 3:-1.894697e-01 4:1.251225e+02”, hence there was no conversion.
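For reference, a LIBSVM event is just whitespace-separated index:value pairs with 1-based indices; a small illustrative parser (not part of BangDB):

```python
def parse_libsvm_event(line):
    """Parse 'idx:value' pairs into an {index: float} dict."""
    out = {}
    for tok in line.split():
        idx, val = tok.split(":")
        out[int(idx)] = float(val)  # indices are 1-based in LIBSVM
    return out

feats = parse_libsvm_event(
    "1:2.617300e+01 2:5.886700e+01 3:-1.894697e-01 4:1.251225e+02"
)
```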
Let’s select csv file and ask db to do the conversion
pred model model1

what's the name of the schema for which mode was trained?: myschema
do you wish to see the train request? [ yes | no ]:
model algo type is [ SVM ] it needs [ NUM ] data type with [ LIBSVM ] input data format
what is the input data format for the given pred file [ LIBSVM (0) | CSV (1) | JSON (3) ] (press Enter for default 0): 1
what is the separator (SEP) for the csv file? (press Enter for default ',' (comma) else type it):
do you wish to provide attribute list? [ yes | no ]:
do you wish to consider the target (are you also supplying target value?) [ yes | no ]:
do you wish to pred for file? or single event? [ yes (file) | no (single event) ]:
enter the test data: 26,58,-0.02,125

pred request = {"input_format":"CSV","SEP":",","expected_format":"SVM","schema-name":"myschema","model_name":"model1","algo_type":"SVM","attr_type":1,"consider_target":0,"data_type":2,"data":"26,58,-0.02,125"}

{"predict_labels":1,"user_pred_accuracy":0,"errorcode":0}
success

Here we selected 1 for the input data format and gave the event in csv: “26,58,-0.02,125”.

Now pred using a test file:
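The conversion the db performs for a dense CSV row can be illustrated as follows; this is a plausible sketch assuming 1-based LIBSVM indices, not BangDB's actual implementation:

```python
def csv_to_libsvm(row, sep=","):
    """Turn a dense CSV row into 1-based LIBSVM index:value pairs."""
    return " ".join(f"{i + 1}:{v}" for i, v in enumerate(row.split(sep)))

# the CSV event entered at the prompt above
converted = csv_to_libsvm("26,58,-0.02,125")
```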
pred model model1
what's the name of the schema for which mode was trained?: myschema
do you wish to see the train request? [ yes | no ]:
model algo type is [ SVM ] it needs [ NUM ] data type with [ LIBSVM ] input data format
what is the input data format for the given pred file [ LIBSVM (0) | CSV (1) | JSON (3) ] (press Enter for default 0):
do you wish to provide attribute list? [ yes | no ]:
do you wish to consider the target (are you also supplying target value?) [ yes | no ]: yes
do you wish to pred for file? or single event? [ yes (file) | no (single event) ]: yes
do you wish to upload the file? [ yes | no ]: yes
enter the test file name for upload (along with full path): trainfiles/svmguide1.t

pred request = {"input_format":"SVM","expected_format":"SVM","schema-name":"myschema","model_name":"model1","algo_type":"SVM","attr_type":1,"consider_target":1,"data_type":1,"data":"svmguide1.t"}

{"pred_file_out":"model1__myschema__svmguide1.t.predict","errorcode":0}

do you wish to download the test file? [ yes | no ]: yes
test file [ model1__myschema__svmguide1.t.predict ] download successful, it's in the /tmp folder
success

Note that we have the option of downloading the prediction file right here, or we may take it later from BRS using the key specified here (ex; “model1__myschema__svmguide1.t.predict”).
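Since consider_target was set, the reported accuracy comes from comparing predicted labels against the supplied targets. A sketch of that computation on two label lists (the exact layout of the .predict file is not specified here, so this is illustrative):

```python
def accuracy(predicted, actual):
    """Fraction of matching labels, as a percentage."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * matches / len(actual)

# hypothetical predicted vs. supplied target labels
acc = accuracy([1, 1, 0, 1], [1, 0, 0, 1])
```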
There are more commands available, which are self-explanatory; they are listed at the top of this page.
Check out a few real-world examples and try them out.