Time Series ndim error

#10
by younge1 - opened

I'm attempting to modify the TimeSeriesTransformerForPrediction to accept my own multi variable dataset. My dataset contains ~70 runs. Each run contains a time series and three static environment variables that remain constant throughout the run and describe the conditions of the run. For example, one run may contain a time series, and the variables 1,250,5. Each time series contains values sampled every minute over 19 hours (1140 Samples).
I have taken the following steps to prepare the data, following Kashif's time series dataset example:

  1. Create a list of dictionaries, with each list item containing key-value features for target, start, and feat_static_cat. For example, this is an item from the train list:
    {'start': Timestamp('1970-01-01 00:00:00'), 'target': array([ 775.457207, 775.457207, 785.056306, ..., 1231.042793,
    1239.531532, 1239.598348]), 'feat_static_cat': [1, 250, 5]}
  2. Define a feature schema:
features = Features(
    {    
        "start": Value("timestamp[s]"),
        "target": Sequence(Value("float32")),
        "feat_static_cat": Sequence(Value("uint64")),
    }
)
  3. Create training and test datasets from the feature schema and the lists:
train_dataset = Dataset.from_list(trainList, features)
test_dataset = Dataset.from_list(testList, features)

After preparing the dataset, I computed the cardinality of the static features

card= [len(depots.categories), len(tanks.categories), len(ships.categories)]

to get a list [5,3,5]

I then declare my transformer config as follows:

config = TimeSeriesTransformerConfig(
    prediction_length=prediction_length,
    context_length=30,
    lags_sequence=[1, 2, 3, 4, 5, 6, 7],
    num_time_features=len(time_features) + 1, # we'll add 2 time features ("month of year" and "age", see further)
    num_static_categorical_features=3, # depot, shipnum, and tank size
    cardinality=card, 
    input_size=3,
    embedding_dimension=[1,1,1], 
    encoder_layers=4, 
    decoder_layers=4,
)

I then create a data loader and batch iterator:

train_dataloader = create_train_dataloader(
    config=config, 
    freq=freq, 
    data=train_dataset, 
    batch_size=256,
    num_batches_per_epoch=100,
)

batch = next(iter(train_dataloader))

When I run the train data loader, I get the following error:

 File "/usr/local/lib/python3.9/site-packages/gluonts/exceptions.py", line 95, in assert_gluonts
    raise exception_class(message.format(*args, **kwargs))
gluonts.exceptions.GluonTSDataError: Input for field "target" does not have the requireddimension (field: target, ndim observed: 1, expected ndim: 2)

I suspect the error is coming from the step where the data loader converts the target to a NumPy array:

            AsNumpyArray(
                field=FieldName.TARGET,
                # in the following line, we add 1 for the time dimension
                expected_ndim=1 if config.input_size==1 else 2,
            ),

When I change the data loader to use expected_ndim=1, I get the following error:

Batch Information:
static_categorical_features torch.Size([256, 3]) torch.LongTensor
static_real_features torch.Size([256, 1]) torch.FloatTensor
past_time_features torch.Size([256, 37, 6]) torch.FloatTensor
past_values torch.Size([256, 37]) torch.FloatTensor
past_observed_mask torch.Size([256, 37]) torch.FloatTensor
future_time_features torch.Size([256, 60, 6]) torch.FloatTensor
future_values torch.Size([256, 60]) torch.FloatTensor
future_observed_mask torch.Size([256, 60]) torch.FloatTensor
Error:
  File "/usr/local/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Is there something I am not doing when adding static features to my time series?

Thanks

Hugging Face org

thanks @younge1 for the question, we have added a multivariate tutorial and corresponding notebook. Can you kindly have a look here: https://huggingface.co/blog/informer

In the meantime, I will have a look and try to figure out what could be the issue!

Hi Kashif,

Thanks for the response and linking the multivariate blog post. I'll review that now.

Hugging Face org
edited Mar 10, 2023

so the error seems to be that the cardinality of the embeddings is not correct, meaning some categorical covariate has an integer id which is >= the cardinality you specified for it... can you check?

Also, note that the input_size is the size of the multivariate vector in the past_values and future_values, but you seem to have univariate input, so set that to 1?
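
To illustrate the rule, here is a minimal sketch in plain Python. The helper names expected_target_ndim and target_ndim are made up for this example and are not part of the library; the first one only mirrors the expected_ndim expression quoted from the data loader above.

```python
def expected_target_ndim(input_size: int) -> int:
    """Mirror the data loader rule quoted above: 1-D target if input_size == 1, else 2-D."""
    return 1 if input_size == 1 else 2


def target_ndim(target) -> int:
    """Count the nesting depth of a (possibly nested) list; a flat list has ndim 1."""
    ndim = 0
    while isinstance(target, (list, tuple)):
        ndim += 1
        if not target:  # stop at an empty list instead of indexing into it
            break
        target = target[0]
    return ndim


# A single series per run, as in the question: a flat list of floats.
univariate_target = [775.457207, 775.457207, 785.056306]

for input_size in (1, 3):
    ok = target_ndim(univariate_target) == expected_target_ndim(input_size)
    print(f"input_size={input_size}: target accepted = {ok}")
# input_size=1: target accepted = True
# input_size=3: target accepted = False
```

With input_size=3 the loader would expect a 2-D target (one row per variate), which matches the "ndim observed: 1, expected ndim: 2" message in the traceback.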

Another thing: the categorical values must range from 0, ..., cardinality - 1 for the respective categories.
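
A quick way to check this could look like the sketch below. The function find_out_of_range_ids is a hypothetical helper written for this thread, not a library function, and the second record is invented to show a valid row. It only assumes each record is a dict with a feat_static_cat list; iterating over a datasets Dataset should yield such dicts too, so the same check should apply to train_dataset.

```python
def find_out_of_range_ids(records, cardinality, field="feat_static_cat"):
    """Return (record_index, feature_index, value) for every id outside 0..cardinality-1."""
    problems = []
    for row, record in enumerate(records):
        for col, (value, card) in enumerate(zip(record[field], cardinality)):
            if not 0 <= value < card:
                problems.append((row, col, value))
    return problems


train_list = [
    {"feat_static_cat": [1, 250, 5]},  # raw values from the question
    {"feat_static_cat": [0, 2, 4]},    # invented record with valid ids
]

print(find_out_of_range_ids(train_list, cardinality=[5, 3, 5]))
# [(0, 1, 250), (0, 2, 5)]
```

An empty result means every id fits its embedding table; any reported entry would trigger the "index out of range in self" error.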

Hi Kashif,

I misread the docs and assumed that input_size referred to variables other than my single time series. I changed input_size back to 1. I currently get the cardinality of my static features using the following function:

def getTrainSetCardinality(trainList):
    # depots
    depotCategoryList = []
    for df in trainList:
        depotCategoryList.append(df['feat_static_cat'][0])
    depots = pd.Categorical(depotCategoryList)
    # tanks
    tankCategoryList = []
    for df in trainList:
        tankCategoryList.append(df['feat_static_cat'][1])
    tanks = pd.Categorical(tankCategoryList)
    # ships
    shipCategoryList = []
    for df in trainList:
        shipCategoryList.append(df['feat_static_cat'][2])
    # create a categorical feature for the ship numbers
    ships = pd.Categorical(shipCategoryList)

    # determine the cardinality of the static features
    cardinality = [len(depots.categories), len(tanks.categories), len(ships.categories)]
    return cardinality
  1. I run this function on my list prior to calling "train_dataset= Dataset.from_list(trainList,features)". Is it possible that the order of the cardinality list for my static_cat features changed when using "Dataset.from_list"?
  2. I attached a debug output of my train_dataset prior to it being fed into my data loader. Is there an easy function/way to verify the cardinality of my static features for objects of type "Dataset"?

Thanks again for all your help!

Screen Shot 2023-03-10 at 3.18.50 PM.png

Hugging Face org

so one thing to note: the cardinality of the categorical features is normally calculated as the number of unique categories... so something like depots.categories.nunique()

I modified my getTrainSetCardinality function to now return

"cardinality = [depots.categories.nunique(), tanks.categories.nunique(), ships.categories.nunique()]"
instead of
"cardinality = [len(depots.categories), len(tanks.categories), len(ships.categories)]"
and I still get the same cardinality. Do the actual category values have to be formatted in a particular way?

I have attached a debug output of my getTrainSetCardinality function
Screen Shot 2023-03-10 at 3.52.57 PM.png

Hugging Face org

almost... as i mentioned above, the ids of the categorical values need to be integers in 0, ..., cardinality-1. In your example the encodings are large numbers, e.g. 250, whereas they should be 0, 1, 2, ...
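
As a rough sketch of one way to do the re-encoding in plain Python: the helpers build_id_map and encode_static_cats are hypothetical names, and the second and third records are invented for illustration. Each distinct raw value of a feature is mapped to a consecutive id in sorted order, which is similar in spirit to what the codes of a pd.Categorical give you.

```python
def build_id_map(values):
    """Map each distinct raw value to a consecutive id (0, 1, 2, ...) in sorted order."""
    return {raw: idx for idx, raw in enumerate(sorted(set(values)))}


def encode_static_cats(records, field="feat_static_cat"):
    """Return re-encoded copies of the records, the cardinality list, and the per-feature maps."""
    num_features = len(records[0][field])
    maps = [build_id_map(r[field][i] for r in records) for i in range(num_features)]
    encoded = [
        {**r, field: [maps[i][v] for i, v in enumerate(r[field])]}
        for r in records
    ]
    cardinality = [len(m) for m in maps]
    return encoded, cardinality, maps


raw = [
    {"feat_static_cat": [1, 250, 5]},  # raw values from the question
    {"feat_static_cat": [2, 500, 5]},  # invented record
    {"feat_static_cat": [1, 100, 7]},  # invented record
]

encoded, cardinality, maps = encode_static_cats(raw)
print(cardinality)
# [2, 3, 2]
print([r["feat_static_cat"] for r in encoded])
# [[0, 1, 0], [1, 2, 0], [0, 0, 1]]
```

The same maps would need to be reused for the test set so that ids stay consistent; a raw value that only appears in the test set raises a KeyError in this sketch and would need its own handling.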

hopefully, that helps!

Oh that makes sense! I'll make the changes now. Thanks again for your help.
