[Feature Request] Don't save Task #258

Open
vaaaaanquish opened this issue Nov 5, 2021 · 2 comments

@vaaaaanquish

I'd like to create a function-like task whose result is not saved.

For example:

import gokart

class Pipeline(gokart.TaskOnKart):
    def requires(self):
        data = LoadData()
        features = [MakeFeatureA(data=data), MakeFeatureB(data=data), MakeFeatureC(data=data)]

        # `Flatten` is a Task, but we don't want to dump its result because the data would be too large :(
        feature = Flatten(features=features, axis=1)

        model = TrainModel(feature=feature)
        return model
@vaaaaanquish

I'm thinking about adding a gokart.Function:

import pandas as pd
import gokart

class FlattenFunction(gokart.Function):
    def process(self):
        df_list = self.load()
        df = pd.concat(df_list, axis=1)
        return df

A Function's result would not be dumped to TASK_WORKSPACE; it would only be stored temporarily in a tmp file.
On a second run there is no output file for the Function, but the check for whether the task has already been executed would be skipped for it.
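
To make the intended behaviour a bit more concrete, here is a rough plain-Python sketch of one way to read the idea. It does not use gokart at all: the `Function` base class, `cleanup`, and `run_pipeline` are hypothetical names made up for illustration, the inputs are plain lists instead of DataFrames, and `sum` stands in for model training. The assumption is that only the downstream result is persisted, and that a second run looks only at that persisted result.

```python
import os
import pickle
import tempfile


class Function:
    """Hypothetical stand-in for the proposed gokart.Function (not a real gokart API)."""

    def __init__(self, inputs):
        self.inputs = inputs
        self._tmp_path = None

    def process(self):
        raise NotImplementedError

    def run(self):
        # Store the result in a temporary file instead of a persistent workspace.
        fd, self._tmp_path = tempfile.mkstemp(suffix=".pkl")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.process(), f)

    def load(self):
        # Compute lazily on first access, then read back from the tmp file.
        if self._tmp_path is None:
            self.run()
        with open(self._tmp_path, "rb") as f:
            return pickle.load(f)

    def cleanup(self):
        # Remove the tmp file once downstream work no longer needs it.
        if self._tmp_path is not None and os.path.exists(self._tmp_path):
            os.remove(self._tmp_path)
        self._tmp_path = None


class FlattenFunction(Function):
    def process(self):
        # Concatenate the input lists into one flat list.
        return [x for chunk in self.inputs for x in chunk]


def run_pipeline(workspace, calls):
    """Persist only the final result; skip everything if it already exists."""
    output_path = os.path.join(workspace, "model.pkl")
    if os.path.exists(output_path):
        # Second run: the Function's tmp file is gone, but it is never checked.
        with open(output_path, "rb") as f:
            return pickle.load(f)
    flatten = FlattenFunction([[1, 2], [3], [4, 5]])
    try:
        feature = flatten.load()
        calls.append("flatten")  # record that the Function really ran
        result = sum(feature)  # stand-in for training a model
    finally:
        flatten.cleanup()
    with open(output_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

Calling `run_pipeline` twice with the same workspace directory runs `FlattenFunction` only the first time, and the workspace ends up containing only the final result. Whether the real feature should hook into the completion check this way is exactly the open question here.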

@vaaaaanquish

This is still just an idea. Please comment :)
