Feature: Allow various importing format options #2

Open
chadwithuhc opened this issue Dec 7, 2012 · 4 comments

Comments

@chadwithuhc
Contributor

I would like to see an option to import data from various sources, starting with:

  • JSON (File or URL)
  • RSS (XML file or Feed URL)

I think we need to create a "Type" section in the Profiles page that lets you choose the type of data you are importing. The current CSV/file-based import would stay as it is, with these new options added alongside it. This would give us the flexibility to import data from almost any source. Later on, it could also let you systematically import data from an API (such as Tumblr or Twitter).

Sample Usage

  1. A user goes to the Profiles page to import a new dataset.
    • At this point we assume a new Stream with the correct fields is set up and ready to be written to
  2. The user will need to enter the file or URL, along with a few optional parameters set on our end:
    • limit -- Max number of posts to import, regardless of how many posts are available
    • unset -- Array of fields to trash or remove in case we do not want them. Ex: post_url
    • mapping -- Array of old key names to map to the new key name. Ex: 'pub-date' => 'post_date'
    • overwrite_posts -- Overwrite existing posts with new data? Useful when a post has changed since the last import.
    • frequency -- If we support auto-importing by time intervals...
  3. Maybe a dry run could happen to see how a few items would end up and check for errors
  4. Migrate the posts to the Stream.
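
The options in step 2 and the dry run in step 3 could be sketched roughly as follows. This is a minimal Python illustration, not the module's code: `prepare_posts` and `dry_run` are hypothetical names, and `overwrite_posts` and `frequency` are left out because they depend on how entries are stored and scheduled.

```python
from typing import Any, Dict, List, Optional


def prepare_posts(
    items: List[Dict[str, Any]],
    limit: Optional[int] = None,
    unset: Optional[List[str]] = None,
    mapping: Optional[Dict[str, str]] = None,
) -> List[Dict[str, Any]]:
    """Apply the limit, unset, and mapping options to raw source items."""
    dropped = set(unset or [])
    renames = mapping or {}
    selected = items if limit is None else items[:limit]
    prepared = []
    for item in selected:
        row = {}
        for key, value in item.items():
            if key in dropped:
                continue  # field was marked to be trashed
            row[renames.get(key, key)] = value  # rename only when a mapping exists
        prepared.append(row)
    return prepared


def dry_run(items, sample_size=3, **options):
    """Preview how the first few items would end up, without writing anything."""
    return prepare_posts(items[:sample_size], **options)


# Made-up sample data for illustration only.
source = [
    {"title": "First", "pub-date": "2012-12-01", "post_url": "http://example.com/1"},
    {"title": "Second", "pub-date": "2012-12-02", "post_url": "http://example.com/2"},
    {"title": "Third", "pub-date": "2012-12-03", "post_url": "http://example.com/3"},
]
preview = dry_run(
    source,
    sample_size=1,
    unset=["post_url"],
    mapping={"pub-date": "post_date"},
)
```

Running the dry run on a small sample first makes it cheap to check the mapping before the real migration in step 4.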

Let's discuss the idea of adding this as a new format for importing data. Thoughts?

@adamfairholm
Contributor

The way I've seen it done is it shows you a sample row of what you're importing, and then you can map it to existing rows or create new ones. Maybe that could work here.
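
One way to picture that sample-row step, as a rough Python sketch: `suggest_mapping` is a hypothetical helper, and the matching rule (an exact name match) is an assumption made for illustration.

```python
def suggest_mapping(sample_row, existing_fields):
    """Pair each source column with an existing field of the same name.

    A value of None means there is no match yet, so the user would pick an
    existing field by hand or create a new one.
    """
    known = set(existing_fields)
    return {column: (column if column in known else None) for column in sample_row}


# Made-up sample row and field list for illustration only.
sample_row = {"title": "Hello", "pub-date": "2012-12-07"}
suggestion = suggest_mapping(sample_row, ["title", "post_date"])
```

Here `title` would be matched automatically, while `pub-date` would be left for the user to map.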

@chadwithuhc
Contributor Author

Now that I've actually got the module working (took me a little to figure out what was right...) I see what is going on.

So yes Adam, that does sound right and that actually is happening now. However, I see some mods we need to make. I've made separate tickets for those. But back to the main idea here...


I have added a whole new "Quick Import" feature which is a wizard similar to the way you import now, but it does not require you to save a Profile. For now, this is acting as a one-time import feature.

For this new import method, I have rewritten much of the project's base and separated everything out into reusable methods. We can then rebuild the Profiles section on top of them, since the new code allows importing from URLs and in different formats. This will be a good start toward the future features I hope to have, like Automatic Feed Importing based on Profiles, additional source formats, etc.

I imagine it will take a few more days before I send a PR for this. I would also like to write some docs so you guys can easily get familiar with the new methods and all.

@bergeo-fr Can I get some rights to edit the Wiki and possibly commit directly to the project? I am storing my changes on a separate branch in my forked repo for now, but I would like y'all to get a look at it and see what you think.

@numerogeek
Owner

@chadwithuhc hi Chad, you are now listed as a collaborator in this repo.

Geoffrey

@chadwithuhc
Contributor Author

Thanks, Geoffrey.

I submitted a new Branch which has a Quick Import feature. Can you guys check it out when you get a chance? https://github.com/bergeo-fr/streams_import/tree/feature/quick-import -- See some docs on how to use it here: https://github.com/bergeo-fr/streams_import/wiki/Quick-Import

I left your Profiles section alone and built the Quick Import from the ground up. This is because I wasn't fully sure how your code was working, but we can work to get them on the same code base. I have separated the code into more places now. Here is what was added or changed:

  • Streams_import_m model for all database interactions
  • Streams_import Library update with some methods to handle common tasks such as entry form, mapping form, etc.
  • Import by URL instead of file upload (currently the only option in Quick Import; upload support still needs to be added)
  • Added support for JSON or CSV formats
  • Updated Mapping screen to allow Streams Core fields importing (id, created, updated, created_by, ordering_count)
  • Updated Mapping screen to allow "unincluding" certain fields
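
As a rough illustration of the JSON/CSV support and the "unincluding" idea from the list above, here is a minimal Python sketch. It is not the module's code: `parse_source` and `apply_column_choices` are hypothetical names, and the convention that a target of `None` means "leave this column out" is an assumption.

```python
import csv
import io
import json


def parse_source(text, fmt):
    """Turn raw JSON or CSV text into a list of dicts, one per entry."""
    if fmt == "json":
        data = json.loads(text)
        # Accept a single object as a one-entry list.
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        # The first CSV line is treated as the header row.
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"Unsupported format: {fmt}")


def apply_column_choices(rows, choices):
    """Keep and rename only the chosen columns.

    choices maps a source column to a target field, or to None to
    leave that column out of the import.
    """
    return [
        {
            target: row[source]
            for source, target in choices.items()
            if target is not None and source in row
        }
        for row in rows
    ]


# Made-up sample data for illustration only.
csv_rows = parse_source("title,date,junk\nHello,2012-12-07,x\n", "csv")
json_rows = parse_source('[{"title": "Hello", "date": "2012-12-07", "junk": "x"}]', "json")
choices = {"title": "title", "date": "created", "junk": None}
entries = apply_column_choices(csv_rows, choices)
```

Both formats end up as the same list-of-dicts shape, so the mapping step does not need to know where the data came from.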

And there are some additional enhancements I would like to add before merging to master:

  • Allow the choice to upload the file or pull by URL -- I have created the streams_import->file_to_array() method, which handles reading the file and processing it into a DB-insert-ready array. We just need to add the ability to upload a file, store it, then send the file path to that function.
  • Process source data with Streams Field processing -- Currently, we just insert the data into the database and do not use the Streams API to insert the entries. This was done because of the addition of the Streams Core Fields. I don't have experience with manually processing the data yet, so I wanted to take more time on this.
  • Add Security for imported data -- I don't run any extensive security checks to make sure the data is clean, yet
  • Get Profiles to run on same code as Quick Import so we are all on one base
  • Allow Profiles to import by URL in addition to files
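
The "upload a file, store it, then send the file path" step from the first item could look something like the sketch below. It is Python for illustration only: `store_upload` is a hypothetical helper, and the existing `file_to_array()` method is not reproduced here, since the stored path would simply be handed to it.

```python
import os
import tempfile


def store_upload(data: bytes, original_name: str, upload_dir: str) -> str:
    """Save uploaded bytes under upload_dir and return the stored file's path."""
    # Keep only the extension from the client-supplied name and let
    # mkstemp pick the rest, so the name cannot escape upload_dir.
    extension = os.path.splitext(original_name)[1].lower()
    fd, path = tempfile.mkstemp(suffix=extension, dir=upload_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# Made-up upload for illustration only.
upload_dir = tempfile.mkdtemp()
stored_path = store_upload(b"title,date\nHello,2012-12-07\n", "../posts.CSV", upload_dir)
```

Generating the stored name on the server side also touches on the security item above, since the client-supplied file name is never used as a path.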

Later on down the line I would like to see these features added as well:

  • Allow Profiles to be run automatically in recurring fashion
  • Allow one-to-many mappings for source columns -- Example: save date field to created and updated
  • Allow modifier methods -- It'd be nice to be able to say, "Run this field through strip_tags() to remove all HTML" or, even better, write a custom function for processing your data before inserting it into the database
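
The one-to-many mapping and modifier ideas could be sketched roughly like this. It is a Python illustration under stated assumptions: `transform_row` is a hypothetical helper, and the `strip_tags` function here is a crude regex stand-in for PHP's `strip_tags()`, not an equivalent.

```python
import re


def strip_tags(value):
    """Crude stand-in for PHP's strip_tags(): drop anything that looks like a tag."""
    return re.sub(r"<[^>]*>", "", value)


def transform_row(row, targets, modifiers=None):
    """Copy source columns to one or more destination fields.

    targets maps a source column to a list of destination fields.
    modifiers maps a source column to a callable that is applied to the
    value before it is copied.
    """
    modifiers = modifiers or {}
    out = {}
    for source, destinations in targets.items():
        if source not in row:
            continue  # nothing to copy for this column
        value = row[source]
        if source in modifiers:
            value = modifiers[source](value)
        for destination in destinations:
            out[destination] = value
    return out


# Made-up sample row for illustration only.
row = {"date": "2012-12-07", "body": "<p>Hello <b>world</b></p>"}
entry = transform_row(
    row,
    targets={"date": ["created", "updated"], "body": ["body"]},
    modifiers={"body": strip_tags},
)
```

Because a modifier is just a callable, a custom processing function could be plugged in the same way as the built-in one.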

We can open separate tickets for a specific request when we are ready to do it. I am just using this ticket to track some ideas.

EDIT: Added a link to the docs for Quick Import


3 participants