Hello World and More
- Load json
You can load the json with your own method, or use the function we wrote for Facebook json, which also handles mojibake:
from fbjson2table.func_lib import parse_fb_json
json_content = parse_fb_json($PATH_OF_JSON)
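For readers who load the json with their own method, here is a rough idea of what "handling mojibake" involves. Facebook exports are commonly reported to store UTF-8 bytes as if each byte were a Latin-1 character, so "é" shows up as "Ã©". The sketch below undoes that after a plain `json.loads`. Note that `fix_mojibake` is a hypothetical helper written for this illustration; it is not the implementation of `parse_fb_json`.

```python
import json


def fix_mojibake(value):
    """Recursively repair strings whose UTF-8 bytes were decoded as Latin-1."""
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Text that is already correct (or cannot be repaired this way)
            # is returned unchanged.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    return value


# A tiny stand-in for a Facebook export: "\u00c3\u00a9" is a mis-decoded "é".
raw = json.loads('{"title": "Caf\\u00c3\\u00a9", "timestamp": 1}')
fixed = fix_mojibake(raw)
print(fixed["title"])
```

The try/except keeps the helper safe to run on text that was never broken, because such text usually fails the round trip and is left alone.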
- Feed it into "TempDFs", then take a look at "TempDFs.df_list" and "TempDFs.table_name_list":
from tabulate import tabulate
from fbjson2table.table_class import TempDFs

temp_dfs = TempDFs(json_content)
# Print every flattened table together with its name
for df, table_name in zip(temp_dfs.df_list, temp_dfs.table_name_list):
    print(table_name, ':')
    print(tabulate(df, headers='keys', tablefmt='psql'), '\n')
Here is an example of json_content.
Here is an example of TempDFs.df_list and TempDFs.table_name_list.
Every df has its own name. The default name of the root DataFrame is "temp", and each sub-df is named "${NAME_OF_ROOT_DF}__${DICT_KEY}".
After the json is flattened, each layer has its own id. The total number of layer ids should equal the "depth (peeling)" of the original json. The id of the first depth is always called "id_0", and each following id is called "id_${DICT_KEY}_${DEPTH}", such as "id_attachment_1".
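To make the naming scheme concrete, here is a deliberately simplified sketch in plain Python. It is not the algorithm used by TempDFs: the `flatten` function, the toy `posts` data, and the way ids are numbered (position within the parent list) are all assumptions made for illustration. It only shows how a nested list can become its own table, named after its parent table and dict key, while carrying the ids of the layers above it.

```python
def flatten(records, table_name="temp", parent_ids=None, depth=0, key=None, tables=None):
    """Split nested records into flat tables, one table per nesting level."""
    if tables is None:
        tables = {}
    # The root layer uses "id_0"; deeper layers use "id_<dict key>_<depth>".
    id_name = "id_0" if depth == 0 else f"id_{key}_{depth}"
    rows = tables.setdefault(table_name, [])
    for index, record in enumerate(records):
        row = dict(parent_ids or {})  # inherit the ids of the outer layers
        row[id_name] = index
        ids = dict(row)  # ids only, passed down to the children of this row
        for field, value in record.items():
            if isinstance(value, list):
                # A nested list of dicts becomes a table named parent__field.
                flatten(value, f"{table_name}__{field}", ids, depth + 1, field, tables)
            else:
                row[field] = value
        rows.append(row)
    return tables


# Hypothetical input: two posts, each with a list of attachments.
posts = [
    {"timestamp": 100, "attachments": [{"uri": "a.jpg"}, {"uri": "b.jpg"}]},
    {"timestamp": 200, "attachments": [{"uri": "c.jpg"}]},
]
tables = flatten(posts)
for name, rows in tables.items():
    print(name, rows)
```

Because every row of "temp__attachments" keeps the "id_0" of the post it came from, the two tables can be joined back together later.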
With the ids, we can do "join" operations. For example, if we want to put the "uri" of "media" and the "timestamp" of posts in the same table, the code will look like:
top_df = temp_dfs.df_list[0].set_index("id_0", drop=False)
# Drop "id_0" on the appended side so the two frames do not share a column name
append_df = temp_dfs.df_list[4].set_index("id_0")
wanted_df = top_df.join(append_df)  # What we want
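The same join can be tried without any Facebook data. The two DataFrames below are hypothetical stand-ins for entries of TempDFs.df_list (the column values are made up); the point is only to show how rows line up on the shared "id_0".

```python
import pandas as pd

# Hypothetical stand-ins for the root table and a deeper "media" table.
top_df = pd.DataFrame({"id_0": [0, 1], "timestamp": [100, 200]})
media_df = pd.DataFrame(
    {"id_0": [0, 0, 1], "id_media_3": [0, 1, 0], "uri": ["a.jpg", "b.jpg", "c.jpg"]}
)

# Index both frames by the shared id. The right frame gives up its "id_0"
# column so that join() does not meet two columns with the same name.
wanted_df = top_df.set_index("id_0", drop=False).join(media_df.set_index("id_0"))
print(wanted_df[["timestamp", "uri"]])
```

A post with several media rows is repeated once per media row, which is why this kind of join suits one-to-many data as well as one-to-one data.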
If you are too lazy to find where the data you want is, and you are sure that the data has a one-to-one relationship with "top_df", you can use "merge_one_to_one_sub_df".
For example:
# start_peeling is the index of the df we want to set as "top_df" in df_list
one_to_one_df = temp_dfs.merge_one_to_one_sub_df(
    temp_dfs.df_list,
    temp_dfs.table_name_list,
    temp_dfs.id_column_names_list,
    start_peeling=0)
Note: in "one_to_one_df", every column name that comes from a sub-df gets the dict key of its depth as a prefix. For example, "id_media_3" => "media_id_media_3".
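The renaming can be pictured with pandas' own `add_prefix`. This is only an illustration of the resulting column names on a made-up sub-df; whether the library uses `add_prefix` internally is not stated here.

```python
import pandas as pd

# Hypothetical sub-df that sits under the "media" dict key.
sub_df = pd.DataFrame({"id_media_3": [0, 0], "uri": ["a.jpg", "b.jpg"]})

# Prefix every column with the dict key of its depth.
prefixed = sub_df.add_prefix("media_")
print(list(prefixed.columns))
```

The prefix keeps columns from different sub-dfs apart when they would otherwise share a name such as "uri".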
- Create a one-to-one DataFrame
from fbjson2table.func_lib import parse_fb_json
from fbjson2table.table_class import TempDFs
json_content = parse_fb_json($PATH_OF_JSON)
temp_dfs = TempDFs(json_content)
one_to_one_df, _ = temp_dfs.temp_to_wanted_df(
    wanted_columns=[]
)
Take a look at one_to_one_df and decide which columns we want.
print(one_to_one_df.columns)
- From the full table, get only the wanted columns
# You will need to pre-define the LIST_OF_WANTED_COLUMNS
wanted_columns = LIST_OF_WANTED_COLUMNS
df, top_id = temp_dfs.temp_to_wanted_df(
    wanted_columns=wanted_columns
)
The "df" is what we can use for analysis: simple, easy, and with only the columns we need.