rio - A Swiss-Army Knife for Data I/O
Streamlined data import and export by making assumptions that the user is probably willing to make: 'import()' and 'export()' determine the data format from the file extension, reasonable defaults are used for data import and export, web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly, and fast import packages are used where appropriate. An additional convenience function, 'convert()', provides a simple method for converting between file types.
Last updated 2 months ago
csv csvy data data-science excel io rio sas spss stata
17.17 score 600 stars 67 packages 7.2k scripts 60k downloads

rtoot - Collecting and Analyzing Mastodon Data
An implementation of calls designed to collect and organize Mastodon data via its Application Programming Interface (API), which is documented at <https://docs.joinmastodon.org/>.
Last updated 16 days ago
mastodon mastodon-api
8.80 score 104 stars 68 scripts 955 downloads

oolong - Create Validation Tests for Automated Content Analysis
Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, <https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models>) and word set intrusion (Ying et al. 2021) <doi:10.1017/pan.2021.33> tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) <doi:10.1080/10584609.2020.1723752>.
Last updated 1 month ago
textanalysis topicmodeling validation
7.51 score 54 stars 22 scripts 255 downloads

minty - Minimal Type Guesser
Port the type guesser from 'readr' (so-called 'readr' first edition parsing engine, now superseded by 'vroom').
Last updated 14 days ago
6.92 score 5 stars 22 packages 5 scripts 7.2k downloads

rang - Reconstructing Reproducible R Computational Environments
Resolve the dependency graph of R packages at a specific time point based on the information from various 'R-hub' web services <https://blog.r-hub.io/>. The dependency graph can then be used to reconstruct the R computational environment with 'Rocker' <https://rocker-project.org>.
Last updated 11 months ago
reproducibility reproducible-research
6.61 score 79 stars 13 scripts 209 downloads

adaR - A Fast 'WHATWG' Compliant URL Parser
A wrapper for 'ada-url', a 'WHATWG' compliant and fast URL parser written in modern 'C++'. Also contains auxiliary functions such as a public suffix extractor.
Last updated 2 months ago
url-parser
6.41 score 26 stars 2 packages 11 scripts 498 downloads

webtrackR - Preprocessing and Analyzing Web Tracking Data
Data structures and methods to work with web tracking data. The functions cover data preprocessing steps, enriching web tracking data with external information and methods for the analysis of digital behavior as used in several academic papers (e.g., Clemm von Hohenberg et al., 2023 <doi:10.17605/OSF.IO/M3U9P>; Stier et al., 2022 <doi:10.1017/S0003055421001222>).
Last updated 2 months ago
webtracking
5.91 score 9 stars 8 scripts 565 downloads

grafzahl - Supervised Machine Learning for Textual Data Using Transformers and 'Quanteda'
Duct tape the 'quanteda' ecosystem (Benoit et al., 2018) <doi:10.21105/joss.00774> to modern Transformer-based text classification models (Wolf et al., 2020) <doi:10.18653/v1/2020.emnlp-demos.6>, in order to facilitate supervised machine learning for textual data. This package mimics the behaviors of 'quanteda.textmodels' and provides a function to set up the 'Python' environment to use the pretrained models from 'Hugging Face' <https://huggingface.co/>. More information: <doi:10.5117/CCR2023.1.003.CHAN>.
Last updated 8 months ago
5.61 score 41 stars 2 scripts 183 downloads

sweater - Speedy Word Embedding Association Test and Extras Using R
Conduct various tests for evaluating implicit biases in word embeddings: Word Embedding Association Test (Caliskan et al., 2017) <doi:10.1126/science.aal4230>, Relative Norm Distance (Garg et al., 2018) <doi:10.1073/pnas.1720347115>, Mean Average Cosine Similarity (Manzini et al., 2019) <arXiv:1904.04047>, SemAxis (An et al., 2018) <arXiv:1806.05521>, Relative Negative Sentiment Bias (Sweeney & Najafian, 2019) <doi:10.18653/v1/P19-1162>, and Embedding Coherence Test (Dev & Phillips, 2019) <arXiv:1901.07656>.
Last updated 5 months ago
bias-detection textanalysis wordembedding
4.28 score 27 stars 14 scripts 475 downloads

webbotparseR - Parse html files containing search engine results
Parse search engine results which have been scraped with the 'WebBot' browser extension <https://github.com/gesiscss/WebBot>.
Last updated 1 month ago
browser-extension search-engine
3.20 score 8 stars 5 scripts